Generic image analysis
We start with generic image analysis by adding a UI to the ImageAnalysis.xaml file. All the Computer Vision example UIs will be built in the same manner.
The UI should have two columns, as shown in the following code:
<Grid.ColumnDefinitions>
    <ColumnDefinition Width="*" />
    <ColumnDefinition Width="*" />
</Grid.ColumnDefinitions>
The first one will contain the image selection, while the second one will display our results.
In the left-hand column, we create a vertically oriented StackPanel. To this, we add a label and a ListBox. The list box will display the list of visual features that we can add to our analysis query. Note how we have a SelectionChanged event hooked up in the ListBox in the following code. This will be added in the code-behind, and will be covered shortly:
<StackPanel Orientation="Vertical" Grid.Column="0">
    <TextBlock Text="Visual Features:"
               FontWeight="Bold" FontSize="15"
               Margin="5, 5" Height="20" />
    <ListBox x:Name="VisualFeatures"
             ItemsSource="{Binding ImageAnalysisVm.Features}"
             SelectionMode="Multiple"
             Height="150" Margin="5, 0, 5, 0"
             SelectionChanged="VisualFeatures_SelectionChanged" />
The list box will be able to select multiple items, and the items will be gathered in the ViewModel.
In the same stack panel, we also add a button element and an image element. These will allow us to browse for an image, show it, and analyze it. Both the Button command and the image source are bound to the corresponding properties in the ViewModel, as shown in the following code:
    <Button Content="Browse and analyze"
            Command="{Binding ImageAnalysisVm.BrowseAndAnalyzeImageCommand}"
            Margin="5, 10, 5, 10" Height="20" Width="120"
            HorizontalAlignment="Right" />
    <Image Stretch="Uniform"
           Source="{Binding ImageAnalysisVm.ImageSource}"
           Height="280" Width="395" />
</StackPanel>
We also add another vertically oriented stack panel. This will be placed in the right-hand column. It contains a title label, as well as a textbox bound to the analysis result in our ViewModel, as shown in the following code:
<StackPanel Orientation="Vertical" Grid.Column="1">
    <TextBlock Text="Analysis Results:"
               FontWeight="Bold" FontSize="15"
               Margin="5, 5" Height="20" />
    <TextBox Text="{Binding ImageAnalysisVm.AnalysisResult}"
             Margin="5, 0, 5, 5" Height="485" />
</StackPanel>
Next, we want to add our SelectionChanged event handler to our code-behind. Open the ImageAnalysisView.xaml.cs file and add the following:
private void VisualFeatures_SelectionChanged(object sender, SelectionChangedEventArgs e)
{
    var vm = (MainViewModel)DataContext;
    vm.ImageAnalysisVm.SelectedFeatures.Clear();
The first line of the function will give us the current DataContext, which is the MainViewModel class. We access the ImageAnalysisVm property, which is our ViewModel, and clear the selected visual features list.
From there, we loop through the selected items from our list box. All items will be added to the SelectedFeatures list in our ViewModel:
    foreach (VisualFeature feature in VisualFeatures.SelectedItems)
    {
        vm.ImageAnalysisVm.SelectedFeatures.Add(feature);
    }
}
Open the ImageAnalysisViewModel.cs file. Make sure that the class inherits the ObservableObject class.
Declare a private variable, as follows:
private IVisionServiceClient _visionClient;
This will be used to access the Computer Vision API, and it is initialized through the constructor.
Next, we declare a private variable and the corresponding property for our list of visual features, as follows:
private List<VisualFeature> _features = new List<VisualFeature>();
public List<VisualFeature> Features
{
    get { return _features; }
    set
    {
        _features = value;
        RaisePropertyChangedEvent("Features");
    }
}
In a similar manner, create a BitmapImage variable and property called ImageSource. Create a list of VisualFeature types called SelectedFeatures and a string called AnalysisResult.
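As a minimal sketch, these members could look like the following, using the same pattern as the Features property (the names match those used in the bindings above):
private BitmapImage _imageSource;
public BitmapImage ImageSource
{
    get { return _imageSource; }
    set
    {
        _imageSource = value;
        RaisePropertyChangedEvent("ImageSource");
    }
}

// SelectedFeatures is filled from the code-behind, so the list
// instance itself never changes and needs no change notification
public List<VisualFeature> SelectedFeatures { get; set; } = new List<VisualFeature>();

private string _analysisResult;
public string AnalysisResult
{
    get { return _analysisResult; }
    set
    {
        _analysisResult = value;
        RaisePropertyChangedEvent("AnalysisResult");
    }
}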
We also need to declare the property for our button, as follows:
public ICommand BrowseAndAnalyzeImageCommand { get; private set; }
With that in place, we create our constructor, as follows:
public ImageAnalysisViewModel(IVisionServiceClient visionClient)
{
    _visionClient = visionClient;
    Initialize();
}
The constructor takes one parameter, the IVisionServiceClient object, which we have created in our MainViewModel file. It assigns that parameter to the variable that we created earlier. Then we call an Initialize function, as follows:
private void Initialize()
{
    Features = Enum.GetValues(typeof(VisualFeature))
        .Cast<VisualFeature>().ToList();

    BrowseAndAnalyzeImageCommand = new DelegateCommand(BrowseAndAnalyze);
}
In the Initialize function, we fetch all the values of the VisualFeature enum type. These values are added to the Features list, which is displayed in the UI. We have also created our button command, so we now need to create the corresponding action, as follows:
private async void BrowseAndAnalyze(object obj)
{
    var openDialog = new Microsoft.Win32.OpenFileDialog();

    openDialog.Filter = "JPEG Image(*.jpg)|*.jpg";
    bool? result = openDialog.ShowDialog();

    if (!(bool)result) return;

    string filePath = openDialog.FileName;

    Uri fileUri = new Uri(filePath);
    BitmapImage image = new BitmapImage(fileUri);

    image.CacheOption = BitmapCacheOption.None;
    image.UriSource = fileUri;

    ImageSource = image;
The first lines of the preceding code are similar to what we did in Chapter 1, Getting Started with Microsoft Cognitive Services. We open a file browser and get the selected image.
With an image selected, we run an analysis on it, as follows:
    try
    {
        using (Stream fileStream = File.OpenRead(filePath))
        {
            AnalysisResult analysisResult =
                await _visionClient.AnalyzeImageAsync(fileStream, SelectedFeatures);
We call the AnalyzeImageAsync function of our _visionClient. This function has four overloads, all of which are quite similar. In our case, we pass on the image as a Stream type and the SelectedFeatures list, containing the VisualFeature values to analyze.
The request parameters are the image itself, passed as a stream or URL, and the list of visual features to analyze.
The response to this request is an AnalysisResult object.
We then check to see if the result is null. If it is not, we call a function to parse it and assign the result to our AnalysisResult string, as follows:
            if (analysisResult != null)
                AnalysisResult = PrintAnalysisResult(analysisResult);
Remember to close the try clause and finish the method with the corresponding catch clause.
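As a minimal sketch, the end of the method could look like this (how you report errors is up to you; here, the exception message is simply shown in the result textbox):
        }
    }
    catch (Exception ex)
    {
        // Show the failure in the UI instead of crashing
        AnalysisResult = ex.Message;
    }
}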
The AnalysisResult object contains data according to the visual features requested in the API call. Depending on what was requested, this includes categories, tags, a description with captions, faces, image type information, color information, and adult content flags.
To retrieve data, for example the image description, you can use the following:
if (analysisResult.Description != null)
{
    result.AppendFormat("Description: {0}\n",
        analysisResult.Description.Captions[0].Text);
    result.AppendFormat("Probability: {0}\n\n",
        analysisResult.Description.Captions[0].Confidence);
}
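Other visual features are read in the same way. For example, categories could be printed like this (a sketch, assuming the Category contract exposes Name and Score, as in the client library used here):
if (analysisResult.Categories != null)
{
    foreach (var category in analysisResult.Categories)
    {
        // Each category carries its name and a confidence score
        result.AppendFormat("Category: {0} (confidence: {1})\n",
            category.Name, category.Score);
    }
}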
A successful call presents us with the parsed analysis results in the UI.
Sometimes, you may only be interested in the image description. In such cases, it is wasteful to ask for the kind of full analysis that we have just done. By calling the following function, you will get an array of descriptions:
AnalysisResult descriptionResult =
    await _visionClient.DescribeAsync(ImageUrl, NumberOfDescriptions);
In this call, we have specified a URL for the image and the number of descriptions to return. The first parameter must always be included, but it may be an image upload instead of a URL. The second parameter is optional, and in cases where it is not provided, it defaults to one.
A successful query will result in an AnalysisResult object, which is the same as the one described in the preceding code. In this case, it will only contain the request ID, image metadata, and an array of captions. Each caption contains an image description and the confidence of that description being correct.
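As a sketch, those captions could be printed in the same way as before (assuming the same Description and Captions structure shown earlier):
var result = new StringBuilder();

foreach (var caption in descriptionResult.Description.Captions)
{
    // Each caption pairs a description with its confidence
    result.AppendFormat("{0} (confidence: {1})\n",
        caption.Text, caption.Confidence);
}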
We will add this form of image analysis to our smart-house application in a later chapter.
Recognizing celebrities using domain models
One of the features of the Computer Vision API is the ability to recognize domain-specific content. At the time of writing, the API only supports celebrity recognition, where it is able to recognize around 200,000 celebrities.
For this example, we choose to use an image from the internet. The UI will then need a textbox to input the URL. It will need a button to load the image and perform the domain analysis. There should be an image element to see the image and a textbox to output the result.
The corresponding ViewModel should have two string properties, for the URL and the analysis result. It should have a BitmapImage property for the image and an ICommand property for our button.
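As a sketch, the URL property could be declared as follows; the remaining properties follow the same pattern:
private string _imageUrl;
public string ImageUrl
{
    get { return _imageUrl; }
    set
    {
        _imageUrl = value;
        RaisePropertyChangedEvent("ImageUrl");
    }
}

// Celebrity (string), ImageSource (BitmapImage), and
// LoadAndFindCelebrityCommand (ICommand) are declared the same way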
Add a private variable for the IVisionServiceClient type at the start of the ViewModel, as follows:
private IVisionServiceClient _visionClient;
This should be assigned in the constructor, which will take a parameter of the IVisionServiceClient type.
As we need a URL to fetch an image from the internet, we need to initialize the ICommand property with both an action and a predicate. The latter checks whether the URL property is set, as shown in the following code:
public CelebrityViewModel(IVisionServiceClient visionClient)
{
    _visionClient = visionClient;

    LoadAndFindCelebrityCommand = new DelegateCommand(
        LoadAndFindCelebrity, CanFindCelebrity);
}
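The predicate itself can be as simple as checking that a URL has been entered, as in this sketch:
private bool CanFindCelebrity(object obj)
{
    // Only allow the command to execute once a URL has been entered
    return !string.IsNullOrEmpty(ImageUrl);
}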
The LoadAndFindCelebrity function creates a Uri with the given URL. Using this, it creates a BitmapImage and assigns it to ImageSource, the BitmapImage property, as shown in the following code. The image should be visible in the UI:
private async void LoadAndFindCelebrity(object obj)
{
    Uri fileUri = new Uri(ImageUrl);
    BitmapImage image = new BitmapImage(fileUri);

    image.CacheOption = BitmapCacheOption.None;
    image.UriSource = fileUri;

    ImageSource = image;
We call the AnalyzeImageInDomainAsync method with the given URL, as shown in the following code. The first parameter we pass in is the image URL; alternatively, this could have been an image opened as a Stream:
    try
    {
        AnalysisInDomainResult celebrityResult =
            await _visionClient.AnalyzeImageInDomainAsync(ImageUrl, "celebrities");

        if (celebrityResult != null)
            Celebrity = celebrityResult.Result.ToString();
    }
The second parameter is the domain model name, which is in a string format. As an alternative, we could have used a specific Model object, which can be retrieved by calling the following:
_visionClient.ListModelsAsync();
This would return an array of Models, which we can display and select from. As there is only one available at this time, there is no point in doing so.
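If you did want to list them, the call might look like the following sketch (assuming the ModelResult contract, with its Models array, used by the client library):
ModelResult modelResult = await _visionClient.ListModelsAsync();

foreach (Model model in modelResult.Models)
{
    // Each entry exposes the model name, for example "celebrities"
    Console.WriteLine(model.Name);
}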
The result from AnalyzeImageInDomainAsync is an object of the AnalysisInDomainResult type. This object will contain the request ID, metadata of the image, and the result, containing an array of celebrities. In our case, we simply output the entire result array. Each item in this array will contain the name of the celebrity, the confidence of a match, and the face rectangle in the image. Do try it in the example code provided.
Utilizing optical character recognition
For some tasks, optical character recognition (OCR) can be very useful. Say that you took a photo of a receipt. Using OCR, you can read the amount from the photo itself and have it automatically added to accounting.
OCR will detect text in images and extract machine-readable characters. It will automatically detect the language. Optionally, the API will detect image orientation and correct it before reading the text.
To specify a language, you need to use the BCP-47 language code. At the time of writing, the following languages are supported: simplified Chinese, traditional Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Arabic, Romanian, Cyrillic Serbian, Latin Serbian, and Slovak.
In the code example, the UI will have an image element. It will also have a button to load the image and detect text. The result will be printed to a textbox element.
The ViewModel will need a string property for the result, a BitmapImage property for the image, and an ICommand property for the button.
Add a private variable to the ViewModel for the Computer Vision API, as follows:
private IVisionServiceClient _visionClient;
The constructor should have one parameter of the IVisionServiceClient type, which should be assigned to the preceding variable.
Create a function as a command for our button. Call it BrowseAndAnalyze and have it accept object as the parameter. Then, open a file browser and find an image to analyze. With the image selected, we run the OCR analysis, as follows:
using (Stream fileStream = File.OpenRead(filePath))
{
    OcrResults analysisResult =
        await _visionClient.RecognizeTextAsync(fileStream);

    if (analysisResult != null)
        OcrResult = PrintOcrResult(analysisResult);
}
With the image opened as a Stream type, we call the RecognizeTextAsync method. In this case, we pass on the image as a Stream type, but we could just as easily have passed on a URL to an image.
Two more parameters may be specified in this call. First, you can specify the language of the text; the default is unknown, which means that the API will try to detect the language automatically. Second, you can specify whether or not the API should detect the orientation of the image; the default is set to false.
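For example, a call that forces English and asks for orientation detection might look like the following sketch (the extra parameters are passed positionally, based on the overload described above):
// Language code "en", orientation detection enabled
OcrResults analysisResult =
    await _visionClient.RecognizeTextAsync(fileStream, "en", true);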
If the call succeeds, it will return data in the form of an OcrResults object. We send this result to the PrintOcrResult function, where we parse it and print the text, as follows:
private string PrintOcrResult(OcrResults ocrResult)
{
    StringBuilder result = new StringBuilder();

    result.AppendFormat("Language is {0}\n", ocrResult.Language);
    result.Append("The words are:\n\n");
First, we create a StringBuilder object, which will hold all the text. The first content we add to it is the language of the text in the image. We then loop through all the regions, lines, and words of the result, as follows:
    foreach (var region in ocrResult.Regions)
    {
        foreach (var line in region.Lines)
        {
            foreach (var text in line.Words)
            {
                result.AppendFormat("{0} ", text.Text);
            }
            result.Append("\n");
        }
        result.Append("\n\n");
    }
The result has a Regions property, an array where each item represents an area of recognized text. Each region contains an array of Lines, where each line represents a line of recognized text. Each line, in turn, contains an array of Words, where each item represents a recognized word.
With all the words appended to the StringBuilder object, we return it as a string. This is then printed in the UI.
The result also contains the orientation and angle of the text. Combining this with the bounding box, also included, you can mark each word in the original image.
Generating image thumbnails
In today's world, we, as developers, have to consider different screen sizes when displaying images. The Computer Vision API offers some help with this by providing the ability to generate thumbnails.
Thumbnail generation, in itself, is not that big a deal. What makes the API clever is that it analyzes the image and determines the region of interest.
It will also generate smart cropping coordinates. This means that if the specified aspect ratio differs from the original, it will crop the image, with a focus on the interesting regions.
In the example code, the UI consists of two image elements and one button. The first image is the image in its original size. The second is for the generated thumbnail, which we specify to be 250 x 250 pixels in size.
The ViewModel will need the corresponding properties: two BitmapImage properties to act as image sources, and one ICommand property for our button command.
Define a private variable in the ViewModel, as follows:
private IVisionServiceClient _visionClient;
This will be our API access point. The constructor should accept an IVisionServiceClient object, which should be assigned to the preceding variable.
For the ICommand property, we create a function, BrowseAndAnalyze, accepting an object parameter. We do not need to check whether we can execute the command; we will browse for an image each time.
In the BrowseAndAnalyze function, we open a file dialog and select an image. When we have the image file path, we can generate our thumbnail, as follows:
using (Stream fileStream = File.OpenRead(filePath))
{
    byte[] thumbnailResult =
        await _visionClient.GetThumbnailAsync(fileStream, 250, 250);

    if (thumbnailResult != null && thumbnailResult.Length != 0)
        CreateThumbnail(thumbnailResult);
}
We open the image file so that we have a Stream type. This stream is the first parameter in our call to the GetThumbnailAsync method. The next two parameters indicate the width and height that we want for our thumbnail.
By default, the API call will use smart cropping, so we do not have to specify it. If we have a case where we do not want smart cropping, we can add a bool variable as the fourth parameter.
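For instance, a call with smart cropping disabled might look like this sketch (the fourth parameter is assumed from the description above):
// Same size as before, but without smart cropping
byte[] thumbnailResult =
    await _visionClient.GetThumbnailAsync(fileStream, 250, 250, false);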
If the call succeeds, we get a byte array back. This is the image data. If it contains data, we pass it on to a new function, CreateThumbnail, to create a BitmapImage object from it, as follows:
private void CreateThumbnail(byte[] thumbnailResult)
{
    try
    {
        MemoryStream ms = new MemoryStream(thumbnailResult);
        ms.Seek(0, SeekOrigin.Begin);
To create an image from a byte array, we create a MemoryStream object from it. We make sure that we start at the beginning of the array.
Next, we create a BitmapImage object and begin to initialize it. We specify the CacheOption and set the StreamSource to the MemoryStream variable we created earlier. Finally, we end the BitmapImage initialization and assign the image to our Thumbnail property, as shown in the following code:
        BitmapImage image = new BitmapImage();

        image.BeginInit();
        image.CacheOption = BitmapCacheOption.None;
        image.StreamSource = ms;
        image.EndInit();

        Thumbnail = image;
Close up the try clause and add the corresponding catch clause. You should now be able to generate thumbnails.