Generic image analysis
We start with generic image analysis by adding a UI to the ImageAnalysis.xaml file. All the Computer Vision example UIs will be built in the same manner.
The UI should have two columns, as shown in the following code:
<Grid.ColumnDefinitions>
    <ColumnDefinition Width="*" />
    <ColumnDefinition Width="*" />
</Grid.ColumnDefinitions>
The first one will contain the image selection, while the second one will display our results.
In the left-hand column, we create a vertically oriented StackPanel. To this, we add a label and a ListBox. The list box will display the list of visual features that we can add to our analysis query. Note how we have a SelectionChanged event hooked up in the ListBox in the following code. This will be added in the code-behind, and will be covered shortly:
<StackPanel Orientation="Vertical" Grid.Column="0">
    <TextBlock Text="Visual Features:"
               FontWeight="Bold" FontSize="15"
               Margin="5, 5" Height="20" />
    <ListBox x:Name="VisualFeatures"
             ItemsSource="{Binding ImageAnalysisVm.Features}"
             SelectionMode="Multiple"
             Height="150" Margin="5, 0, 5, 0"
             SelectionChanged="VisualFeatures_SelectionChanged" />
The list box will be able to select multiple items, and the items will be gathered in the ViewModel.
In the same stack panel, we also add a button element and an image element. These will allow us to browse for an image, show it, and analyze it. Both the Button command and the image source are bound to the corresponding properties in the ViewModel, as shown in the following code:
    <Button Content="Browse and analyze"
            Command="{Binding ImageAnalysisVm.BrowseAndAnalyzeImageCommand}"
            Margin="5, 10, 5, 10" Height="20" Width="120"
            HorizontalAlignment="Right" />
    <Image Stretch="Uniform"
           Source="{Binding ImageAnalysisVm.ImageSource}"
           Height="280" Width="395" />
</StackPanel>
We also add another vertically oriented stack panel. This will be placed in the right-hand column. It contains a title label, as well as a textbox bound to the analysis result in our ViewModel, as shown in the following code:
<StackPanel Orientation="Vertical" Grid.Column="1">
    <TextBlock Text="Analysis Results:"
               FontWeight="Bold" FontSize="15"
               Margin="5, 5" Height="20" />
    <TextBox Text="{Binding ImageAnalysisVm.AnalysisResult}"
             Margin="5, 0, 5, 5" Height="485" />
</StackPanel>
Next, we want to add our SelectionChanged event handler to our code-behind. Open the ImageAnalysisView.xaml.cs file and add the following:
private void VisualFeatures_SelectionChanged(object sender, SelectionChangedEventArgs e)
{
    var vm = (MainViewModel)DataContext;
    vm.ImageAnalysisVm.SelectedFeatures.Clear();
The first line of the function will give us the current DataContext, which is the MainViewModel class. We access the ImageAnalysisVm property, which is our ViewModel, and clear the selected visual features list.
From there, we loop through the selected items from our list box. All items will be added to the SelectedFeatures list in our ViewModel:
    foreach (VisualFeature feature in VisualFeatures.SelectedItems)
    {
        vm.ImageAnalysisVm.SelectedFeatures.Add(feature);
    }
}
Open the ImageAnalysisViewModel.cs file. Make sure that the class inherits the ObservableObject class.
Declare a private variable, as follows:
private IVisionServiceClient _visionClient;
This will be used to access the Computer Vision API, and it is initialized through the constructor.
Next, we declare a private variable and the corresponding property for our list of visual features, as follows:
private List<VisualFeature> _features = new List<VisualFeature>();
public List<VisualFeature> Features
{
    get { return _features; }
    set
    {
        _features = value;
        RaisePropertyChangedEvent("Features");
    }
}
In a similar manner, create a BitmapImage variable and property called ImageSource. Create a list of VisualFeature types called SelectedFeatures and a string called AnalysisResult.
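As a minimal sketch, these members could look like the following, using the same pattern as the Features property (the names match those used in the bindings above):
private BitmapImage _imageSource;
public BitmapImage ImageSource
{
    get { return _imageSource; }
    set
    {
        _imageSource = value;
        RaisePropertyChangedEvent("ImageSource");
    }
}

// SelectedFeatures is filled from the code-behind, so the list
// instance itself never changes and needs no change notification
public List<VisualFeature> SelectedFeatures { get; set; } = new List<VisualFeature>();

private string _analysisResult;
public string AnalysisResult
{
    get { return _analysisResult; }
    set
    {
        _analysisResult = value;
        RaisePropertyChangedEvent("AnalysisResult");
    }
}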
We also need to declare the property for our button, as follows:
public ICommand BrowseAndAnalyzeImageCommand { get; private set; }
With that in place, we create our constructor, as follows:
public ImageAnalysisViewModel(IVisionServiceClient visionClient)
{
    _visionClient = visionClient;
    Initialize();
}
The constructor takes one parameter, the IVisionServiceClient object, which we have created in our MainViewModel file. It assigns that parameter to the variable that we created earlier. Then we call an Initialize function, as follows:
private void Initialize()
{
    Features = Enum.GetValues(typeof(VisualFeature))
        .Cast<VisualFeature>().ToList();

    BrowseAndAnalyzeImageCommand = new DelegateCommand(BrowseAndAnalyze);
}
In the Initialize function, we fetch all the values of the VisualFeature enum type. These values are added to the Features list, which is displayed in the UI. We have also created our button command, so we now need to create the corresponding action, as follows:
private async void BrowseAndAnalyze(object obj)
{
    var openDialog = new Microsoft.Win32.OpenFileDialog();

    openDialog.Filter = "JPEG Image(*.jpg)|*.jpg";
    bool? result = openDialog.ShowDialog();

    if (!(bool)result) return;

    string filePath = openDialog.FileName;

    Uri fileUri = new Uri(filePath);
    BitmapImage image = new BitmapImage(fileUri);

    image.CacheOption = BitmapCacheOption.None;
    image.UriSource = fileUri;

    ImageSource = image;
The first lines of the preceding code are similar to what we did in Chapter 1, Getting Started with Microsoft Cognitive Services. We open a file browser and get the selected image.
With an image selected, we run an analysis on it, as follows:
    try
    {
        using (Stream fileStream = File.OpenRead(filePath))
        {
            AnalysisResult analysisResult =
                await _visionClient.AnalyzeImageAsync(fileStream, SelectedFeatures);
We call the AnalyzeImageAsync function of our _visionClient. This function has four overloads, all of which are quite similar. In our case, we pass on the image as a Stream type and the SelectedFeatures list, containing the VisualFeature values to analyze.
The request parameters are the image itself, passed as a stream or URL, and the list of visual features to analyze.
The response to this request is an AnalysisResult object.
We then check to see if the result is null. If it is not, we call a function to parse it and assign the result to our AnalysisResult string, as follows:
            if (analysisResult != null)
                AnalysisResult = PrintAnalysisResult(analysisResult);
Remember to close the try clause and finish the method with the corresponding catch clause.
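As a minimal sketch, the end of the method could look like this (how you report errors is up to you; here, the exception message is simply shown in the result textbox):
        }
    }
    catch (Exception ex)
    {
        // Show the failure in the UI instead of crashing
        AnalysisResult = ex.Message;
    }
}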
The AnalysisResult object contains data according to the visual features requested in the API call. Depending on what was requested, this includes categories, tags, a description with captions, faces, image type information, color information, and adult content flags.
To retrieve data, for example the image description, you can use the following:
if (analysisResult.Description != null)
{
    result.AppendFormat("Description: {0}\n",
        analysisResult.Description.Captions[0].Text);
    result.AppendFormat("Probability: {0}\n\n",
        analysisResult.Description.Captions[0].Confidence);
}
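Other visual features are read in the same way. For example, categories could be printed like this (a sketch, assuming the Category contract exposes Name and Score, as in the client library used here):
if (analysisResult.Categories != null)
{
    foreach (var category in analysisResult.Categories)
    {
        // Each category carries its name and a confidence score
        result.AppendFormat("Category: {0} (confidence: {1})\n",
            category.Name, category.Score);
    }
}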
A successful call presents us with the parsed analysis results in the UI.
Sometimes, you may only be interested in the image description. In such cases, it is wasteful to ask for the kind of full analysis that we have just done. By calling the following function, you will get an array of descriptions:
AnalysisResult descriptionResult =
    await _visionClient.DescribeAsync(ImageUrl, NumberOfDescriptions);
In this call, we have specified a URL for the image and the number of descriptions to return. The first parameter must always be included, but it may be an image upload instead of a URL. The second parameter is optional, and in cases where it is not provided, it defaults to one.
A successful query will result in an AnalysisResult object, which is the same as the one described in the preceding code. In this case, it will only contain the request ID, image metadata, and an array of captions. Each caption contains an image description and the confidence of that description being correct.
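As a sketch, those captions could be printed in the same way as before (assuming the same Description and Captions structure shown earlier):
var result = new StringBuilder();

foreach (var caption in descriptionResult.Description.Captions)
{
    // Each caption pairs a description with its confidence
    result.AppendFormat("{0} (confidence: {1})\n",
        caption.Text, caption.Confidence);
}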
We will add this form of image analysis to our smart-house application in a later chapter.
Recognizing celebrities using domain models
One of the features of the Computer Vision API is the ability to recognize domain-specific content. At the time of writing, the API only supports celebrity recognition, where it is able to recognize around 200,000 celebrities.
For this example, we choose to use an image from the internet. The UI will then need a textbox to input the URL. It will need a button to load the image and perform the domain analysis. There should be an image element to see the image and a textbox to output the result.
The corresponding ViewModel should have two string properties, for the URL and the analysis result. It should have a BitmapImage property for the image and an ICommand property for our button.
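As a sketch, the URL property could be declared as follows; the remaining properties follow the same pattern:
private string _imageUrl;
public string ImageUrl
{
    get { return _imageUrl; }
    set
    {
        _imageUrl = value;
        RaisePropertyChangedEvent("ImageUrl");
    }
}

// Celebrity (string), ImageSource (BitmapImage), and
// LoadAndFindCelebrityCommand (ICommand) are declared the same way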
Add a private variable for the IVisionServiceClient type at the start of the ViewModel, as follows:
private IVisionServiceClient _visionClient;
This should be assigned in the constructor, which will take a parameter of the IVisionServiceClient type.
As we need a URL to fetch an image from the internet, we need to initialize the ICommand property with both an action and a predicate. The latter checks whether the URL property is set, as shown in the following code:
public CelebrityViewModel(IVisionServiceClient visionClient)
{
    _visionClient = visionClient;

    LoadAndFindCelebrityCommand = new DelegateCommand(
        LoadAndFindCelebrity, CanFindCelebrity);
}
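The predicate itself can be as simple as checking that a URL has been entered, as in this sketch:
private bool CanFindCelebrity(object obj)
{
    // Only allow the command to execute once a URL has been entered
    return !string.IsNullOrEmpty(ImageUrl);
}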
The LoadAndFindCelebrity function creates a Uri with the given URL. Using this, it creates a BitmapImage and assigns it to ImageSource, the BitmapImage property, as shown in the following code. The image should be visible in the UI:
private async void LoadAndFindCelebrity(object obj)
{
    Uri fileUri = new Uri(ImageUrl);
    BitmapImage image = new BitmapImage(fileUri);

    image.CacheOption = BitmapCacheOption.None;
    image.UriSource = fileUri;

    ImageSource = image;
We call the AnalyzeImageInDomainAsync method with the given URL, as shown in the following code. The first parameter we pass in is the image URL; alternatively, this could have been an image opened as a Stream:
    try
    {
        AnalysisInDomainResult celebrityResult =
            await _visionClient.AnalyzeImageInDomainAsync(ImageUrl, "celebrities");

        if (celebrityResult != null)
            Celebrity = celebrityResult.Result.ToString();
    }
The second parameter is the domain model name, which is in a string format. As an alternative, we could have used a specific Model object, which can be retrieved by calling the following:
_visionClient.ListModelsAsync();
This would return an array of Models, which we can display and select from. As there is only one available at this time, there is no point in doing so.
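If you did want to list them, the call might look like the following sketch (assuming the ModelResult contract, with its Models array, used by the client library):
ModelResult modelResult = await _visionClient.ListModelsAsync();

foreach (Model model in modelResult.Models)
{
    // Each entry exposes the model name, for example "celebrities"
    Console.WriteLine(model.Name);
}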
The result from AnalyzeImageInDomainAsync is an object of the AnalysisInDomainResult type. This object will contain the request ID, metadata of the image, and the result, containing an array of celebrities. In our case, we simply output the entire result array. Each item in this array will contain the name of the celebrity, the confidence of a match, and the face rectangle in the image. Do try it in the example code provided.
Utilizing optical character recognition
For some tasks, optical character recognition (OCR) can be very useful. Say that you took a photo of a receipt. Using OCR, you can read the amount from the photo itself and have it automatically added to accounting.
OCR will detect text in images and extract machine-readable characters. It will automatically detect the language. Optionally, the API will detect image orientation and correct it before reading the text.
To specify a language, you need to use the BCP-47 language code. At the time of writing, the following languages are supported: simplified Chinese, traditional Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Arabic, Romanian, Cyrillic Serbian, Latin Serbian, and Slovak.
In the code example, the UI will have an image element. It will also have a button to load the image and detect text. The result will be printed to a textbox element.
The ViewModel will need a string property for the result, a BitmapImage property for the image, and an ICommand property for the button.
Add a private variable to the ViewModel for the Computer Vision API, as follows:
private IVisionServiceClient _visionClient;
The constructor should have one parameter of the IVisionServiceClient type, which should be assigned to the preceding variable.
Create a function as a command for our button. Call it BrowseAndAnalyze and have it accept object as the parameter. Then, open a file browser and find an image to analyze. With the image selected, we run the OCR analysis, as follows:
using (Stream fileStream = File.OpenRead(filePath))
{
    OcrResults analysisResult =
        await _visionClient.RecognizeTextAsync(fileStream);

    if (analysisResult != null)
        OcrResult = PrintOcrResult(analysisResult);
}
With the image opened as a Stream type, we call the RecognizeTextAsync method. In this case, we pass on the image as a Stream type, but we could just as easily have passed on a URL to an image.
Two more parameters may be specified in this call. First, you can specify the language of the text; the default is unknown, which means that the API will try to detect the language automatically. Second, you can specify whether or not the API should detect the orientation of the image; the default is set to false.
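For example, a call that forces English and asks for orientation detection might look like the following sketch (the extra parameters are passed positionally, based on the overload described above):
// Language code "en", orientation detection enabled
OcrResults analysisResult =
    await _visionClient.RecognizeTextAsync(fileStream, "en", true);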
If the call succeeds, it will return data in the form of an OcrResults object. We send this result to the PrintOcrResult function, where we parse it and print the text, as follows:
private string PrintOcrResult(OcrResults ocrResult)
{
    StringBuilder result = new StringBuilder();

    result.AppendFormat("Language is {0}\n", ocrResult.Language);
    result.Append("The words are:\n\n");
First, we create a StringBuilder object, which will hold all the text. The first content we add to it is the language of the text in the image. We then loop through all the regions, lines, and words of the result, as follows:
    foreach (var region in ocrResult.Regions)
    {
        foreach (var line in region.Lines)
        {
            foreach (var text in line.Words)
            {
                result.AppendFormat("{0} ", text.Text);
            }
            result.Append("\n");
        }
        result.Append("\n\n");
    }
The result has a Regions property, an array where each item represents an area of recognized text. Each region contains an array of Lines, where each line represents a line of recognized text. Each line, in turn, contains an array of Words, where each item represents a recognized word.
With all the words appended to the StringBuilder object, we return it as a string. This is then printed in the UI.
The result also contains the orientation and angle of the text. Combining this with the bounding box, also included, you can mark each word in the original image.
Generating image thumbnails
In today's world, we, as developers, have to consider different screen sizes when displaying images. The Computer Vision API offers some help with this by providing the ability to generate thumbnails.
Thumbnail generation, in itself, is not that big a deal. What makes the API clever is that it analyzes the image and determines the region of interest.
It will also generate smart cropping coordinates. This means that if the specified aspect ratio differs from the original, it will crop the image, with a focus on the interesting regions.
In the example code, the UI consists of two image elements and one button. The first image is the image in its original size. The second is for the generated thumbnail, which we specify to be 250 x 250 pixels in size.
The ViewModel will need the corresponding properties: two BitmapImage properties to act as image sources, and one ICommand property for our button command.
Define a private variable in the ViewModel, as follows:
private IVisionServiceClient _visionClient;
This will be our API access point. The constructor should accept an IVisionServiceClient object, which should be assigned to the preceding variable.
For the ICommand property, we create a function, BrowseAndAnalyze, accepting an object parameter. We do not need to check whether we can execute the command; we will browse for an image each time.
In the BrowseAndAnalyze function, we open a file dialog and select an image. When we have the image file path, we can generate our thumbnail, as follows:
using (Stream fileStream = File.OpenRead(filePath))
{
    byte[] thumbnailResult =
        await _visionClient.GetThumbnailAsync(fileStream, 250, 250);

    if (thumbnailResult != null && thumbnailResult.Length != 0)
        CreateThumbnail(thumbnailResult);
}
We open the image file so that we have a Stream type. This stream is the first parameter in our call to the GetThumbnailAsync method. The next two parameters indicate the width and height that we want for our thumbnail.
By default, the API call will use smart cropping, so we do not have to specify it. If we have a case where we do not want smart cropping, we can add a bool variable as the fourth parameter.
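For instance, a call with smart cropping disabled might look like this sketch (the fourth parameter is assumed from the description above):
// Same size as before, but without smart cropping
byte[] thumbnailResult =
    await _visionClient.GetThumbnailAsync(fileStream, 250, 250, false);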
If the call succeeds, we get a byte array back. This is the image data. If it contains data, we pass it on to a new function, CreateThumbnail, to create a BitmapImage object from it, as follows:
private void CreateThumbnail(byte[] thumbnailResult)
{
    try
    {
        MemoryStream ms = new MemoryStream(thumbnailResult);
        ms.Seek(0, SeekOrigin.Begin);
To create an image from a byte array, we create a MemoryStream object from it. We make sure that we start at the beginning of the array.
Next, we create a BitmapImage object and begin to initialize it. We specify the CacheOption and set the StreamSource to the MemoryStream variable we created earlier. Finally, we end the BitmapImage initialization and assign the image to our Thumbnail property, as shown in the following code:
        BitmapImage image = new BitmapImage();

        image.BeginInit();
        image.CacheOption = BitmapCacheOption.None;
        image.StreamSource = ms;
        image.EndInit();

        Thumbnail = image;
Close up the try clause and add the corresponding catch clause. You should now be able to generate thumbnails.