In this blog post I’m going to show one of the advantages of combining Azure Data Lake Analytics with machine learning.
We’ll upload a series of images to the Data Lake and then run a U-SQL script that detects the objects in each image and writes the corresponding tags to a text file.
First of all, you need an instance of Data Lake Store and one of Data Lake Analytics. Once these are up and running, enable the Python/R/Cognitive extensions in your Data Lake Analytics instance (here is a blog to help you out on this).
Next, we need to put our images in the Data Lake Store. Following Azure Data Lake best practices, I put them in my Laboratory subfolder.
Once the images are in place, we need to create a script: in your Data Lake Analytics instance, click New Job.
This opens a new blade with an empty script. Let’s name the new job “ImageTagging”.
To use image tagging, we need to reference the relevant assemblies:
REFERENCE ASSEMBLY ImageCommon;
REFERENCE ASSEMBLY ImageTagging;
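Next, we extract the data we want to analyse from the image files: the file name and the raw image bytes. The file set pattern {FileName:*}.jpg makes the script process every .jpg image in the specified folder and stores the part of each path matched by the wildcard in the FileName column.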
@images =
    EXTRACT FileName string,
            ImgData byte[]
    FROM @"/Laboratory/Desks/CSbrescia/ImageTagging/{FileName:*}.jpg"
    USING new Cognition.Vision.ImageExtractor();
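To make the role of the file set pattern more concrete, here is a small Python sketch that mimics how a pattern like {FileName:*}.jpg selects files and fills the FileName column. It is only an illustration, not how Data Lake Analytics implements file sets: the helper names are hypothetical, only the {Name:*} placeholder form is handled, and the wildcard is assumed to match within a single path segment.

```python
import re

# Hypothetical helper: turns a simplified U-SQL-style file set pattern such as
# "/dir/{FileName:*}.jpg" into a regex with one named group per placeholder.
# Assumption: each {Name:*} placeholder matches within one path segment only.
def fileset_to_regex(pattern):
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+):\*\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal part of the path
        parts.append("(?P<%s>[^/]+)" % m.group(1))        # virtual column value
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))

def extract_virtual_columns(pattern, paths):
    """Return one dict of virtual-column values for every path that matches."""
    rx = fileset_to_regex(pattern)
    rows = []
    for path in paths:
        m = rx.fullmatch(path)
        if m:  # files that do not match (e.g. .tsv or .png) are skipped
            rows.append(m.groupdict())
    return rows
```

For example, with the folder used in this post, "/Laboratory/Desks/CSbrescia/ImageTagging/desk.jpg" yields {"FileName": "desk"}, while the ImageTags.tsv output file in the same folder is ignored because it does not end in .jpg.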
The following step is where the magic happens: the script analyses all the images in that folder, detects the objects in each image and creates tags for them. Each row of the resulting rowset contains:
- the image name (FileName)
- the number of tagged objects detected (NumObjects)
- a string with all the tags (Tags)
@TaggedObjects =
    PROCESS @images
    PRODUCE FileName,
            NumObjects int,
            Tags string
    READONLY FileName
    USING new Cognition.Vision.ImageTagger();
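Conceptually, PROCESS works row by row: each input row (FileName, ImgData) goes through the processor, and READONLY FileName tells U-SQL that the processor passes FileName through unchanged. The Python sketch below models that shape only. The tagger function is a hypothetical stand-in for the Cognitive model, and deriving NumObjects as the number of tags and joining the tags with ";" are assumptions made for illustration, not documented behaviour of Cognition.Vision.ImageTagger.

```python
from typing import Callable, Iterable, Iterator, List, NamedTuple

class ImageRow(NamedTuple):
    FileName: str
    ImgData: bytes

class TaggedRow(NamedTuple):
    FileName: str
    NumObjects: int
    Tags: str

def process(images: Iterable[ImageRow],
            tagger: Callable[[bytes], List[str]]) -> Iterator[TaggedRow]:
    """Produce one output row per input row, in the spirit of PROCESS ... PRODUCE."""
    for row in images:
        tags = tagger(row.ImgData)  # hypothetical tagging function
        # FileName is copied through untouched, mirroring READONLY FileName.
        # Assumption: NumObjects is the tag count and tags are ';'-separated.
        yield TaggedRow(row.FileName, len(tags), ";".join(tags))
```

Passing a stub tagger (for instance one that always returns ["desk", "laptop"]) is enough to see how every image row turns into a FileName, NumObjects, Tags row like the one described above.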
Now we can write the rowset, with all its tags, to an output file:
OUTPUT @TaggedObjects
    TO "/Laboratory/Desks/CSbrescia/ImageTagging/ImageTags.tsv"
    USING Outputters.Tsv();
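Once the job has finished, you may want to load ImageTags.tsv into another tool. The Python sketch below parses the three columns (FileName, NumObjects, Tags) from the file's text. It assumes no header row (Outputters.Tsv() does not write one unless asked to) and lets the csv module strip the double quotes the outputter puts around strings. Splitting Tags on ";" is an assumption about the tag string's format, so check it against your own output; the function name is hypothetical.

```python
import csv
import io

def read_image_tags(tsv_text):
    """Parse FileName, NumObjects, Tags rows from a headerless TSV string."""
    rows = []
    for record in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not record:  # ignore blank lines
            continue
        file_name, num_objects, tags = record
        rows.append({
            "FileName": file_name,
            "NumObjects": int(num_objects),
            # Assumption: individual tags are separated by ';'.
            "Tags": [t for t in tags.split(";") if t],
        })
    return rows
```

After downloading the file from the Data Lake Store, you could call it as read_image_tags(open("ImageTags.tsv", encoding="utf-8").read()) and iterate over the resulting dictionaries.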
Here are the images I used in this example.
And here is the list of objects detected.
In conclusion, we have built a handy tool for automatic image tagging with Data Lake Analytics, with very little knowledge required of the underlying processes.
Note that there seems to be an image size limit: I had to resize all the images to about 500 KB.
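Given that limit, it can help to check your local images before uploading them. The Python sketch below lists the .jpg files above a threshold. The 500 × 1024 byte value simply follows the roughly 500 KB the author ended up with; it is an assumption, not a documented limit, and the function names are hypothetical.

```python
import os

# Assumption: ~500 KB is the size the author found to work, not an official limit.
SIZE_LIMIT_BYTES = 500 * 1024

def oversized(entries, limit=SIZE_LIMIT_BYTES):
    """Given (file name, size in bytes) pairs, return the .jpg files above limit."""
    return [(name, size) for name, size in entries
            if name.lower().endswith(".jpg") and size > limit]

def scan_folder(folder):
    """List (file name, size in bytes) for every regular file in a local folder."""
    result = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            result.append((name, os.path.getsize(path)))
    return result
```

Running oversized(scan_folder("images")) on the folder you are about to upload shows which files still need resizing; keeping the size check separate from the directory scan makes it easy to apply the same rule to other file listings.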