Measuring consumer sentiment is an important element of feedback for products, services, etc. In recent years, sentiment analysis—also known as opinion mining—has proven to be a useful tool in providing consumer feedback. Sentiment analysis uses text analysis and natural language processing in the context of social media. The basic idea of sentiment analysis is to capture a consumer’s opinion of a subject based on some form of communication—such as a tweet or a review from a website.
An evolution of sentiment analysis is to passively detect the mood of a consumer who passes by a shelf and looks at a product. This type of sentiment analysis not only captures statistics about a consumer’s opinion of a product, but also opens the possibility of direct interaction, for example, notifying a salesperson when a shopper expresses interest. In this blog, we will explore how you can use the Shopper Mood application of the Intel® OpenVINO™ toolkit to automatically infer the mood of shoppers looking at a retail display based on video input of their facial expressions.
Figure 1 shows the pipeline for the Shopper Mood application. Let’s take a closer look at what occurs in this deep-learning application.
Figure 1: The Shopper Mood Inference Pipeline diagram illustrates how this application of the OpenVINO™ toolkit processes a captured image to identify the mood detected on a shopper’s face. (Source: Author)
The process begins by capturing an image from a video camera mounted on a retail shelf. Next, the captured image is passed into the first of two deep neural networks, each of which is a Convolutional Neural Network (CNN). CNNs are one of the most popular deep-learning network architectures designed to process images. They are made up of a large number of layers that on the front end process small windows of the image and on the back end produce one or more classification scores. The first CNN determines whether faces can be detected in the captured image. Each face that the first network finds with a confidence above a configurable probability threshold is classified as a “Shopper” and passed to the second network. The second network identifies the emotion shown on the face, assigning it to one of five categories.
If the CNN is unable to determine the emotion of the detected face (above a configurable threshold), then it’s simply labeled as “Unknown.” You can see the result of the process overlaid on the original image in Figure 2.
Figure 2: The Shopper Mood Monitor output screen shows an example of the results of the Shopper Mood Inference Pipeline overlaid on the original captured image. (Source: Intel)
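The two-threshold logic described above can be sketched in a few lines of Go. This is only an illustration, not the sample application’s code: the Face type, the function names, and in particular the label names and their order are assumptions made for the example, and the scores stand in for whatever the two networks actually return.

```go
package main

import "fmt"

// moodLabels is an ASSUMED label set and order, used for illustration only.
var moodLabels = []string{"neutral", "happy", "sad", "surprised", "anger"}

// Face is a hypothetical container for one detection and its mood scores.
type Face struct {
	Confidence float32   // confidence reported by the face-detection network
	MoodScores []float32 // one score per entry in moodLabels
}

// classifyMood returns the label with the highest score, or "Unknown"
// when the scores are malformed or the best score is not above threshold.
func classifyMood(scores []float32, threshold float32) string {
	if len(scores) != len(moodLabels) {
		return "Unknown"
	}
	best := 0
	for i, s := range scores {
		if s > scores[best] {
			best = i
		}
	}
	if scores[best] <= threshold {
		return "Unknown"
	}
	return moodLabels[best]
}

// labelShoppers keeps only faces above faceThreshold (the "Shoppers")
// and returns one mood label per shopper.
func labelShoppers(faces []Face, faceThreshold, moodThreshold float32) []string {
	var labels []string
	for _, f := range faces {
		if f.Confidence <= faceThreshold {
			continue // too uncertain to count as a shopper
		}
		labels = append(labels, classifyMood(f.MoodScores, moodThreshold))
	}
	return labels
}

func main() {
	faces := []Face{
		{0.95, []float32{0.05, 0.80, 0.05, 0.05, 0.05}}, // confident face, clear mood
		{0.20, []float32{0.05, 0.80, 0.05, 0.05, 0.05}}, // below the face threshold
		{0.90, []float32{0.30, 0.25, 0.20, 0.15, 0.10}}, // face found, mood unclear
	}
	fmt.Println(labelShoppers(faces, 0.5, 0.5))
}
```

Keeping the two thresholds as separate parameters lets you tune how readily a detection counts as a shopper independently of how confident the mood classification must be.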
From Figure 2, you can see that the time required to detect faces in the image was 136 ms, and the sentiment analysis took 13 ms. This fast processing time makes it possible to do the analysis in real time when an immediate response is required, such as notifying a salesperson to assist the shopper.
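To put those two numbers in perspective, here is a rough back-of-the-envelope calculation in Go. It assumes the two stages run one after the other and that the 13 ms sentiment time applies to each face; neither assumption is stated in the figure, and the function names are made up for this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// frameLatency estimates the time to process one frame, ASSUMING the face
// detector runs once and the mood network runs once per detected face.
func frameLatency(detect, perFace time.Duration, faces int) time.Duration {
	return detect + time.Duration(faces)*perFace
}

// framesPerSecond converts a per-frame latency into a frame rate.
// It returns 0 for a non-positive latency.
func framesPerSecond(latency time.Duration) float64 {
	if latency <= 0 {
		return 0
	}
	return float64(time.Second) / float64(latency)
}

func main() {
	// Timings taken from Figure 2: 136 ms detection, 13 ms sentiment.
	for faces := 1; faces <= 3; faces++ {
		l := frameLatency(136*time.Millisecond, 13*time.Millisecond, faces)
		fmt.Printf("%d face(s): %v per frame, about %.1f frames/s\n",
			faces, l, framesPerSecond(l))
	}
}
```

With one face in view, this works out to 149 ms per frame, or roughly 6.7 frames per second under those assumptions.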
The sample application can also be used for non-real-time statistics, optionally sending the resulting sentiment via Message Queue Telemetry Transport (MQTT) protocol to a data analytics system for accumulation and offline analysis.
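As a sketch of what such a message might contain, the Go snippet below tallies mood labels for a frame and serializes them to JSON using only the standard library. The MoodReport type and its field names are hypothetical rather than the sample application’s actual message format, and the publish step itself is left out because it depends on the MQTT client library you choose.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MoodReport is a HYPOTHETICAL payload layout for one analyzed frame.
type MoodReport struct {
	Shoppers int            `json:"shoppers"` // number of faces classified
	Moods    map[string]int `json:"moods"`    // count per mood label
}

// buildReport counts how often each mood label occurs.
func buildReport(labels []string) MoodReport {
	r := MoodReport{Moods: make(map[string]int)}
	for _, l := range labels {
		r.Shoppers++
		r.Moods[l]++
	}
	return r
}

func main() {
	payload, err := json.Marshal(buildReport([]string{"happy", "happy", "Unknown"}))
	if err != nil {
		panic(err)
	}
	// In a real deployment, this byte slice would be handed to an MQTT
	// client and published to a topic of your choosing.
	fmt.Println(string(payload))
}
```

Sending counts rather than raw images keeps the messages small and avoids moving shopper video off the device.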
With the Intel® distribution of OpenVINO™ and approximately 600 lines of Go, you can implement facial expression detection that would have required very specialized hardware and software a decade ago. The complex work is buried within the deep-learning models that have been pre-trained for facial and mood detection. The glue code simply loads the models and presents the captured frames to them for processing and classification. When paired with capable hardware, such as a system based on the 6th generation Intel® Core™ processor or Intel’s Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X VPU, the application can reach inference speeds that enable real-time analytics.
Real-time detection of facial expressions has a wide range of applicable use cases. Many are commercial, such as understanding shopper sentiment, but you can also apply this solution to help people with certain types of facial recognition disorders. It is estimated that two percent of the general population suffers from developmental prosopagnosia. Developmental prosopagnosia refers to an impairment that affects recognition of people’s faces or recognition of facial expressions (expressive agnosia). This application could identify faces and facial expressions for individuals with developmental prosopagnosia.
In addition, consider applying this technology to augmented reality. As more embedded devices begin to support deep learning, the possible augmented reality use cases increase. For example, glasses could integrate a video camera and real-time facial detection in order to present a virtual overlay on a captured image that describes the inferred facial expression of someone who passes by the person wearing the glasses.
It’s easy to think of other applications. Using the sample code provided, you’ll just need to make use of the output classification for your application.
M. Tim Jones is a veteran embedded firmware architect with over 30 years of architecture and development experience. Tim is the author of several books and many articles across the spectrum of software and firmware development. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and protocol development.
Copyright ©2021 Mouser Electronics, Inc.