Mouse Gesture and Shape Recognition with .NET

As the world fills up with devices that accept pen, touch, and mouse input, it becomes necessary to explore different ways of deriving useful meaning from more than just taps, clicks, and pinches. One useful technique is gesture or shape recognition: letting the user draw a continuous shape and having the computer recognize what has been drawn by matching it against a pre-defined library of shapes. You can imagine that once the computer identifies a shape, it can be given a task or action to perform upon recognition.

There are many different approaches to shape recognition; examples include neural networks, point comparison, and even Levenshtein distance. Some methods are better than others in complexity, accuracy, or performance. What I would like to show you is a different approach that is fast, accurate, and clocks in at under 100 lines of code.

I’ve written a Windows 10 Universal app to demonstrate the concept, and it’ll also give you a chance to follow along as I walk you through the code and then the algorithm.

Universal Recognizer

[Screenshot: the Windows 10 Universal Recognizer app]

To debug and run the app, you’ll need to install the Windows 10 Universal App developer tools and Visual Studio 2015. Once you’ve done that and can successfully build and run the app, you’ll need to train the software with some example shapes and gestures. You can do this by clicking the “Learn Gesture” button, which puts the app in training mode. Once in training mode, draw any shape you want in one continuous stroke and give it a corresponding name. Go ahead and add a handful of gestures so the computer has a decent assortment to choose from. When you’re done, click the button again to exit training mode.

Now that you’ve trained the software, try drawing one of the saved gestures the same way you did in training mode. Did the computer recognize your gesture correctly? In my testing, anything above an 80% probability is a valid match. Try out your other saved gestures and notice how accurately the algorithm recognizes them. Pretty cool, eh?

Let’s Take a Closer Look at the Algorithm

Much of the code consists of helper functions that assist in the recognition process, but there is one key method I want to discuss that actually performs the analysis. The UniversalRecognizer.PointPatterns library I wrote consists of the following five classes (a minimal sketch of the data classes follows the list).

  • Point.cs
    • Contains a single X and Y point of a gesture.
  • PointPattern.cs
    • Represents an entire gesture, including its name and an array of points.
  • PointPatternAnalyzer.cs
    • Performs the actual comparison of a gesture to a list of trained gestures.
  • PointPatternMatchResult.cs
    • Represents the result of a single gesture comparison, including name, match probability, and number of gestures compared (since you can have multiple samples of the same gesture).
  • PointPatternMath.cs
    • Contains helper functions for doing point interpolation, angle calculation, dot products, and distance calculation (for interpolation purposes, not comparison).
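
For reference, here is a minimal sketch of what the three data classes might look like. These are simplified stand-ins I’ve written for illustration only; see the attached project for the full versions.

public class Point
{
    public double X { get; set; }
    public double Y { get; set; }

    public Point(double x, double y) { X = x; Y = y; }
}

public class PointPattern
{
    // The gesture's name and the raw points captured when it was drawn.
    public string Name { get; set; }
    public Point[] Points { get; set; }

    public PointPattern(string name, Point[] points) { Name = name; Points = points; }
}

public class PointPatternMatchResult
{
    // The name of the learned gesture, the match probability (0-100),
    // and how many samples of that gesture were compared.
    public string Name { get; set; }
    public double Probability { get; set; }
    public int ComparisonCount { get; set; }

    public PointPatternMatchResult(string name, double probability, int comparisonCount)
    {
        Name = name;
        Probability = probability;
        ComparisonCount = comparisonCount;
    }
}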

So let’s talk about the PointPatternAnalyzer.GetPointPatternMatchResult(PointPattern compareTo, Point[] points) method. This method performs the comparison of an array of points (the incoming gesture) against a single trained gesture.

/// <summary>
/// Compares the points of a single gesture to the points of a single saved gesture and returns an accuracy probability.
/// </summary>
/// <param name="compareTo">Learned PointPattern from PointPatternSet to compare gesture points to.</param>
/// <param name="points">Points of the current gesture being analyzed.</param>
/// <returns>Returns the accuracy probability of the learned PointPattern to the current gesture.</returns>
public PointPatternMatchResult GetPointPatternMatchResult(PointPattern compareTo, Point[] points)
{
    // Ensure we have at least 2 points; recognition will fail since we can't interpolate from a single point.
    if (points.Length < 2)
        throw new ArgumentOutOfRangeException(nameof(points));

    // We'll use an array of doubles that matches the number of interpolation points to hold
    // the dot products of each angle comparison.
    var dotProducts = new double[Precision];

    // We'll need to interpolate the incoming points array and the points of the learned gesture.
    // We do this for each comparison so that we can change the precision at any time and not lose
    // our original learned gesture to multiple interpolations.
    var interpolatedCompareTo = PointPatternMath.GetInterpolatedPointArray(compareTo.Points, Precision);
    var interpolatedPointArray = PointPatternMath.GetInterpolatedPointArray(points, Precision);

    // Next we'll get an array of angles for each interpolated point in the learned and current gesture.
    // We'll get the same number of angles corresponding to the total number of interpolated points.
    var anglesCompareTo = PointPatternMath.GetPointArrayAngles(interpolatedCompareTo);
    var angles = PointPatternMath.GetPointArrayAngles(interpolatedPointArray);

    // Now that we have angles for each gesture, we'll get the dot product of every angle equal to 
    // the total number of interpolation points.
    for (var i = 0; i < anglesCompareTo.Length; i++)
        dotProducts[i] = PointPatternMath.GetDotProduct(anglesCompareTo[i], angles[i]);

    // Convert average dot product to probability since we're using the deviation
    // of the average of the dot products of every interpolated point in a gesture.
    var probability = PointPatternMath.GetProbabilityFromDotProduct(dotProducts.Average());
            
    // Return PointPatternMatchResult object that holds the results of comparison.
    return new PointPatternMatchResult(compareTo.Name, probability, 1);
}

Let’s start by walking through this method line by line and explaining what’s going on. The first two lines are just to verify that we have a gesture with at least 2 points, as we can’t interpolate a line with only a single point.

// Ensure we have at least 2 points; recognition will fail since we can't interpolate from a single point.
if (points.Length < 2)
    throw new ArgumentOutOfRangeException(nameof(points));

// We'll use an array of doubles that matches the number of interpolation points to hold
// the dot products of each angle comparison.
var dotProducts = new double[Precision];

The next line creates an array sized to the total number of interpolation points, or the precision as it’s called, since the more points we interpolate for a gesture, the more accurate a comparison should be (at least theoretically). However, the more points you compare for each gesture, the slower the overall comparison will be. Let’s talk about interpolation for a second. When I say interpolate a gesture, what I mean is to increase or reduce the number of points in a gesture to some fixed value. So for example, a simple gesture like a straight line could be as few as two points, but a more complex gesture like an “M” would take a minimum of 5 points. Realistically, most gestures you’ll be dealing with have many more points in them, since they are drawn with, say, a mouse or your finger.

So here is what is actually happening during the interpolation process.

[Diagram: a gesture’s points before and after interpolation]

As you can see, before interpolation the gesture contains an array of unevenly spaced points, and it may contain more or fewer points than the desired precision. After interpolation, the gesture’s shape is preserved, but the points are now evenly spaced and match the desired count for comparison. What we are essentially doing is leveling the playing field when it comes to comparing two gestures. We can see the interpolation taking place in the following two lines.

// We'll need to interpolate the incoming points array and the points of the learned gesture.
// We do this for each comparison so that we can change the precision at any time and not lose
// our original learned gesture to multiple interpolations.
var interpolatedCompareTo = PointPatternMath.GetInterpolatedPointArray(compareTo.Points, Precision);
var interpolatedPointArray = PointPatternMath.GetInterpolatedPointArray(points, Precision);
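
GetInterpolatedPointArray does the heavy lifting here. As a rough illustration of the idea (my own simplified sketch, not necessarily the library’s exact code), you could resample the stroke to evenly spaced points like this, where GetDistance is a small hypothetical helper:

public static Point[] GetInterpolatedPointArray(Point[] points, int precision)
{
    // Assumes points.Length >= 2 (guarded by the caller) and precision >= 2.
    // Cumulative distance along the stroke from the first point to each point.
    var cumulative = new double[points.Length];
    for (var i = 1; i < points.Length; i++)
        cumulative[i] = cumulative[i - 1] + GetDistance(points[i - 1], points[i]);

    var totalLength = cumulative[points.Length - 1];
    var result = new Point[precision];
    var segment = 1; // index of the current segment's end point

    for (var i = 0; i < precision; i++)
    {
        // How far along the stroke this evenly spaced output point should sit.
        var target = totalLength * i / (precision - 1);

        // Advance to the original segment containing the target distance.
        while (segment < points.Length - 1 && cumulative[segment] < target)
            segment++;

        // Linearly interpolate between the segment's two end points.
        var segmentLength = cumulative[segment] - cumulative[segment - 1];
        var t = segmentLength > 0 ? (target - cumulative[segment - 1]) / segmentLength : 0;
        var a = points[segment - 1];
        var b = points[segment];
        result[i] = new Point(a.X + (b.X - a.X) * t, a.Y + (b.Y - a.Y) * t);
    }

    return result;
}

private static double GetDistance(Point a, Point b) =>
    Math.Sqrt((b.X - a.X) * (b.X - a.X) + (b.Y - a.Y) * (b.Y - a.Y));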

After interpolation is complete, we need to take our arrays of points (which are essentially arrays of line segments) and convert each into an array of angles. We do this since what we are really comparing is the angular difference, via the dot product, of the angles at each point between the two gestures. Take a look at the following code, which gives us an array of angles for each gesture.

// Next we'll get an array of angles for each interpolated point in the learned and current gesture.
// We'll get the same number of angles corresponding to the total number of interpolated points.
var anglesCompareTo = PointPatternMath.GetPointArrayAngles(interpolatedCompareTo);
var angles = PointPatternMath.GetPointArrayAngles(interpolatedPointArray);
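
A rough sketch of how GetPointArrayAngles might be implemented is shown below, using Math.Atan2 on consecutive points. Note that n points only define n - 1 segments, so this sketch pads the first slot to keep the array length equal to Precision; the real library may handle that edge differently.

public static double[] GetPointArrayAngles(Point[] points)
{
    // One angle per point: the direction of travel from the previous
    // point, measured with Math.Atan2 (in radians, -pi to pi).
    var angles = new double[points.Length];
    for (var i = 1; i < points.Length; i++)
        angles[i] = Math.Atan2(points[i].Y - points[i - 1].Y,
                               points[i].X - points[i - 1].X);

    // n points only define n - 1 segments, so duplicate the first
    // segment's angle into slot 0 (an assumption made for this sketch).
    angles[0] = angles.Length > 1 ? angles[1] : 0;
    return angles;
}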

Now that we have the angles for each gesture, we can iterate over the angles pairwise and calculate the dot product, or angular difference, between each pair. We’ll place the dot products in the array we created earlier.

// Now that we have angles for each gesture, we'll get the dot product of every angle equal to 
// the total number of interpolation points.
for (var i = 0; i <= anglesCompareTo.Length - 1; i++)
    dotProducts[i] = PointPatternMath.GetDotProduct(anglesCompareTo[i], angles[i]);
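
The “dot product of two angles” here means treating each angle as a unit vector; the dot product of two unit vectors reduces to the cosine of the angle between them. A one-line sketch of what GetDotProduct plausibly does (mine, not necessarily the library’s exact code):

// Treat each angle as a unit vector and take their dot product, which
// simplifies to cos(a - b): 1.0 for identical directions, 0.0 for
// perpendicular, -1.0 for opposite.
public static double GetDotProduct(double angleA, double angleB) =>
    Math.Cos(angleA - angleB);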

With our array of dot products in hand for each point in the gesture, it’s simply a matter of calculating the average dot product (which is to say the average angular difference between the two gestures) and converting that average into a probability from 0 to 100%. This is really all that’s needed to accurately recognize the difference between two gestures. Once we have our probability, we’ll wrap it all up in a PointPatternMatchResult that contains the name of the gesture, the probability, and the total comparison count (which is always 1 for a single gesture).

// Convert average dot product to probability since we're using the deviation
// of the average of the dot products of every interpolated point in a gesture.
var probability = PointPatternMath.GetProbabilityFromDotProduct(dotProducts.Average());

// Return PointPatternMatchResult object that holds the results of comparison.
return new PointPatternMatchResult(compareTo.Name, probability, 1);
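
Since each dot product falls between -1 and 1, converting the average to a percentage can be as simple as a linear mapping onto 0 to 100. This sketch assumes that’s roughly what GetProbabilityFromDotProduct does; the attached project has the real implementation.

// Map the average dot product from [-1, 1] onto a 0-100% scale:
// -1 (opposite) -> 0%, 0 (perpendicular) -> 50%, 1 (identical) -> 100%.
public static double GetProbabilityFromDotProduct(double dotProduct) =>
    (dotProduct + 1.0) * 50.0;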

What I have just walked you through is the comparison of an array of points (the current gesture) against a single gesture sample from a list of stored gestures. We need to compute a probability for every gesture in the sample set, and once we’ve done that, we can rank the results by probability to find the best match. You can review the rest of that code on your own by downloading the attached project.
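
To give you a feel for that remaining piece, ranking every trained gesture could look something like the following sketch (mine, not the library’s exact code); the project’s PointPatternAnalyzer does the equivalent work.

// Requires: using System.Collections.Generic; using System.Linq;
public PointPatternMatchResult[] GetPointPatternMatchResults(
    IEnumerable<PointPattern> trainedGestures, Point[] points)
{
    // Compare the incoming points against every trained gesture, then
    // order the results so the most probable match comes first.
    return trainedGestures
        .Select(gesture => GetPointPatternMatchResult(gesture, points))
        .OrderByDescending(result => result.Probability)
        .ToArray();
}

The first element is then the best candidate, and the 80% threshold mentioned earlier can be applied to reject drawings that don’t really match anything.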

I’ve kept the demo app as simple as possible and have also isolated all the gesture recognition code into its own reusable library. You’re free to use my code in any of your projects, but I would appreciate a link back to my site. I believe this method of gesture and shape recognition is extremely fast compared to the other methods available on the Internet, but please feel free to do your own benchmarks and comparisons.

I would love to hear your thoughts or ideas in the comment section below. Happy Coding!

Download the Universal Recognizer Source Code
