Handinator - A Hand Gesture Control Application

November 12th, 2020


STREAMING MADE EASY!

The Handinator is a gesture control application created to improve your web-streaming experience. A wave of the hand is all it takes for you to control your video when equipped with the Handinator!

Pause, play, and volume control are all made easy through image recognition and image processing, as the hand gestures are detected from your webcam. This renders the age-old practice of reaching for buttons on your computer obsolete and ushers in a new age of gesture recognition.

The Inception:

Like all ideas, this too stemmed from a necessity. While browsing through Netflix one day, I realized that reaching for the pause button every time there is an interruption, especially when I am at a comfortable distance from my system, is a very frustrating process.

I wondered, given the advancement in image recognition, image processing, and machine learning, whether I could incorporate the same technology into my very own human-computer interface.

I perused the Internet for any such applications but couldn’t find any that catered to my specific needs. But this search introduced me to a few brilliant gesture control applications and projects in other domains, which served as a guiding light throughout this endeavor.

I first intended to make this application an extension for Google Chrome but realized that Chrome extensions don’t allow inline JavaScript. This prompted me to move to Python, a language with extensive and powerful inbuilt libraries, and fortunately the one I am most comfortable with too.

How does Handinator work?

While I will be delving into the technicalities of the application further down this article, a broad picture of the program's general flow explains the working outline fairly well.

I have used finger counts (making numbers with your fingers) as the controlling gestures for this application.

Ex: Make a “1” to pause the video and a “2” to mute or unmute.

  • Use the live images from the webcam as the input to the application.
  • Using the frames received and the various Python libraries, dedicate a section of each image to recognizing the hand and, in turn, the gesture.
  • This enables faster and more accurate recognition, as the region of interest is effectively separated from the background.
  • Detect the extremities of the hand to recognize the fingers that have been held up.
  • Make the correct prediction for the recognized hand gesture and carry out the corresponding keyboard event (Spacebar for pause and M for mute) that the user asked for. A minimal sketch of this flow is given below.
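To make this outline concrete, here is a minimal sketch of the loop, assuming OpenCV and the keyboard library. The ROI bounds and detect_gesture() are hypothetical placeholders standing in for the recognition steps described later in this article, not the actual implementation:

    import cv2
    import keyboard

    def detect_gesture(roi):
        # Hypothetical placeholder: the real recognition steps are
        # described method by method further down this article.
        return 0

    cap = cv2.VideoCapture(0)                  # live images from the webcam
    while True:
        grabbed, frame = cap.read()
        if not grabbed:
            break
        roi = frame[10:225, 350:590]           # dedicated gesture section (example bounds)
        fingers = detect_gesture(roi)
        if fingers == 1:
            keyboard.press_and_release('space')    # pause/play
        elif fingers == 2:
            keyboard.press_and_release('m')        # mute/unmute
        cv2.imshow("Handinator", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):      # press 'q' to quit
            break
    cap.release()
    cv2.destroyAllWindows()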

 

Tip: Anything you do outside the region of the webcam feed dedicated by the application for gesturing doesn’t hinder recognition or cause unwanted pauses or mutes.


Doesn’t the use of webcams raise a security concern?

The Handinator provides a perfectly secure computer interaction experience. Since the application runs locally on your device and not over the Internet, the threat of being monitored through your webcam is absent, making it a completely safe experience.

What are the libraries used for implementing this?

Python has become one of the most popular programming languages for machine learning for various reasons, such as ease of coding and readability, but the most prominent one is its vast collection of powerful libraries, which make a programmer's life much easier.

The ones I used for the Handinator are:

The OpenCV (cv2) library:

OpenCV is a library of programming functions mainly aimed at real-time computer vision, i.e., image recognition, image processing, and object recognition.

The library has more than 2500 optimized algorithms, which include a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, and stitch images together to produce a high-resolution image of an entire scene.

Fact: Just as the name suggests, OpenCV is an open-source library with millions of developers contributing tutorials, algorithms, and documentation for its betterment every day, and it is available to all free of cost.

For further reading and tutorials: OpenCV: OpenCV Tutorials

The keyboard library:

This library is used to gain full control of the keyboard functionalities.

It is a small library used for simulating keypresses, registering hotkeys, and much more. It even captures events from the on-screen keyboard.

This module is used to listen for and send keyboard events.
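A few representative calls, to give a feel for the API (the hotkey callback here is just an illustration):

    import keyboard

    keyboard.press_and_release('space')   # simulate the pause/play keypress
    keyboard.press_and_release('m')       # simulate the mute keypress
    keyboard.add_hotkey('q', lambda: print('quit requested'))  # register a hotkey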

For further reading: Keyboard

The time library:

This module provides various time-related functions. The sleep() function is used here which suspends the execution of the calling thread for the given number of seconds. The argument may be a floating-point number to indicate a more precise sleep time. The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following the execution of that signal’s catching routine.
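For example:

    import time

    time.sleep(2)     # suspend the calling thread for two seconds
    time.sleep(0.5)   # floating-point arguments give sub-second precision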

This library provides many other powerful functions to deal with real-time scenarios and can be found here:

time — Time access and conversions — Python 3.9.0 documentation

 

The imutils library:

This library hosts a series of convenience functions that make basic image processing operations such as translation, rotation, resizing to fit within a threshold, skeletonization, displaying Matplotlib images, sorting contours, and detecting edges much easier with OpenCV.
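For instance, resizing while preserving the aspect ratio takes a single call (frame.jpg here is just a stand-in for any test image):

    import cv2
    import imutils

    frame = cv2.imread('frame.jpg')            # load any test image
    small = imutils.resize(frame, width=700)   # aspect ratio is preserved
    rotated = imutils.rotate(frame, angle=90)  # rotate about the image centre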

The NumPy library:

NumPy stands for Numerical Python. It is a powerful library for handling multidimensional arrays and matrices.

It provides:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities

 

In Python, we have lists that serve the purpose of arrays, but they are slow to process.

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists. These arrays are faster because they are stored in contiguous memory locations, unlike lists.
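A quick illustration of the array object and broadcasting:

    import numpy as np

    a = np.array([[1, 2, 3],
                  [4, 5, 6]])   # a 2-D ndarray
    print(a.shape)              # (2, 3)
    print(a * 2)                # broadcasting: every element doubled
    print(a.mean(axis=0))       # column means: [2.5 3.5 4.5]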

Further reading: NumPy Documentation

The scikit-learn metrics library:

Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms.

Metrics: This library is used to evaluate the performance of various machine learning models and provide a tangible score of the accuracy of the model used.

Pairwise distances: Compute the distance matrix from a vector array X and optional Y. This method takes either a vector array or a distance matrix and returns a distance matrix.

If the input is a vector array, the distances are computed. If the input is a distance matrix, it is returned instead.

This method provides a safe way to take a distance matrix as input while preserving compatibility with many other algorithms that take a vector array.

For the Handinator, I have used the Euclidean distance as the norm.
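A small example of the call, with made-up points standing in for the hand extremities and palm centre used later:

    import numpy as np
    from sklearn.metrics.pairwise import euclidean_distances

    points = np.array([[0, 0], [3, 4]])   # e.g. hand extremities
    center = np.array([[0, 4]])           # e.g. the palm centre
    print(euclidean_distances(points, center))   # [[4.], [3.]]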


Further reading: sklearn.metrics.pairwise.euclidean_distances — scikit-learn 0.23.2 documentation

THE SOURCE CODE:

[Source code screenshots]

To make the code more digestible, I have provided a method-by-method description of it below:

The ‘main’ function:

  • Here, the OpenCV method cv2.VideoCapture(0) is used to obtain a real-time video stream from your webcam.
  • Its argument can be either a device index or the name of a video file. The device index is just a number specifying which camera to use. Normally one camera is connected (as in my case), so I simply pass 0. You can select a second camera by passing 1, and so on. After that, you can capture the stream frame by frame.
  • The region of interest for gesture detection is specified by initializing the boundary values.
  • The read() function grabs, decodes, and returns the next frame from the VideoCapture object.
  • This frame is resized by calling the imutils.resize() method, which maintains the aspect ratio of the image.
  • A copy of the obtained frame is made to preserve the original image, while the region of interest is segregated by limiting the frame to the initialized boundary values.
  • A warning message is also raised to prevent the user from gesturing while the software is calibrating, i.e., while background subtraction is taking place. A sketch of this loop is given below.
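Here is a hedged sketch of that loop; the exact boundary values, resize width, and calibration length are assumptions, since the original code was shared as screenshots:

    import cv2
    import imutils

    top, right, bottom, left = 10, 350, 225, 590   # example ROI boundary values

    camera = cv2.VideoCapture(0)    # 0 selects the default webcam
    num_frames = 0
    while True:
        grabbed, frame = camera.read()              # fetch the next frame
        if not grabbed:
            break
        frame = imutils.resize(frame, width=700)    # resize, keeping aspect ratio
        frame = cv2.flip(frame, 1)                  # mirror for natural interaction
        clone = frame.copy()                        # preserve the original image
        roi = frame[top:bottom, right:left]         # segregate the region of interest
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (7, 7), 0)
        if num_frames < 30:                         # calibration phase
            # run_avg(gray) would accumulate the background model here
            cv2.putText(clone, "Calibrating... keep still", (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        num_frames += 1
        cv2.imshow("Handinator", clone)
        if cv2.waitKey(1) & 0xFF == ord('q'):       # press 'q' to quit
            break
    camera.release()
    cv2.destroyAllWindows()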

 

The run_avg function:

This function is defined for the purpose of background subtraction. Background subtraction is a technique for separating foreground elements from the background and is done by generating a foreground mask. This technique is used for detecting dynamically moving objects from static cameras.

  • Here, the cv2.accumulateWeighted() method is used. This computes a running average for the separation.
  • Over a sequence of frames, the running average of the current frame and the previous frames is computed. This gives us the background model, and any new object introduced during the video sequence becomes part of the foreground.
  • We keep feeding each frame to this function, and it keeps updating the average of all frames. We later compute the absolute difference between this background model and each new frame. A sketch of run_avg() is given below.
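A sketch of run_avg() under these assumptions (the accumulation weight of 0.5 is a guess, not necessarily the original value):

    import cv2

    bg = None   # the running-average background model (a float image)

    def run_avg(gray_roi, accum_weight=0.5):
        global bg
        if bg is None:
            bg = gray_roi.copy().astype("float")   # the first frame seeds the model
            return
        # blend the current frame into the background model
        cv2.accumulateWeighted(gray_roi, bg, accum_weight)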

 

The segment function:

This function is used for thresholding. Here pixel values are assigned in accordance with the threshold provided.

  • Here, each pixel value is compared to the threshold and assigned black (0) or white (255). This is done with cv2.THRESH_BINARY.
  • If a pixel value is greater than the threshold value, it is assigned one value (say, white); otherwise, it is assigned the other value (say, black).
  • The absolute difference between the background model and the current frame is first computed using cv2.absdiff(). This makes segregation easier.
  • The thresholding itself is done by the cv2.threshold() function.
  • Its first argument is the source image, which should be a grayscale image. The second argument is the threshold value used to classify the pixel values. The third argument is maxVal, which represents the value assigned if the pixel value is more than (or sometimes less than) the threshold value.
  • Now, contours are detected on the binary image using cv2.findContours(). RETR_TREE is used to retrieve all of the contours and reconstruct a full hierarchy of nested contours.
  • This contoured image is now known as the segmented image.

Now, this segmented image is returned to the main function, where contours or bounding boxes are drawn around it using cv2.drawContours(). A sketch of segment() is given below.
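This sketch builds on the run_avg() sketch above; the threshold of 25 is an assumption, and the cv2.findContours() return signature shown is OpenCV 4's:

    import cv2

    def segment(gray_roi, threshold=25):
        # absolute difference between the background model and the current frame
        diff = cv2.absdiff(bg.astype("uint8"), gray_roi)
        # binary threshold: pixels above `threshold` become white (255)
        _, thresholded = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
        # retrieve the contours of the thresholded image
        contours, _ = cv2.findContours(thresholded.copy(),
                                       cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None
        # the largest contour is taken to be the hand (the segmented image)
        segmented = max(contours, key=cv2.contourArea)
        return thresholded, segmented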

The count function:

To put it in simple terms, this function is used to count the fingers held up by the user.

  • A convex envelope of the segmented image is obtained using cv2.convexHull().
  • The convex hull of a shape or a group of points is a tight-fitting convex boundary around the points or the shape.
  • This convex hull is useful in shape analysis, which is needed here for finger recognition.
  • The extremities of the hull are obtained (left, right, top, and bottom), and these are used to define the midpoints of the X and Y extents of the hand.
  • The Euclidean distance is then calculated between the extremes and the midpoints, and the largest such distance is stored (it corresponds to the farthest raised finger).
  • A circular contour is now drawn to further refine the region of interest for detection, and using the circumference and radius, the corresponding finger count is returned to the main function.

The count value is checked in the main function and the corresponding keyboard action is carried out.
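A sketch of count() along these lines; the 0.8 radius factor and the 25% cut-offs are assumptions borrowed from the standard finger-counting approach, not necessarily the original values:

    import cv2
    import numpy as np
    from sklearn.metrics.pairwise import euclidean_distances

    def count(thresholded, segmented):
        hull = cv2.convexHull(segmented)   # tight convex boundary around the hand

        # extremities of the hull: topmost, bottommost, leftmost, rightmost points
        extreme_top    = tuple(hull[hull[:, :, 1].argmin()][0])
        extreme_bottom = tuple(hull[hull[:, :, 1].argmax()][0])
        extreme_left   = tuple(hull[hull[:, :, 0].argmin()][0])
        extreme_right  = tuple(hull[hull[:, :, 0].argmax()][0])

        # midpoints of the X and Y extents approximate the palm centre
        cX = int((extreme_left[0] + extreme_right[0]) / 2)
        cY = int((extreme_top[1] + extreme_bottom[1]) / 2)

        # Euclidean distances from the centre to the extremes; the largest
        # corresponds to the farthest raised finger
        distances = euclidean_distances(
            [(cX, cY)],
            [extreme_left, extreme_right, extreme_top, extreme_bottom])[0]
        radius = int(0.8 * distances.max())          # assumed 0.8 factor
        circumference = 2 * np.pi * radius

        # circular contour around the palm to isolate the fingers
        circular_roi = np.zeros(thresholded.shape[:2], dtype="uint8")
        cv2.circle(circular_roi, (cX, cY), radius, 255, 1)
        circular_roi = cv2.bitwise_and(thresholded, thresholded,
                                       mask=circular_roi)

        contours, _ = cv2.findContours(circular_roi.copy(),
                                       cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        fingers = 0
        for c in contours:
            _, y, _, h = cv2.boundingRect(c)
            # count contours that lie above the wrist region and do not span
            # too large a slice of the circle (both cut-offs are assumptions)
            if (cY + 0.25 * cY) > (y + h) and (0.25 * circumference) > c.shape[0]:
                fingers += 1
        return fingers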

Tip: An option of pressing “q” to close the application is provided, to further ease usage without having to open the Task Manager every time.

Scope for improvement:

The program used here has vast scope for further improvement, both in the hand-tracking and in the computer vision code.

Here, the accuracy of detection and action relies heavily on having adequate background lighting, and the application sometimes ends up recognizing the wrong gesture.

While the keyboard library works satisfactorily, the PyAutoGUI library can be used instead.

The PyAutoGUI library provides cross-platform support for managing mouse and keyboard operations through code to enable automation of such tasks.
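A few equivalent calls with PyAutoGUI:

    import pyautogui

    pyautogui.press('space')          # single key press (pause/play)
    pyautogui.press('m')              # mute/unmute
    pyautogui.hotkey('ctrl', 'up')    # key combinations are also supported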

The application can also be subjected to more stringent real-time conditions to push its limits and gain insights into areas for improvement.

Conclusion and future scope:

The Handinator is a very practical hand gesture-based application, and its ease of use makes it accessible to all. The same idea can be further implemented to work on mobile devices and smart televisions, with significant contributions from developers working in the concerned domains.

Domains like gesture control, vision-based control devices, gesture interfaces, and human-computer interaction have always been at the forefront of innovation. One such example is Leap Motion, which was a breakthrough in air gesture-based control devices and real-time hand movement detection.

This idea can be further integrated into hardware projects using the necessary motion sensors, making the application portable and compatible with devices across various platforms.

ABOUT THE AUTHOR


Syed Anab Akhtar is a final year computer science undergraduate student and a passionate developer and innovator with the motto: A project a week, keeps productivity at peak. He has been a part of Corporate Gurukul for over a year now and is a former Machine Learning intern at the National University of Singapore, Computer Vision Intern at the State Bank of India, and a certified cybersecurity analyst too.

Over the past few years, Anab has worked on numerous open-source projects in machine learning, data science, cybersecurity, etc., that have grabbed the attention of not just students but also developers from companies like Microsoft and Google, and even potential investors. He is now preparing for his master's in computer science and is all set to research further breakthrough technologies in the field.
