In this contribution we present a novel approach to transforming data from time-of-flight (ToF) sensors so that it can be interpreted by Convolutional Neural Networks (CNNs). As ToF data tends to be noisy depending on factors such as illumination, reflection coefficient and distance, the need for a robust algorithmic approach becomes evident. By spanning a three-dimensional grid of fixed size around each point cloud, we transform the three-dimensional input into a representation processable by CNNs. This simple and effective neighborhood-preserving methodology demonstrates that CNNs are indeed able to extract the relevant information and learn a set of filters, enabling them to differentiate a complex set of ten different gestures obtained from 20 different individuals and comprising 600,000 samples overall. Our 20-fold cross-validation shows the generalization performance of the network, achieving an accuracy of up to 98.5% on validation sets of 20,000 data samples each. The real-time applicability of our system is demonstrated via an interactive validation on an infotainment system running at up to 40 fps on an iPad in the vehicle interior.
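The grid-based transformation can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the cubic grid resolution, the occupancy-count encoding, and the function name `rasterize_point_cloud` are all assumptions made for the sake of the example.

```python
import numpy as np

def rasterize_point_cloud(points, grid_size=16):
    """Map an (N, 3) point cloud onto a fixed-size occupancy grid.

    Sketch of the grid spanned around each point cloud; grid_size and
    the count-based encoding are illustrative assumptions.
    """
    grid = np.zeros((grid_size,) * 3, dtype=np.float32)
    if len(points) == 0:
        return grid
    # Span the grid around the cloud: shift to the origin and scale so
    # the longest bounding-box edge exactly fits the grid.
    mins = points.min(axis=0)
    extent = (points - mins).max()
    if extent == 0:
        extent = 1.0
    idx = ((points - mins) / extent * (grid_size - 1)).astype(int)
    # Accumulate point counts per cell; neighboring points land in
    # neighboring cells, preserving local structure for the CNN.
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return grid
```

Because the grid has a fixed shape regardless of how many points the sensor returns, the output can be fed directly to a (3D) convolutional network, and spatial neighborhoods in the cloud remain neighborhoods in the input tensor.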
Touch versus mid-air gesture interfaces in road scenarios - measuring driver performance degradation
(2016)
We present a study comparing the degradation of driver performance during touch gesture versus mid-air gesture use for infotainment system control. To this end, 17 participants were asked to perform the Lane Change Test, which requires each participant to steer a vehicle in a simulated driving environment while interacting with an infotainment system via touch and mid-air gestures. The decrease in performance is measured as the deviation from an optimal baseline. The study finds comparable deviations from the baseline for the secondary task of infotainment interaction under both interaction variants. This is significant because all participants were experienced with touch interaction but had no experience at all with mid-air gesture interaction, which favors mid-air gestures in the long-term scenario.
Given the success of convolutional neural networks (CNNs) during recent years in numerous object recognition tasks, it seems logical to further extend their applicability to three-dimensional data such as point clouds provided by depth sensors. To this end, we present an approach exploiting the CNN's ability to generate features automatically and combine it with a novel 3D feature computation technique that preserves the local information contained in the data. Experiments are conducted on a large data set of 600,000 samples of hand postures obtained via ToF (time-of-flight) sensors from 20 different persons, following an extensive parameter search to optimize the network structure. Generalization performance, measured by a leave-one-person-out scheme, exceeds that of any other method presented for this specific task, bringing the error for some persons down to 1.5%.
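The leave-one-person-out scheme can be expressed compactly with scikit-learn's `LeaveOneGroupOut` splitter. The sketch below assumes sklearn-style estimators; `build_model` is a hypothetical factory returning a fresh classifier per fold and is not part of the paper.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

def leave_one_person_out(X, y, person_ids, build_model):
    """Estimate generalization to unseen persons.

    Each fold trains on the samples of 19 persons and validates on the
    held-out person, so the score reflects person-independent accuracy.
    """
    errors = {}
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, person_ids):
        model = build_model()  # hypothetical factory, fresh model per fold
        model.fit(X[train_idx], y[train_idx])
        acc = model.score(X[test_idx], y[test_idx])
        errors[person_ids[test_idx][0]] = 1.0 - acc
    return errors  # per-person error rates, e.g. as low as 0.015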
This contribution presents a novel approach to utilizing Time-of-Flight (ToF) technology for mid-air hand gesture recognition on mobile devices. ToF sensors are capable of providing depth data at high frame rates independently of illumination, making applications possible both indoors and outdoors. This comes at the cost of precision in the depth measurements and a comparatively low lateral resolution. We present a novel feature generation technique based on a rasterization of the point clouds which realizes fixed-size input, making Deep Learning approaches based on Convolutional Neural Networks applicable. In order to increase precision we introduce several methods to reduce noise and normalize the input to overcome difficulties in scaling. Backed by a large-scale database of about half a million data samples taken from different individuals, our contribution shows how hand gesture recognition is realizable on commodity tablets in real time at frame rates of up to 17 Hz. A leave-one-out cross-validation experiment demonstrates the feasibility of our approach, with classification errors as low as 1.5% achieved for persons unknown to the model.
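A plausible form of the noise-reduction and scale-normalization step is sketched below. The radius-based outlier filter, its parameters (`min_neighbors`, `radius`), and the unit-cube scaling are assumptions chosen to illustrate the idea, not the paper's exact pipeline.

```python
import numpy as np

def normalize_hand_cloud(points, min_neighbors=4, radius=0.01):
    """Denoise and scale-normalize a ToF point cloud (assumed scheme).

    Isolated points with few neighbors within `radius` (meters) are
    treated as sensor noise; the remainder is centered and scaled to a
    unit cube so hand distance no longer affects the rasterized input.
    """
    # Radius-based outlier removal: count neighbors per point. The
    # O(N^2) distance matrix is fine for the small clouds a mobile
    # ToF sensor produces per frame.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    keep = (d < radius).sum(axis=1) - 1 >= min_neighbors  # -1: self
    pts = points[keep]
    if len(pts) == 0:
        return pts
    # Scale normalization: center on the centroid and divide by the
    # largest extent, mapping the hand into [-1, 1]^3.
    pts = pts - pts.mean(axis=0)
    extent = np.abs(pts).max()
    return pts / extent if extent > 0 else pts
```

Normalizing before rasterization means the fixed-size grid always covers the hand at the same relative scale, which is what lets a single set of learned filters cope with varying hand distances.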