000 General, Science
We present a lightweight, real-time-capable 3D gesture recognition system for mobile devices, aimed at improved human-machine interaction. We use time-of-flight data from a single sensor and implement the whole gesture recognition pipeline on two different devices, outlining the potential of integrating these sensors into mobile devices. The main components are responsible for cropping the data to the essentials, computing meaningful features, training and classifying via neural networks, and realizing a GUI on the device. With our system we achieve recognition rates of up to 98% on a 10-gesture set at frame rates reaching 20 Hz, which is sufficient for real-time applications.
Touch versus mid-air gesture interfaces in road scenarios - measuring driver performance degradation
(2016)
We present a study comparing the degradation of driver performance during touch gesture versus mid-air gesture use for infotainment system control. To this end, 17 participants were asked to perform the Lane Change Test. This requires each participant to steer a vehicle in a simulated driving environment while interacting with an infotainment system via touch and mid-air gestures. The decrease in performance is measured as the deviation from an optimal baseline. The study finds comparable deviations from the baseline for the secondary task of infotainment interaction for both interaction variants. This is significant because all participants were experienced in touch interaction but had no experience at all with mid-air gesture interaction, which favors mid-air gestures in the long-term scenario.
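The abstract does not say how the deviation from the baseline is computed. As a minimal sketch, assuming the deviation is taken as the mean absolute lateral offset between the driven path and the baseline path sampled at the same positions, it could look like the following. The function name `mean_deviation` and all numbers are hypothetical and purely illustrative; they are not data from the study.

```python
def mean_deviation(driven, baseline):
    """Mean absolute lateral offset (same unit as the inputs) between two paths.

    Assumption: both paths are sampled at the same longitudinal positions.
    """
    if len(driven) != len(baseline):
        raise ValueError("paths must have the same number of samples")
    if not driven:
        raise ValueError("paths must not be empty")
    return sum(abs(d - b) for d, b in zip(driven, baseline)) / len(driven)


# Made-up lateral positions in metres, for illustration only.
baseline = [0.0, 0.0, 3.5, 3.5, 0.0]
touch_run = [0.2, 0.1, 3.0, 3.6, 0.4]
midair_run = [0.1, 0.3, 3.1, 3.4, 0.3]

touch_dev = mean_deviation(touch_run, baseline)
midair_dev = mean_deviation(midair_run, baseline)
```

Comparing `touch_dev` and `midair_dev` per participant would then give one deviation value per interaction variant.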
PROPRE is a generic and modular neural learning paradigm that autonomously extracts meaningful concepts from multimodal data flows, driven by predictability across modalities, in an unsupervised, incremental and online way. For that purpose, PROPRE combines projection and prediction. First, each data flow is topologically projected with a self-organizing map, largely inspired by the Kohonen model. Second, each projection is predicted from the activities of the other maps by means of linear regressions. The main originality of PROPRE is the use of a simple and generic predictability measure that compares predicted and real activities for each modal stream. This measure drives the corresponding projection learning to favor the mapping of stimuli that are predictable across modalities at the system level (i.e. stimuli whose predictability measure exceeds some threshold). The predictability measure acts as a self-evaluation module that biases the representations extracted by the system so as to improve their correlations across modalities. We have already shown that this modulation mechanism is able to bootstrap representation extraction from previously learned representations with artificial multimodal data related to basic robotic behaviors [1] and that it improves the performance of the system for classification of visual data within a supervised learning context [2]. In this article, we improve the self-evaluation module of PROPRE by introducing a sliding threshold, and apply it to the unsupervised classification of gestures captured by two time-of-flight (ToF) cameras. In this context, we illustrate that the modulation mechanism is still useful, although less efficient than purely supervised learning.
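The abstract specifies neither the exact predictability measure nor how the threshold slides. The following is only a sketch of the gating idea under stated assumptions: the measure is taken to be the cosine similarity between predicted and real map activities, and the threshold is taken to be an exponential moving average of past measures. The class `SlidingThresholdGate` and its parameters are hypothetical and not taken from PROPRE itself.

```python
import math


def cosine_similarity(a, b):
    """Similarity between predicted and real activity vectors (assumed measure)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den > 0 else 0.0


class SlidingThresholdGate:
    """Allows projection learning only for stimuli that are predictable enough.

    Assumption: the threshold tracks a running average of recent measures.
    """

    def __init__(self, initial_threshold=0.5, rate=0.1):
        self.threshold = initial_threshold
        self.rate = rate

    def step(self, predicted, actual):
        measure = cosine_similarity(predicted, actual)
        learn = measure > self.threshold  # gate the map update
        # Slide the threshold toward the latest measure.
        self.threshold += self.rate * (measure - self.threshold)
        return measure, learn


gate = SlidingThresholdGate(initial_threshold=0.5, rate=0.1)
actual = [0.0, 1.0, 0.0]           # real activity of one map (toy values)
well_predicted = [0.1, 0.9, 0.0]   # prediction from the other map, close
badly_predicted = [1.0, 0.0, 0.0]  # prediction from the other map, far off

m_good, learn_good = gate.step(well_predicted, actual)
m_bad, learn_bad = gate.step(badly_predicted, actual)
```

In this toy run the well-predicted stimulus passes the gate and the badly predicted one does not; in a full system the `learn` flag would decide whether the self-organizing map is updated for that stimulus.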
In this review, we describe current machine learning approaches to hand gesture recognition with depth data from time-of-flight sensors. In particular, we summarise the achievements of a line of research at the Computational Neuroscience laboratory at the Ruhr West University of Applied Sciences. Relating our results to the work of others in this field, we confirm that Convolutional Neural Networks and Long Short-Term Memory networks yield the most reliable results. We investigated several sensor data fusion techniques in a deep learning framework and performed user studies to evaluate our system in practice. In the course of our research, we gathered and published our data in a novel benchmark dataset (REHAP), containing over a million unique three-dimensional hand posture samples.
With the introduction of Apple’s iPhone, gesture control became popular and was perceived as an intuitive means of interaction. Contactless gestures received broad attention with the X-Box Kinect. Current technology is limited to a small number of uses, mainly in entertainment systems. The target of this project is to increase the range of possible applications, e.g. to automotive applications, industrial applications (manufacturing plants), assisted living in contexts ranging from private households to hospitals (interaction for people with disabilities), and many more.
We present a publicly available benchmark database for the problem of hand posture recognition from noisy depth data and fused RGB-D data obtained from low-cost time-of-flight (ToF) sensors. The database is the most extensive of its kind, containing over a million data samples (point clouds) recorded from 35 different individuals for ten different static hand postures. It captures a great amount of variance: besides person-related factors, scaling, translation and rotation are explicitly represented. Benchmark results achieved with a standard classification algorithm are computed by cross-validation both over samples and over persons, the latter implying training on all persons but one and testing on the remaining one. An important result obtained with this database is that cross-validation performance over samples (which is the standard procedure in machine learning) is systematically higher than cross-validation performance over persons, which is in our view the true application-relevant measure of generalization performance.
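The difference between the two cross-validation schemes can be sketched as follows. This is a minimal illustration, not the database's own tooling: the helper names `leave_one_person_out` and `sample_k_fold` and the toy `person_ids` list are assumptions made for the example.

```python
from collections import defaultdict


def leave_one_person_out(person_ids):
    """Yield (held_out_person, train_idx, test_idx), one fold per person."""
    by_person = defaultdict(list)
    for idx, pid in enumerate(person_ids):
        by_person[pid].append(idx)
    for held_out in sorted(by_person):
        test_idx = by_person[held_out]
        train_idx = sorted(
            i for pid, idxs in by_person.items() if pid != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def sample_k_fold(n_samples, k):
    """Yield (train_idx, test_idx) for k folds over samples, ignoring identity."""
    for fold in range(k):
        test_idx = [i for i in range(n_samples) if i % k == fold]
        train_idx = [i for i in range(n_samples) if i % k != fold]
        yield train_idx, test_idx


# Toy example: six samples recorded from three (hypothetical) persons.
person_ids = ["p1", "p1", "p2", "p2", "p3", "p3"]
person_folds = list(leave_one_person_out(person_ids))
sample_folds = list(sample_k_fold(len(person_ids), 3))
```

In the sample-wise folds, samples from the same person typically appear in both the training and the test set, which is one plausible reason why that scheme reports higher performance than testing on a person never seen during training.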
We present a novel method to perform multi-class pattern classification with neural networks and test it on a challenging 3D hand gesture recognition problem. Our method consists of a standard one-against-all (OAA) classification, followed by another network layer that classifies the resulting class scores, possibly augmented by the original raw input vector. This allows the network to disambiguate hard-to-separate classes, as the distribution of class scores carries considerable information as well and is in fact often used for assessing the confidence of a decision. We show that with this approach we are able to significantly boost our results, overall as well as for particularly difficult cases, on the hard 10-class gesture classification task.
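The two-stage structure described above can be sketched as a forward pass. This is a simplified illustration under assumptions: each OAA scorer is reduced to a single logistic unit, the second stage is a single linear layer with softmax, and all weights are hand-set toy values rather than trained ones. The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def oaa_scores(x, weights, biases):
    """Stage 1: one independent binary scorer per class (one-against-all)."""
    return [sigmoid(dot(w, x) + b) for w, b in zip(weights, biases)]


def second_stage(x, scores, weights, biases, use_raw_input=True):
    """Stage 2: classify the class scores, optionally augmented by the raw input."""
    features = scores + list(x) if use_raw_input else list(scores)
    return softmax([dot(w, features) + b for w, b in zip(weights, biases)])


# Toy setup: 3 classes, 2-dimensional input, hand-set (untrained) weights.
oaa_w = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
oaa_b = [0.0, 0.0, 0.0]
# Stage 2 sees 3 scores + 2 raw inputs = 5 features.
stage2_w = [
    [2.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0, 0.0],
]
stage2_b = [0.0, 0.0, 0.0]

x = [2.0, -1.0]
scores = oaa_scores(x, oaa_w, oaa_b)
probs = second_stage(x, scores, stage2_w, stage2_b)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

In practice both stages would be trained, with the second stage fitted on the scores produced by the first; the sketch only shows how the class scores and the raw input are combined.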
We present a system for 3D hand gesture recognition based on low-cost time-of-flight (ToF) sensors intended for outdoor use in automotive human-machine interaction. As signal quality is impaired compared to Kinect-type sensors, we study several ways to improve performance when a large number of gesture classes is involved. Our system fuses data coming from two ToF sensors; the fused data is used to build up a large database and subsequently train a multilayer perceptron (MLP). We demonstrate that we are able to reliably classify a set of ten hand gestures in real time and describe the setup of the system, the methods used, and possible application scenarios.