Chapter 2 Learning Process

2.1 Motivation

A variety of spontaneous gestures, such as finger, hand, body, or head movements, are used to convey information in interactions among people. Gestures can hence be considered a natural communication channel with numerous aspects to be utilized in human-computer interaction. To date, most of our interactions with computers are performed with traditional keyboards, mice, and remote controls designed mainly for stationary interaction. With the help of great technological advancement, gesture-based interfaces can serve as an alternative modality for controlling computers, e.g. to navigate in office applications or to play console games such as the Nintendo Wii. Gesture-based interfaces can enrich and diversify interaction options and provide easy means to interact with the surrounding environment, especially for handicapped people who are unable to live their lives in a traditional way. On the other hand, mobile devices, such as PDAs, mobile phones, and other portable personal electronic devices, provide new possibilities for interacting with various applications if equipped with the necessary sensors, especially with the proliferation of low-cost MEMS (Micro-Electro-Mechanical Systems) technology. The majority of the new generation of smartphones, PDAs, and personal electronic devices are embedded with an accelerometer for various applications. Small wireless devices containing accelerometers could be integrated into clothing, wristwatches, or other personal electronic devices to provide a means for interacting with different environments. By defining some simple gestures, these devices could be used to control home appliances; for example, a simple up-and-down hand movement could be used to operate a garage door, or adjust the light intensity in a room or an office.
2.2 Gestures and Gesture Recognition
Expressive and meaningful body motions involving physical movements of the hands, arms, or face can be extremely useful for
• conveying meaningful information, or
• interacting with the environment.
This involves:
• a posture: a static configuration without movement of the body part, and
• a gesture: a dynamic movement of the body part.
Generally, there exist many-to-one mappings from concepts to gestures and vice versa. Hence gestures are ambiguous and incompletely specified. For example, the concept "stop" can be indicated as a raised hand with the palm facing forward, or an exaggerated waving of both hands over the head. Similar to speech and handwriting, gestures vary between individuals, and even for the same individual between different instances. Sometimes a gesture is also affected by the context of preceding as well as following gestures. Moreover, gestures are often language- and culture-specific. They can broadly be of the following types:
1. hand and arm gestures: recognition of hand poses, sign languages, and entertainment applications (allowing children to play and interact in virtual environments).
2. head and face gestures: some examples are (a) nodding or head shaking, (b) direction of eye gaze, (c) raising the eyebrows, (d) opening and closing the mouth, (e) winking, (f) flaring the nostrils, and (g) looks of surprise, happiness, disgust, fear, sadness, and many others.
3. body gestures: involvement of full body motion, as in (a) tracking movements of two people having a conversation, (b) analyzing movements of a dancer against the music being played and its rhythm, and
(c) recognizing human gaits for medical rehabilitation and athletic training.
Gesture recognition refers to the process of understanding and classifying meaningful movements of the hands, arms, face, or sometimes the head; hand gestures, however, are the most expressive, natural, and intuitive, and thus the most frequently used. Gesture recognition has become one of the hottest fields of research for its great significance in designing artificially intelligent human-computer interfaces for various applications, which range from sign language through medical rehabilitation to virtual reality. More specifically, gesture recognition can be extremely useful for:
1. Sign language recognition, in order to develop aids for the hearing impaired. For example, just as speech recognition can transcribe speech to text, gestures representing symbols in sign language can be transcribed into text.
2. Socially assistive robotics. By using proper sensors and devices, like accelerometers and gyros, worn on the body of a patient and by reading the values from those sensors, robots can assist in patient rehabilitation.
3. Developing alternative computer interfaces. Foregoing the traditional keyboard and mouse setup to interact with a computer, gesture recognition can allow users to accomplish frequent or common tasks using hand gestures made towards a camera.
4. Interactive game technology. Gestures can be used to control interactions within video games, providing players with a strong sense of immersion in the engrossing environment of the game.
5. Remote controlling. Through the use of gesture recognition, various hand gestures can be used to control different devices, like secondary devices in a car, a TV set, a garage door, and many others.
Gesture recognition consists of gesture spotting, which implies determining the start and the end points of a meaningful gesture trace from a continuous stream of input signals and, subsequently, segmenting the relevant gesture. This task is very difficult for two main reasons. The first is segmentation ambiguity: as the hand motion switches from one gesture to another, intermediate movements occur as well. These transition motions are also likely to be segmented and matched with template traces, and need to be eliminated by the model. The second is spatio-temporal variability: the same gesture may vary dynamically in duration and, very possibly, in shape, even for the same gesturer.
2.3 Literature Survey

The first step in recognizing gestures is sensing the human body position, configuration (angles and rotations), and movement (velocities or accelerations). This can be done either by using sensing devices attached to the user, which can take the form of magnetic field trackers, instrumented (colored) gloves, and body suits, or by using cameras and computer vision techniques. Each sensing technology varies along several dimensions, including accuracy, resolution, latency, range of motion, comfort, and cost. GLOVE-BASED gestural interfaces typically require the user to wear a cumbersome device and carry a load of cables connecting the device to the computer. This hinders the ease and naturalness of the user's interaction with the computer. VISION-BASED techniques, while overcoming this problem, need to contend with other problems related to occlusion of parts of the user's body. Most of the work on gesture recognition available in the literature is based on computer vision techniques. Vision-based techniques vary among themselves in:
1. the number of cameras used,
2. their speed and latency,
3. the structure of the environment, such as lighting and speed of movement,
4. any user requirements, such as restrictions on clothing,
5. the low-level features used, such as edges, regions, silhouettes, moments, histograms, and others,
6. whether a 2D or 3D representation is used.
These limitations restrict the applications of vision-based systems in smart environments. More specifically, suppose you are enjoying a movie in your home theatre with all the lights off. If you decide to change the volume of the TV with a gesture, it turns out to be rather difficult to recognize your gesture under poor lighting conditions using a vision-based system. Furthermore, it would be extremely uncomfortable and unnatural if you had to be directly facing the camera to complete a gesture. In the IMAGE PROCESSING TECHNIQUE for gesture recognition, the input image (the hand gesture image) is divided into regions separated by boundaries. The segmentation process depends on the type of gesture: if it is a dynamic gesture, the hand needs to be located and tracked; if it is a static gesture (posture), the input image only has to be segmented. The hand has to be located first, generally by using a bounding box that depends on the skin color, and secondly, the hand has to be tracked. For tracking the hand there are two main approaches:
Figure 2.1: Gesture recognized using image processing

1. Either the video is divided into frames and each frame is processed alone; in this case the hand in each frame is treated as a posture and segmented, or
2. Some tracking information, such as shape and skin color, is used together with tools such as the Kalman filter.
Different tools and methods use skin and non-skin pixels to model the hand. These methods comprise parametric and non-parametric techniques: the Gaussian Model (GM) and the Gaussian Mixture Model (GMM) are parametric techniques, while histogram-based techniques are non-parametric. However, they are affected by illumination changes and by different skin tones. Although these systems can detect different skin colors under cluttered backgrounds, they are affected by changes in temperature, besides their high cost. Segmentation is considered an open problem in itself. The color space used in a specific application plays an essential role in the success of the segmentation process; however, color spaces are sensitive to lighting changes. For this reason, researchers tend to use only the chrominance components and neglect the luminance components, as in the r-g and HS color spaces. There are also factors that obstruct the segmentation process, such as complex backgrounds, illumination changes, and low video quality. Edge detection or contour operators alone cannot be used for gesture recognition, since many hand postures produce similar contours and could lead to misclassification.

FINGERTIP DETECTION. Most complete hand-interactive systems can be considered to comprise three layers: detection, tracking, and recognition. The detection layer is responsible for defining and extracting visual features that can be attributed to the presence of hands in the field of view of the camera(s). The tracking layer is responsible for performing temporal data association between successive image frames, so that, at each moment in time, the system may be aware of what is where. Moreover, in model-based methods, tracking also provides a way to maintain estimates of model parameters, variables, and features that are not directly observable at a certain moment in time.
Figure 2.2: Gesture Recognized Using Finger Tip Detection
Figure 2.3: Twelve Gestures In A Data Set

Last, the recognition layer is responsible for grouping the spatiotemporal data extracted in the previous layers and assigning the resulting groups labels associated with particular classes of gestures. The problem of efficient and robust fingertip-based recognition of natural hand gestures in unprepared environments still remains open and challenging, and is expected to remain of central importance to the computer vision community in the forthcoming years. A very promising alternative is to resort to other sensing techniques, such as acceleration-based techniques or electromyogram-based (EMG-based) techniques. Acceleration-based gesture control is well suited to distinguishing noticeable, larger-scale gestures with different hand trajectories. However, it is not very effective when it comes to detecting more subtle finger movements, a limitation that is overcome by electromyogram-based techniques, since they are very sensitive to muscle activation and thus provide rich information about finger movements. Yet, due to some inherent problems with EMG measurements, including separability and reproducibility of measurements, the size of the discriminable hand gesture set is limited to 4-8 gestures. As a result, after examining all the sensing devices and techniques in the literature, a 3-axis accelerometer is the sensing device utilized to acquire the data pertaining to gestures. Gesture recognition based on data from an accelerometer is an emerging technique for gesture-based interaction following the rapid development of MEMS technology. Accelerometers are embedded in most of the new generation of personal electronic devices, such as the Apple iPhone and the Nintendo Wii Remote, which provide new possibilities for interaction in a wide range of applications, such as controlling home appliances, in offices, and in video games.
2.3.1 Accelerometer

An accelerometer is a sensing element that measures acceleration; acceleration is the rate of change of velocity with respect to time. It is a vector that has magnitude and direction. Accelerometers measure in units of g, where one g is the acceleration due to gravity, which is equal to 9.81 m/s².
Figure 2.4: Various Gestures Along 3 Axes

Accelerometers have developed from a simple water tube with an air bubble that showed the direction of the acceleration to an integrated circuit that can be placed on a circuit board. Accelerometers can measure vibrations, shocks, tilt, impacts, and motion of an object.
2.3.1.1 Types of Accelerometers
There are a number of types of accelerometers. What differentiates the types is the sensing element and the principle of operation.
• Capacitive accelerometers sense a change in electrical capacitance with respect to acceleration. The accelerometer senses the capacitance change between a static condition and the dynamic state.
• Piezoelectric accelerometers use materials such as crystals, which generate electric potential from an applied stress. This is known as the piezoelectric effect. As stress is applied, such as acceleration, an electrical charge is created.
• Piezoresistive accelerometers (strain gauge accelerometers) work by measuring the electrical resistance of a material when mechanical stress is applied.
• Hall effect accelerometers measure voltage variations stemming from a change in the magnetic field around the accelerometer.
• Magnetoresistive accelerometers work by measuring changes in resistance due to a magnetic field. The structure and function is similar to a Hall effect accelerometer, except that instead of measuring voltage, the magnetoresistive accelerometer measures resistance.
• Heat transfer accelerometers measure internal changes in heat transfer due to acceleration. A single heat source is centered in a substrate and suspended across a cavity. Thermoresistors are spaced equally on all four sides of the suspended heat source. Under zero acceleration the heat gradient is symmetrical. Acceleration in any direction causes the heat gradient to become asymmetrical due to convective heat transfer.

MEMS-Based Accelerometers

MEMS (Micro-Electro-Mechanical System) technology is based on a number of tools and methodologies which are used to form small structures with dimensions on the micrometer scale (one millionth of a meter). This technology is now being utilized to manufacture state-of-the-art MEMS-based accelerometers.
2.3.1.2 Selecting an Accelerometer
When selecting an accelerometer for an application, the first factors to consider are:
• Dynamic range: the +/- maximum amplitude that the accelerometer can measure before distorting or clipping the output signal. Dynamic range is typically specified in g's.
• Sensitivity: the scale factor of a sensor or system, measured in terms of change in output signal per change in measured input. Sensitivity references the accelerometer's ability to detect motion. Accelerometer sensitivity is typically specified in millivolts per g (mV/g).
• Frequency response: the frequency range for which the sensor will detect motion and report a true output. Frequency response is typically specified as a range measured in Hertz (Hz).
• Sensitive axis: accelerometers are designed to detect inputs in reference to an axis; single-axis accelerometers can only detect inputs along one plane. Tri-axis accelerometers can detect inputs in any plane and are required for most applications.
• Size and mass: the size and mass of an accelerometer can change the characteristics of the object being tested. The mass of the accelerometer should be significantly smaller than the mass of the system to be monitored.
Chapter 3 Gesture Recognition and Methods of Pattern Recognition

3.1 Gesture Recognition

Automated gesture recognition has been investigated in various academic research projects, which have yielded a number of practical applications and commercial products. Prior to the actual recognition of gestures, a solution must be found to automatically capture the position or movements of the human body or body parts, such as arms and hands. This is sometimes also referred to as motion tracking, and a variety of different techniques have been investigated for this task. An accelerometer is a small sensor which measures the acceleration of itself, or of the device it is built into. Based on the acceleration profile, which originates from the movement of the device, classification into previously defined gestures is possible.
3.2 Pattern Recognition Methods

The recognition of accelerometer-based hand gestures is primarily a task of pattern recognition. As the accelerometer can be freely moved in space, temporally varying, 3-dimensional signal data is obtained from the 3-axis acceleration sensor. The sample data generated in this way needs to be classified into a set of previously defined reference gestures. Gestures should be freely trainable, with minimal training effort. Data samples that do not match one of the predefined gestures well enough should not be classified as one of them, but instead reported separately as an unclassifiable, unknown gesture. Dynamic time warping algorithms can measure the similarity between two sample sequences which vary in time or speed. Good classification results can be achieved by using this method. The major drawback, however, is that, in order for the method to work reliably, for each gesture a relatively high number of gesture data templates needs to
be present. Therefore this method is only used for small alphabets of gestures, as the creation of good gesture templates is described as a relatively tedious process. Neural networks have also been applied to similar classification tasks, yielding good results. However, neural networks are generally not well suited to multi-dimensional sample data that varies over time. Their major flaw is that they typically operate with a fixed-length time window, which imposes restrictions on the gestures. Extensive pre- and post-processing steps, such as the inter- and extrapolation of data samples, need to be employed in combination with special types of neural networks, such as time-delay neural networks, in order to achieve good-quality classification results. The third pattern recognition method, which was considered and eventually chosen, is the so-called hidden Markov model (HMM). Hidden Markov models have been extensively studied, primarily in the context of speech recognition. They work well on temporal data that is not fixed in length. A number of training algorithms exist, allowing the automated creation of custom, free-form gesture models. They can also be used to solve the segmentation problem which typically occurs with continuous temporal data. For these particular properties and the promising results achieved in comparable studies utilising them, discrete hidden Markov models were chosen to form the basis of the pattern recognition process.
3.3 Recognition Methods

Hidden Markov models represent the core of the gesture recognition application that is developed in the course of this project. To be applied in a practical application, some pre- and post-processing steps need to be applied in order to bring the input data into a form which can be easily processed by discrete hidden Markov models. The optimisation achieved by these additional processing steps also helps minimise computational and memory demands.
3.4 Sensing

The acceleration of the device (relative to free fall) is constantly measured by the MMA73X1 accelerometer. The acceleration is measured along three axes, resulting in a three-dimensional data signal.
3.5 Vector Quantisation

Vector quantisation is a process of mapping a large input vector space onto a smaller subset of so-called prototype vectors. This can also be described in terms of cluster analysis: the original input data vectors are divided into a fixed number of clusters. For each cluster,
there is one vector which is most representative of its cluster, the so-called prototype vector. The vector quantisation process 'converts' the large number of three-dimensional acceleration vectors into a manageable amount of prototype vectors. To each of the prototype vectors, a symbol is assigned, resulting in a discrete set of symbols. In this case, simply the integer indices from the list of prototype vectors serve as the observation symbols. This pre-processing step is needed in order to generate a manageable amount of observation symbols that can be processed efficiently by discrete hidden Markov models.
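As an illustration of this pre-processing step, the following sketch maps a short trace of 3-axis acceleration vectors onto observation symbols; the codebook and trace values are made up for the example and are not taken from the project data.

```python
import numpy as np

def quantise(samples, codebook):
    """Map each 3-D acceleration vector to the index of its nearest prototype vector."""
    symbols = []
    for x in samples:
        distances = np.linalg.norm(codebook - x, axis=1)   # Euclidean distance to every prototype
        symbols.append(int(np.argmin(distances)))          # index of the nearest prototype = observation symbol
    return symbols

# Made-up codebook of 4 prototype vectors and a short acceleration trace (values in g)
codebook = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 0.8], [-0.5, 0.0, 0.8], [0.0, 0.9, 0.3]])
trace = np.array([[0.1, 0.0, 0.95], [0.45, 0.05, 0.8], [-0.4, 0.1, 0.7]])
print(quantise(trace, codebook))    # [0, 1, 2]
```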
3.6 HMM Trainer

The hidden Markov model training module is responsible for creating a hidden Markov model for each gesture that should be in the recognition pool. This gesture model is then trained, or optimised, based on multiple training sequences. Training sequences are created by recording several repetitions of the desired hand gesture (the vector quantisation steps are applied to them as well).
3.7 HMM Recogniser

The HMM recognition module is responsible for assessing how well the input data, obtained from a gesture just performed, matches the available gesture models. The pool of available gesture HMMs is established through the HMM trainer module. Speaking
in HMM terms, the HMM recogniser calculates the probability that the observation sequence was generated by a given hidden Markov model. This probability is computed for each of the previously learned gesture models.
3.8 Bayes Classification

Unfortunately, the probabilities obtained from the HMM recogniser can not be used directly for the final classification of the input data. This is because the maximum probabilities of the individual gesture models diverge strongly in scale. By applying Bayes' rule in combination with the average probabilities of the training sequences, normalised probability values are generated. Finally, the gesture HMM with the highest probability after Bayes' rule is returned as the recognised gesture.
3.9 Vector Quantisation

The purpose of a so-called vector quantiser is to map vectors from a large input data space onto a finite set of typical reproduction vectors. These reproduction or prototype vectors are a small subset of the original vector space. During this mapping, ideally no information should be lost, but at the same time the complexity of the data representation should be significantly reduced. When processing signal data with a digital computer, these digital representations are already finite. However, this data space is still very large, and it needs to be broken down further before it can be processed with discrete hidden Markov models in an efficient manner. The reduction in complexity is achieved by grouping similar vectors together. Each input vector is mapped onto a corresponding prototype vector of a given codebook. This mapping is done according to the nearest-neighbour condition. A vector quantiser can also be seen as a method of cluster analysis. There are two problems to be solved regarding vector quantisers. Firstly, the number of prototype vectors needs to be decided upon. This number is an essential parameter for characterising a vector quantiser, and it has a major influence on its quality. The second problem concerns the selection of the prototype vectors from the input data space. The set of prototype vectors is also referred to as a codebook, and the number of prototype vectors is the codebook size. The first problem, the choice of the codebook size, has to be handled in a very application-dependent manner. There is no general algorithm to determine this number. Depending on the sample data, this is a trade-off between minimising the quantisation error (larger codebook size) and maximising the reduction in complexity. For solving the second problem of finding appropriate prototype vectors, several algorithms exist. None of them can find an optimal solution for a given set of sample data and a given codebook size. For this project the so-called Lloyd's algorithm was used.
3.10 Definition

A vector quantiser Q is defined as a mapping of a k-dimensional vector space R^k onto a finite subset Y ⊂ R^k:

Q : R^k −→ Y
Let the codebook be defined as the set Y = {y_1, y_2, ..., y_N}. The prototype vectors y_i are sometimes also referred to as reproduction vectors or as code words. N denotes the size of the codebook. As a vector quantiser Q of size N maps every vector x from the input space to exactly one prototype vector y_i, it defines a complete and disjoint partition of R^k into regions or cells R_1, R_2, ..., R_N. In the cell R_i lie all those vectors x ∈ R^k which were mapped to the prototype vector or code word y_i:

∪_{i=1}^{N} R_i = R^k   and   R_i ∩ R_j = ∅ for all i, j with i ≠ j
The so-called nearest-neighbour condition describes the optimal selection of the partition {R_i} for a given codebook Y. Every partition cell R_i needs to be determined so that it contains all vectors x ∈ R^k which have minimal distance from the associated prototype vector y_i. This means that the corresponding vector quantiser Q maps a sample vector x onto its nearest prototype vector of the codebook:

Q(x) = y_i   if   d(x, y_i) ≤ d(x, y_j) for all j ≠ i

In the above equation d(·, ·) denotes a general distance function. It can be substituted with a specific distance function. In this paper, the Euclidean distance in three-dimensional vector space was used:

d(x, y) = ||y − x|| = √((y_1 − x_1)² + (y_2 − x_2)² + (y_3 − x_3)²)
If a vector x has minimal distance to more than one prototype vector, it can be mapped to any one of those prototype vectors. The resulting quantisation error is not affected by the actual choice. The optimal choice of a codebook Y for a given partition {R_i} is defined by the centroid condition. For a cell R_i of the partition, the optimal reproduction vector is the prototype vector y_i that corresponds to the centroid of the cell:

y_i = cent(R_i)

The centroid of a cell R is defined here as the vector y* ∈ R from which all other vectors x ∈ R have, in the statistical average, minimal distance:

y* = cent(R) = arg min_{y∈R} {d(X, y) | X ∈ R}
The random variable X is used here to represent the distribution of the data vectors x in the input space. A sample vector x ∈ R^k that is mapped onto a prototype vector y will in general be different from the source vector. Therefore, every single quantisation operation results in an individual quantisation error ε(x|Q). This error can be described by the distance between the original and the quantised vector:

ε(x|Q) = d(x, Q(x))

When using the Euclidean distance:

ε(x|Q) = ||x − Q(x)||
Finally, we also introduce the index set I = {1, 2, ..., N}, which indexes the codebook Y. Later, it will be shown how this index set serves as the alphabet of possible observation symbols for discrete hidden Markov models.
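The definitions above translate almost directly into code. The following sketch, using an arbitrary example codebook, computes the per-vector quantisation error ε(x|Q) and the centroid of a cell (which, under the Euclidean distance, is simply the mean vector):

```python
import numpy as np

def quantisation_error(x, codebook):
    """epsilon(x|Q) = ||x - Q(x)||, where Q(x) is the nearest prototype vector of the codebook."""
    distances = np.linalg.norm(codebook - x, axis=1)
    return float(distances.min())

def centroid(cell):
    """Centroid of a cell R_i: with the Euclidean distance this is simply the mean vector."""
    return np.mean(cell, axis=0)

# Arbitrary example: a cell of three vectors and a codebook of two prototype vectors
R_i = np.array([[0.1, 0.0, 0.9], [0.2, 0.1, 1.0], [0.0, 0.0, 0.8]])
Y = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
print(centroid(R_i), quantisation_error(R_i[0], Y))
```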
3.11 Lloyd's Algorithm

In the previous section, the elements of a vector quantiser and optimality conditions, such as the nearest-neighbour rule, were introduced. However, what was previously referred to as the second problem of vector quantiser design has not been answered yet: given a sample set of input vectors ω = {x_1, x_2, ..., x_T} and the codebook size N, how can N optimal prototype vectors be determined which represent the codebook Y? There are various algorithms that can find a near-optimal solution to this problem. The one that is used in this project is called Lloyd's algorithm, sometimes also called the Voronoi iteration. This algorithm is often mistakenly referred to as the k-means algorithm. Lloyd's algorithm is an iterative algorithm, achieving a better solution at each iteration. At the beginning of the procedure, N suitable prototype vectors are chosen. This initial codebook is referred to as Y^0. The choice of the initial codebook influences the quality of the vector quantiser. It needs, however, to be done heuristically. A common approach is to randomly select N vectors from the input vectors. In the corresponding implementation section on vector quantisation, other methods are evaluated as well. The next step is to map every vector x_t from the sample set to the nearest prototype vector y_i^m of the current codebook Y^m. This mapping also determines the partitions R_i^m. From the newly formed partitions, an updated codebook can be computed. The centroids of the new partitions serve as the prototype vectors for the updated codebook. Since the Euclidean distance is used for the centroid condition, the centroid of a given partition can simply be calculated as the mean vector of that partition. The algorithm terminates when the relative decrease of the quantisation error is less
than the specified lower bound ∆_min.

Given:
• a sample set of input vectors ω = (x_1, x_2, ..., x_T)
• the desired codebook size N
• a lower bound ∆_min for the relative improvement of the quantisation error

1. Initialisation
Choose a suitable initial codebook Y^0 of size N and set the iteration count m ← 0.

2. Partitioning
For the current codebook Y^m, determine the optimal partition by classifying all vectors x_t with t = 1...T into cells

R_i^m = {x | y_i^m = arg min_{y∈Y^m} d(x, y)}

Also compute the average quantisation error

ε̄(Y^m) = (1/T) Σ_{t=1}^{T} min_{y∈Y^m} d(x_t, y)

3. Codebook Update
For all cells R_i^m with i = 1...N calculate new reproduction vectors

y_i^{m+1} = cent(R_i^m)

These constitute the new codebook Y^{m+1} = {y_i^{m+1} | 1 ≤ i ≤ N}.

4. Termination
Calculate the relative decrease of the quantisation error with respect to the last iteration:

∆_m = (ε̄(Y^{m−1}) − ε̄(Y^m)) / ε̄(Y^m)

If the relative decrease is large enough, ∆_m > ∆_min, set m ← m + 1 and continue with step 2; otherwise stop.
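A compact sketch of this procedure is given below. The random initial codebook and the termination threshold are illustrative choices only and not the exact settings used in the project.

```python
import numpy as np

def lloyd(samples, N, delta_min=1e-3, seed=None):
    """Train a codebook of N prototype vectors with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Step 1: initial codebook Y^0, here chosen randomly from the sample set
    codebook = samples[rng.choice(len(samples), size=N, replace=False)].astype(float)
    prev_error = None
    while True:
        # Step 2: partitioning - assign every sample to its nearest prototype vector
        dists = np.linalg.norm(samples[:, None, :] - codebook[None, :, :], axis=2)
        cells = dists.argmin(axis=1)
        error = dists.min(axis=1).mean()                 # average quantisation error
        # Step 3: codebook update - centroid (mean vector) of every non-empty cell
        for i in range(N):
            members = samples[cells == i]
            if len(members) > 0:
                codebook[i] = members.mean(axis=0)
        # Step 4: termination - stop when the relative improvement is too small
        if prev_error is not None and (prev_error - error) / error <= delta_min:
            return codebook
        prev_error = error

# Illustrative use with random 3-D "acceleration" samples
data = np.random.default_rng(0).normal(size=(200, 3))
print(lloyd(data, N=8).shape)   # (8, 3)
```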
3.12 Hidden Markov Models

Hidden Markov models can be divided into two main types: discrete hidden Markov models and continuous hidden Markov models. There are also hybrid types, such as semi-continuous hidden Markov models. Discrete hidden Markov models operate with probability distributions of symbols from a finite alphabet of symbols. Continuous hidden Markov models operate with probability densities on continuous data, represented as mixture models. Here, only the discrete models are examined. Therefore, the following theory section, including the mathematical formulas, only applies to discrete hidden Markov models.
3.12.1 Markov Chains

Hidden Markov models are fundamentally based on Markov chain models. Markov chains can be used for the statistical description of symbol and state sequences. A discrete Markov chain model can be visualised as a finite state automaton with connections between all states. The finite set of N distinct states of the system is defined as:

S = {1, 2, ..., N}
At discrete time steps, the system undergoes a change of state. The change of state happens according to the set of state transition probabilities; these can be thought of as the connections between the state nodes. Because there may also be connections pointing back to the same state (a11, a22, a33 in figure 3.1), it is possible that the system remains in the same state. The time instants associated with the state changes are denoted as t = 1, 2, ..., T (for a finite time interval of length T). The actual state of the system at time t is denoted as S_t. The behaviour of the system at a given time t depends only on the previous state in time. This is the so-called Markov property, and therefore the probabilistic description of the system's states can be characterised as follows:

P(S_t | S_1, S_2, ..., S_{t−1}) = P(S_t | S_{t−1})
The set of state transition probabilities is denoted as the matrix A:

A = {a_ij},   a_ij = P(S_t = j | S_{t−1} = i),   1 ≤ i, j ≤ N

It has not yet been defined how the initial state of the system is determined. Therefore the vector π is introduced. It holds the set of start probabilities for each state:

π = {π_i | π_i = P(S_1 = i)}
Figure 3.1: A Markov chain with 3 states

The state transition matrix A and the start probability vector π obey the following standard stochastic constraints:

Σ_{j=1}^{N} a_ij = 1

Σ_{i=1}^{N} π_i = 1
Expressed verbosely, this means that the transition probabilities out of each state in the model (each row of the matrix) must sum up to 1, and the same applies to the start probabilities. Now that all fundamental elements of a Markov chain have been introduced, a concrete example of a 3-state system is presented: a 3-state Markov model of the weather. It is assumed that once a day (e.g., at noon), the weather is observed as being one of the following:
State 1: rain
State 2: cloudy
State 3: sunny
The weather on day t can only be exactly one of the three states above. The matrix A of the state transition probabilities is given as:
A = {a_ij} =
a11 a12 a13     0.4 0.3 0.3
a21 a22 a23  =  0.2 0.6 0.2
a31 a32 a33     0.1 0.1 0.8

The current weather shall be sunny (state 3); therefore the start vector π of the model looks as follows:

π = {π_i} = (π1, π2, π3) = (0.0, 0.0, 1.0)
Now the question is asked: what is the probability (according to the given model) that the weather for the next seven days will be "sunny - sunny - rainy - rainy - sunny - cloudy - sunny"? To be able to pose this question more formally, the observation sequence O is defined, including the current (sunny) day:

O = {sun, sun, sun, rain, rain, sun, clouds, sun}
  = {S1 = 3, S2 = 3, S3 = 3, S4 = 1, S5 = 1, S6 = 3, S7 = 2, S8 = 3}
  = {3, 3, 3, 1, 1, 3, 2, 3}

Note that the symbols {1, 2, 3} refer to the states of the 3-state Markov chain.
The probability of this observation sequence, given the model, can then be expressed and evaluated as:

P(O | Model) = P(3, 3, 3, 1, 1, 3, 2, 3 | Model)
             = P(3) · P(3|3) · P(3|3) · P(1|3) · P(1|1) · P(3|1) · P(2|3) · P(3|2)
             = π3 · a33 · a33 · a31 · a11 · a13 · a32 · a23
             = 1.0 × 0.8 × 0.8 × 0.1 × 0.4 × 0.3 × 0.1 × 0.2
             = 1.536 × 10^−4
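The same calculation can be reproduced in a few lines of Python, using the transition matrix and start vector of the example model:

```python
import numpy as np

A = np.array([[0.4, 0.3, 0.3],    # state transition probabilities of the weather model
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
pi = np.array([0.0, 0.0, 1.0])    # the system starts in state 3 (sunny)

# observation sequence sun,sun,sun,rain,rain,sun,clouds,sun (states 3,3,3,1,1,3,2,3 -> indices 2,2,2,0,0,2,1,2)
O = [2, 2, 2, 0, 0, 2, 1, 2]

p = pi[O[0]]
for prev, curr in zip(O, O[1:]):
    p *= A[prev, curr]            # multiply the transition probability for every step
print(p)                          # 1.536e-04
```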
3.13 Hidden Markov Model

3.13.1 Hidden Markov Model Definition

Hidden Markov models describe a two-stage stochastic process. The first stage is exactly a Markov chain as defined in the previous section. Therefore hidden Markov models can be seen as extended versions of Markov chain models. In an HMM, a Markov chain represents the so-called hidden part of the model. It is called hidden because in an HMM the individual states do not directly correspond to some observable event and are therefore hidden. Instead, the observable event is itself another probabilistic function. It is a probabilistic function that depends (and only depends) on the current state. This probabilistic function represents the second stage of the HMM process. In the example 3-state weather model discussed in the previous section, each of the
Figure 3.2: HMM with 3 hidden states and observation alphabet (rain, clouds, sun, fog, snow)
three states corresponds directly to an observable, meaningful event, namely to a rainy, cloudy or sunny day. In contrast, the states of an HMM do not have any obvious meaning; they are primarily just states S = {1, 2, ..., N} in which the model can be. From this point of view, the properties of the hidden states of an HMM are similar to those of a multilayer perceptron's hidden nodes. A hidden state does not correspond to an observation event directly; instead, each hidden state maintains an associated probability distribution over the possible observation symbols of the observation alphabet. These probability distributions are relevant for the second stage of the HMM process, and are also called observation likelihoods or emission probabilities. The set of emission probabilities of an HMM can be grouped together in the matrix B:

B = {b_jk},   b_jk = P(O_t = o_k | S_t = j)

O_t denotes the observation symbol (or emission) at time t. The observations belong to the discrete alphabet V of size K:

V = {o_1, o_2, ..., o_K}
The example model of the weather Markov chain, which was defined in the previous
section, shall be considered again. This example model shall now be extended into a full-blown hidden Markov model in order to further illustrate the additional elements that characterise a hidden Markov model. Therefore, to each of the three states (these are now the hidden states), the complete alphabet of possible observation symbols is assigned. The probability relationship between a state and an individual emission symbol is described by the probability distribution b_jk. The observation alphabet is typically larger than the number of hidden states in the model. Therefore the existing alphabet {rain, clouds, sun} is extended to {rain, clouds, sun, fog, snow}. A visualisation of this example weather HMM can be seen in figure 3.2. Finally, the formal definition of a hidden Markov model is summarised, including the parts that overlap with Markov chain models. A hidden Markov model can be formally denoted as λ = (S, A, B, π, V), but usually in the literature the short form λ = (A, B, π) is used. The individual components are:
S = {1, 2, ..., N}: the set of N hidden states
A = {a_ij}: the state transition probability matrix, each a_ij representing the probability that the system changes from state i to state j
B = {b_jk}: the emission probability matrix, each b_jk representing the probability that state j generates observation o_k
π = {π_i}: the vector of initial state probabilities, each π_i representing the probability that the system starts in state i
V = {o_1, o_2, ..., o_K}: the alphabet of observation symbols
A given HMM λ can be used as a generator for an observation sequence O of length T:

O = O_1, O_2, ..., O_T
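To make the notation concrete, the components of λ = (A, B, π) can be held in a small container such as the one below. The emission probabilities shown for the extended weather HMM are purely illustrative, since the text does not specify them.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HMM:
    A: np.ndarray    # N x N state transition probabilities a_ij
    B: np.ndarray    # N x K emission probabilities b_jk
    pi: np.ndarray   # N start probabilities

# Extended weather example: 3 hidden states, observation alphabet (rain, clouds, sun, fog, snow)
weather = HMM(
    A=np.array([[0.4, 0.3, 0.3], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]),
    B=np.array([[0.50, 0.20, 0.05, 0.10, 0.15],   # illustrative emission distributions only
                [0.20, 0.40, 0.20, 0.15, 0.05],
                [0.05, 0.20, 0.60, 0.10, 0.05]]),
    pi=np.array([0.0, 0.0, 1.0]),
)
# every row of A and B must sum to 1 (stochastic constraints)
assert np.allclose(weather.A.sum(axis=1), 1.0) and np.allclose(weather.B.sum(axis=1), 1.0)
```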
3.14 Three Basic Problems for HMMs

In order for HMMs to be of any practical use, three fundamental problems need to be solved.

Problem 1: Evaluation
"Given the observation sequence O = O1 , O2 , ..OT , and a model λ = (A, B, π), how do we eciently compute P (O|λ), the probability of the observation sequence, given the model?" To be able to compute the probability that an observed sequence was produced by a given model, is extremely relevant for the gesture recognition application presented in this paper. This problem can also be seen as one of scoring: How well does a given observation 22
sequence match a given model? Therefore, the solution to this problem lies at the heart of the gesture recognition process. This problem corresponds to the earlier question about the likelihood of a weather sequence being generated by the simple 3-state Markov chain. As a consequence, an obvious solution is to extend the existing method to hidden Markov models. The major difference here is that the state sequence an HMM runs through, in order to produce a given observation sequence, cannot be known in advance. Because the state sequence cannot be known, every hidden state (and its emission probabilities) needs to be considered at every time t. For an HMM with N hidden states and an observation sequence of length T, there are N^T possible hidden state sequences. The resulting algorithm is exponential in the length T of the observation sequence. Therefore, this method is out of the question when it comes to practical applications. Luckily, as will be shown below, there is a more efficient solution to this problem, namely the so-called forward algorithm.

Problem 2: Decoding
"Given the observation sequence O = O1 , O2 , ...OT , and a model λ, how do we choose a corresponding state sequence Q = q1 , q2 , ..qT which is optimal in some meaningful sense (i.e. best 'explains' the observations)?" This problem is not of immediate relevance for the presented gesture recognition task and was therefore not further examined. For the sake of completion it is mentioned here that a common solution to this problem is provided by the Viterbi algorithm. Problem 3: Training
"How do we adjust the model parameters λ = (A, B, π) to maximize P (O|λ)?" This problem refers to the training of a model. The model's parameters are to be optimised in order to increase its probability for the generation of a given observation sequence. Such an observation sequence is called training sequence. The ability to train HMMs is crucial for the gesture recognition application. The adaptation of the models parameter's to an observed training sequence, allows the creation of a model that best describes a certain gesture. Several training algorithms for HMMs have been developed. The most widely used method is Baum-Welch algorithm.
3.15 Forward Algorithm

The forward algorithm provides a solution for calculating the likelihood that a particular observation sequence O was generated by a given hidden Markov model. Formally this can be written as P(O|λ). In contrast to the 'brute force' method that was mentioned under problem 1, the forward algorithm exploits the Markov property (the probability of a particular state depends only on the previous state) in order to significantly reduce the number of computations required. First, an auxiliary set of variables, the so-called forward variables α_t(i), is introduced. For a given model λ, α_t(i) is the probability that the first part of the observation sequence up to O_t is generated and that at time t the system is in state i:

α_t(i) = P(O_1, O_2, ..., O_t, S_t = i | λ)
1. Initialisation
The algorithm is initialised by obtaining the values for α at time t = 1:

α_1(i) = π_i b_i(O_1),   1 ≤ i ≤ N

2. Recursion
For all times t, t = 1, ..., T − 1:

α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_ij ] b_j(O_{t+1}),   1 ≤ j ≤ N

3. Termination
Finally, all N probabilities α at time T are summed up:

P(O|λ) = Σ_{i=1}^{N} α_T(i)
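A direct sketch of these three steps, in linear probabilities (the underflow issue discussed in section 4.5 is ignored here, and the toy model is made up for illustration), could look as follows:

```python
import numpy as np

def forward(O, A, B, pi):
    """Forward algorithm: returns P(O|lambda) for a discrete HMM lambda = (A, B, pi)."""
    T, N = len(O), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                            # initialisation: alpha_1(i) = pi_i * b_i(O_1)
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]    # recursion over all states j at once
    return float(alpha[-1].sum())                         # termination: sum over alpha_T(i)

# Tiny illustrative model with 2 hidden states and 3 observation symbols
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
print(forward([0, 1, 2], A, B, pi))
```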
3.16 Backward Algorithm

The backward algorithm by itself does not solve a new problem. Like the forward algorithm, it also calculates the probability P(O|λ), but the backward algorithm starts at time t = T, the end of the observation sequence, and works backwards, while the forward algorithm starts at t = 1. The backward algorithm is presented at this point because it is used within the Baum-Welch training algorithm. Analogous to the forward variable α_t(i), the backward variable β_t(i) is defined as the probability of observing the remaining part of the sequence, given that the system is in state i at time t:

β_t(i) = P(O_{t+1}, O_{t+2}, ..., O_T | S_t = i, λ)

1. Initialisation

β_T(i) = 1,   1 ≤ i ≤ N

2. Recursion
For all times t, t = T − 1, ..., 1:

β_t(i) = Σ_{j=1}^{N} a_ij b_j(O_{t+1}) β_{t+1}(j),   1 ≤ i ≤ N

3. Termination

P(O|λ) = Σ_{i=1}^{N} π_i b_i(O_1) β_1(i)
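The corresponding sketch of the backward recursion mirrors the forward one; with the same made-up toy model, both return the same value for P(O|λ):

```python
import numpy as np

def backward(O, A, B, pi):
    """Backward algorithm: also returns P(O|lambda), computed from the end of the sequence."""
    T, N = len(O), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                        # initialisation: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])      # recursion: sum over the successor states j
    return float(np.sum(pi * B[:, O[0]] * beta[0]))       # termination

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
print(backward([0, 1, 2], A, B, pi))   # equals forward([0, 1, 2], A, B, pi)
```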
3.17 Baum-Welch Algorithm

The Baum-Welch algorithm, sometimes also called the forward-backward algorithm, is an iterative training method for hidden Markov models. Training here means maximising the likelihood that a model λ emits the observation sequence O. The Baum-Welch algorithm represents a variant of the expectation maximisation (EM) algorithm, which in general optimises the parameters of multi-stage stochastic processes. At each iteration, the algorithm produces new estimates for the model parameters; the new model and parameters are denoted as λ' = (A', B', π'). In order to describe the algorithm, the following auxiliary variables are defined. The first variable γ_t(i) represents the probability of the system being in state i at time t. It can be calculated as follows, by drawing on the forward and backward variables defined in the previous sections:

γ_t(i) = P(S_t = i | O, λ) = α_t(i) β_t(i) / P(O|λ)
The second variable defines the probability of a transition from state i to state j at time t; it is denoted as γ_t(i, j). This probability can be calculated, based on the forward and backward variables, as follows:

γ_t(i, j) = P(S_t = i, S_{t+1} = j | O, λ) = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O|λ)
By using the above formulas and the concept of counting event occurrences, the re-estimates of the parameters of an HMM can be written as:

a'_ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)
      = Σ_{t=1}^{T−1} γ_t(i, j) / Σ_{t=1}^{T−1} γ_t(i)

b'_j(o_k) = (expected number of times in state j observing symbol o_k) / (expected number of times in state j)
          = Σ_{t: O_t = o_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
The start probabilities π_i can be seen as a special case of the transition probabilities a_ij. The following equation determines the improved start probabilities:

π'_i = γ_1(i)
Now that it has been described how an improved set of model parameters can be calculated, an overview of the overall algorithm is given:

1. Initialisation
Choose a suitable initial model λ = (A, B, π) with initial estimates for
• the start probabilities π_i
• the transition probabilities a_ij
• the emission probabilities b_jk

2. Optimisation
Compute updated estimates λ' = (A', B', π') according to the re-estimation formulas presented above.
3. Termination
If the quality measure P(O|λ') was considerably improved by the updated model λ' with respect to λ, let λ ← λ' and continue with step 2; otherwise stop.

The Baum-Welch algorithm was at first implemented according to the formulas presented above. During the implementation phase of this project, some extensions and amendments were made. The algorithm was extended to work with multiple observation sequences. Thereafter, all probability calculations were moved to the negative-logarithmic scale, because of problems with numerical underflow.
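For illustration, one re-estimation step for a single observation sequence can be sketched as follows. This is a simplified linear-probability version of the formulas above; the actual implementation works with multiple sequences and logarithmic probabilities, as described later, and the toy model used in the example is invented.

```python
import numpy as np

def baum_welch_step(O, A, B, pi):
    """One Baum-Welch re-estimation step for a single observation sequence O (symbol indices)."""
    O = np.asarray(O)
    T, N = len(O), A.shape[0]
    # forward and backward variables
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    P = alpha[-1].sum()                                    # P(O|lambda)
    # gamma_t(i) and gamma_t(i, j) as defined above
    gamma = alpha * beta / P
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :] / P
    # re-estimation formulas for A', B' and pi'
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[O == k].sum(axis=0)
    B_new = B_new / gamma.sum(axis=0)[:, None]
    pi_new = gamma[0]
    return A_new, B_new, pi_new

# Invented toy model and training sequence
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
print(baum_welch_step([0, 1, 1, 2, 0], A, B, pi))
```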
3.18 Classification with Bayes

Bayes' rule provides a solution for computing the posterior probability of an event B given an event A, P(B|A), based on prior knowledge about the random events, P(A) and P(B), and the conditional probability of A given B, P(A|B). It can therefore virtually reverse the conditional dependence of the events A and B. Bayes' rule in its general form is given by:

P(B|A) = P(A|B) P(B) / P(A)
Given is the gesture pool of N hidden Markov models λ_1, λ_2, ..., λ_N, each model λ_j representing a gesture available for the recognition process. Now, the question is which model λ_j is most likely to have generated the sample observation sequence O. This question can be posed more formally as P(λ_j | O). One problem with hidden Markov models is that P(O | λ_j), the emission probability of the observations O given the model λ_j, yields probability values that vary strongly in scale, depending on the actual gesture model. Therefore, these probabilities can not be compared directly with each other to find out which of the N HMMs most likely generated the observation O. By taking into account a model's average probability over its training sequences and denoting this as the model's probability P(λ_j), the posterior probability P(λ_j | O) can be computed by applying Bayes' rule:

P(λ_j | O) = P(O|λ_j) P(λ_j) / Σ_{i=1}^{N} P(O, λ_i) = P(O|λ_j) P(λ_j) / Σ_{i=1}^{N} P(O|λ_i) P(λ_i)
The model λ_j with the largest value of P(λ_j | O) represents the classified gesture. A nice side effect of applying Bayes' rule is that the computed values of P(λ_j | O) add up to 1.
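As a small sketch of this final classification step, the posterior probabilities can be computed from hypothetical model likelihoods and model priors as follows (the numerical values are invented for the example):

```python
import numpy as np

def classify(likelihoods, priors):
    """Bayes' rule: posteriors P(lambda_j | O) from P(O | lambda_j) and model priors P(lambda_j)."""
    likelihoods = np.asarray(likelihoods, dtype=float)
    priors = np.asarray(priors, dtype=float)
    joint = likelihoods * priors
    posteriors = joint / joint.sum()          # normalised, so the values add up to 1
    return int(np.argmax(posteriors)), posteriors

# Hypothetical values: P(O|lambda_j) for three gesture models and their training-sequence averages
p_obs = [1.2e-35, 4.0e-31, 7.5e-34]
p_model = [2.0e-32, 3.1e-31, 9.0e-33]
best, post = classify(p_obs, p_model)
print(best, post)     # index of the recognised gesture and the posterior distribution
```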
Chapter 4 Implementation

4.1 Test Gestures

The theory chapter provides a general description of the relevant methods and software algorithms which are applied in this project. Some of these methods require the definition of application-specific constants or initial values. For example, the codebook size and initial prototype vectors for the vector quantiser need to be chosen. For some of these parameters, an optimal value can only be determined empirically and heuristically. In order to run experiments which yield comparable results, four simple hand gestures are defined. Based on the results for these test gestures, optimal values for the corresponding constants and model parameters are sought. The four test gestures are: Circle, Bowling, Triangle, and Z.
4.2 Vector Quantiser

Lloyd's algorithm is the vector quantiser training algorithm utilised in this project. The purpose of this algorithm is to find an optimised codebook for a given set of input vectors. The vector quantiser training algorithm is only of relevance during the training phase of a gesture. In order to train the vector quantiser with Lloyd's algorithm, the vector data of all training sequences for one gesture is considered. This way, an optimised codebook is created for each individual gesture.
Figure 4.1: Test Gestures
Once the gesture is learned and available in the gesture pool, Lloyd's training algorithm is no longer involved, because the codebook is not re-estimated for the recognition of a gesture. When in recognition mode, the vector quantiser serves in its basic mapping function. It maps every incoming data vector onto one prototype vector of the gesture's codebook, according to the nearest-neighbour rule. Because each gesture has its own codebook, each input data sample goes through the vector quantisation process as many times as there are trained gestures in the recognition pool. The training algorithm has the following two prerequisites: firstly, the number of prototype vectors, the so-called codebook size N, needs to be defined; secondly, an initial set of N prototype vectors, the so-called initial codebook Y^0, needs to be chosen. In the following sections various experiments are conducted in order to find suitable values for both. In the first series of experiments, several methods for choosing the initial prototype vectors are tried. Thereafter, various codebook sizes are investigated. A codebook size of 20 serves as the starting point for the first series of experiments. Recognition results are based on left-to-right hidden Markov models with 8 states and step size 2. Each test gesture is trained on the sample data from 20 training sequences. Training as well as recognition trials are conducted by one and the same person.
4.3 Hidden Markov Model

Although various software libraries for hidden Markov models already exist, the necessary HMM algorithms were implemented from scratch for this project. The important reason is that the standard HMM training algorithm can only take one training sequence to train the HMM. For a robust classification of gestures, hidden Markov models need to be trained with multiple training examples, as described below. Although the changes to the HMM training algorithm for multiple training sequences are straightforward, when trying to modify an existing software implementation these changes go very deep. For these reasons, a fresh implementation was preferred. However, the code base of existing implementations has been consulted as a reference.
4.4 Training with Multiple Training Sequences

In its standard form, the Baum-Welch algorithm trains an HMM with a single training sequence. However, the recognition rate for a real-world gesture can be greatly improved by using several training sequences. By repeating a gesture in various different ways, ideally also by different people, it is possible to cover a certain range of variation that describes the actual gesture. The Baum-Welch algorithm can be modified to operate on multiple training sequences.
Figure 4.2: Left-to-right HMM with 5 states and step size 2

The set of Q training sequences is defined as O = {O^(1), O^(2), ..., O^(Q)}, where the q-th observation sequence is:

O^(q) = {O^(q)_1, O^(q)_2, ..., O^(q)_{T_q}}

The modified equations to calculate the improved model parameters are:

ã_ij = [ Σ_{q=1}^{Q} (1 / P(O^(q)|λ)) Σ_{t=1}^{T_q−1} α^q_t(i) a_ij b_j(O^(q)_{t+1}) β^q_{t+1}(j) ] / [ Σ_{q=1}^{Q} (1 / P(O^(q)|λ)) Σ_{t=1}^{T_q−1} α^q_t(i) β^q_t(i) ]

b̃_j(o_k) = [ Σ_{q=1}^{Q} (1 / P(O^(q)|λ)) Σ_{t: O^(q)_t = o_k} α^q_t(j) β^q_t(j) ] / [ Σ_{q=1}^{Q} (1 / P(O^(q)|λ)) Σ_{t=1}^{T_q} α^q_t(j) β^q_t(j) ]
The start probabilities π_i do not need to be re-estimated. Since a left-to-right HMM is used, they remain fixed, with the first state having the maximum probability π_1 = 1.
4.5 HMM and Digital Computing Systems

At first sight, the calculation of probability values for hidden Markov models with real-world computing systems seems to be a trivial problem. Therefore, the initial implementation of the HMM algorithms simply followed the mathematical formulas. Although this initial HMM implementation basically worked, even in some practical tests with real gesture data, problems quickly arose, especially with longer data sequences and multiple training sequences. The probability values involved in the computation all lie in the interval [0, 1]. Problems occur especially in longer computational procedures, when extremely small values lying close to zero need to be represented. Such vanishingly small probability values can appear relatively quickly in computations for Markov models. For example, the simplified
case of a Markov chain model shall be considered. The probability that a sample sequence O is generated by a given model λ can be computed as:

P(O|λ) = Π_{t=1}^{T} a_{S_{t−1}, S_t},   with S_t = O_t
The above formula represents the multiplication of all state transition probabilities involved in the generation of the observation O. When the length T of the sequence becomes increasingly large, very small numerical values occur which can hardly be represented in today's digital computers. The smallest number (closest to zero) that can be stored in a double-precision floating point variable is about 2.2 × 10^−308; if a value becomes even smaller, a so-called underflow occurs, and the variable's content is set to zero. However, when working with full-blown hidden Markov models and real-world data, such small values occur surprisingly often, and they are of significant relevance for the calculation of the formulas. Taking them as exactly zero leads to fatal errors. For example, the calculation of the forward variables α_i is basically a summation over a large number of very small values. If these values are taken as equal to zero, the result of the summation would still be zero. In contrast, when the actual very small values are considered, they sum up to a considerable value, which would usually again be within the machine's precision range. To overcome this problem, the first attempt was simply to change the floating point format from double precision to quadruple precision, a 16-byte (128-bit) floating point number. Indeed, this change solved the problem for the considered sample data that had suffered from the underflow problem before. One method for handling extremely small probability values consists of scaling them appropriately. The idea behind this solution is to multiply the probability values with a sufficiently large scaling factor, then to perform the critical calculations, and finally to restore the original scale. Although this method should theoretically work fine, the actual implementation presented difficulties. The scaling factor needs to be estimated dynamically for each observation at time t individually and then cancelled out again for computations on the same time t. Also, the method can only be applied locally, which makes it cumbersome with longer computations. For the reasons mentioned above, another solution has been looked into and finally implemented: a method called logarithmic probability representation. This method does not use real probability values in any computations. Instead, all probability values are represented on a negative logarithmic scale. To convert a probability value p from the normal linear scale to the negative logarithmic scale, the following transformation is applied:

p' = −log_e p
By this transformation, the original range of values [0, 1] for probabilities is mapped to the entire range of non-negative floating point numbers that can be represented with contemporary floating point formats. Although the resolution of this negative-logarithmic representation is confined by the same floating point range, the accuracy is sufficient for the practical application of the method. A negative-logarithmically scaled probability value p' can be converted back into the linear domain by:

p = e^{−p'}
Being in the logarithmic domain, the operations in the HMM formulas need to be converted as well. All multiplications in the formulas can now be written as simple additions, and divisions become subtractions. However, performing additions in the negative-logarithmic representation becomes more complex: the values would need to be converted back to the linear domain first, then added, and finally transformed to the logarithmic domain again. The following formula illustrates the logarithmic addition of the values p'_1 and p'_2:

p'_1 +(log) p'_2 = −ln(e^{−p'_1} + e^{−p'_2})

This is quite a costly operation, especially for mobile phone processors. By using the Kingsbury-Rayner formula,

p'_1 +(log) p'_2 = p'_1 − ln(1 + e^{−(p'_2 − p'_1)})
there is still one exponentiation and one logarithm which need to be computed. Now, the problem of representing and manipulating numerically vanishing quantities with digital computers can be solved by using the negative logarithmic scale. However, there is one small problem left: what if 'real' zeros appear, e.g. to denote the non-existing connections of a left-to-right HMM? The negative natural logarithm of zero is positive infinity, which is again not defined in the numerical range of a digital computer. The keyword for the solution to this problem is flooring. Flooring means defining a minimal value p_min and making sure that the probability values, such as state transition or emission probabilities, never fall below this minimal value.
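A minimal sketch of the negative-logarithmic representation, including the Kingsbury-Rayner style addition and flooring, is shown below; the flooring constant is an illustrative choice, not the value used in the project:

```python
import math

P_MIN = 1e-300                      # flooring value for 'real' zeros (illustrative choice)

def to_log(p):
    """Convert a linear probability to the negative-logarithmic scale, with flooring."""
    return -math.log(max(p, P_MIN))

def log_mul(a, b):
    """Multiplication of probabilities becomes addition of negative logs."""
    return a + b

def log_add(a, b):
    """Addition of probabilities in the negative-log domain (Kingsbury-Rayner formula)."""
    if a > b:                        # make a the larger probability, i.e. the smaller negative log
        a, b = b, a
    return a - math.log1p(math.exp(-(b - a)))

# Example: 0.001 + 0.002 = 0.003
s = log_add(to_log(0.001), to_log(0.002))
print(math.exp(-s))                  # ~0.003
```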
4.6 Training of an HMM

Prior to the training of a hidden Markov model, a fresh HMM needs to be instantiated. According to the literature, the various HMM parameters, such as the state transition matrix A and the emission probability matrix B, should be initialised as neutrally as possible. The recommended method for choosing neutral initial values for A and B is to use even probability distributions that obey the stochastic constraints of the respective model configuration. A random initialisation, in contrast, could bias the model in a negative way and prolong the training phase. Following these recommendations, a freshly initialised state transition probability matrix of an 8-state left-to-right HMM with step size 2 looks like this:
A =
  0.3333  0.3333  0.3333  0.0000  0.0000  0.0000  0.0000  0.0000
  0.0000  0.3333  0.3333  0.3333  0.0000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.3333  0.3333  0.3333  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000  0.3333  0.3333  0.3333  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.3333  0.3333  0.3333  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.3333  0.3333  0.3333
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.5000  0.5000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  1.0000
The emission probability matrix of an HMM with N = 8 states and an observation alphabet size of K = 20 symbols is initialised as:

B = {b_jk},   b_jk = 1/K = 1/20 = 0.05,   for 1 ≤ j ≤ N, 1 ≤ k ≤ K
The start probability vector of a left-to-right HMM is fixed to π = (1, 0, ..., 0), i.e. the model always starts in the first state, and it is not changed during the training phase. When a new HMM for a gesture has been instantiated, it is trained with the Baum-Welch algorithm, which optimises the model for the given training sequences. The Baum-Welch training algorithm, and also its extended version for multiple training sequences, generally converges quite well towards a maximum of the likelihood P(O|λ). According to the theory, the training algorithm produces an improved HMM λ' after each training iteration:

P(O|λ') ≥ P(O|λ)
It has been observed that the relative increase of the likelihood can be very small during the first couple of training iterations. Therefore, a lower bound on the likelihood increase cannot easily be used as a termination condition for the algorithm; otherwise, there would be a risk that the training stops before the major increase has taken effect. Based on the obtained data samples and their training convergence, it was decided to simply stop after a fixed number of training iterations. The likelihood P(O|λ') of the new model λ' should always be greater than, or at least equal to, that of the previous model λ. However, practical tests with the current training data show that the likelihood sometimes decreases slightly.
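As an illustration of the initialisation scheme described above, the following C++ sketch builds the neutral A, B, and π of a left-to-right HMM with a configurable number of states, step size, and alphabet size. The structure and function names are illustrative only; the project's actual implementation was done in MATLAB.

#include <algorithm>
#include <vector>

// Container for one discrete HMM with linear-scale probabilities.
struct Hmm {
    std::vector<std::vector<double>> A;  // N x N state transition probabilities
    std::vector<std::vector<double>> B;  // N x K emission probabilities
    std::vector<double> pi;              // N start probabilities
};

// Neutral initialisation: even distributions that respect the left-to-right
// topology with the given step size, exactly as in the matrices shown above.
Hmm initLeftRightHmm(int N = 8, int K = 20, int step = 2) {
    Hmm hmm;
    hmm.A.assign(N, std::vector<double>(N, 0.0));
    hmm.B.assign(N, std::vector<double>(K, 1.0 / K));  // b_jk = 1/K = 0.05 for K = 20
    hmm.pi.assign(N, 0.0);
    hmm.pi[0] = 1.0;                                   // always start in the first state

    for (int i = 0; i < N; ++i) {
        int reachable = std::min(step, N - 1 - i) + 1; // self-loop plus up to `step' successors
        for (int j = i; j <= i + step && j < N; ++j)
            hmm.A[i][j] = 1.0 / reachable;             // even distribution over reachable states
    }
    return hmm;
}

A freshly initialised model produced this way is then handed to the Baum-Welch routine, which, as discussed above, is simply run for a fixed number of iterations.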
Chapter 5 Design

5.1 Block Diagram

5.1.1 Accelerometer

We want to convert the hand movement into voltage levels so that it can be processed by a microcontroller. For that we use an accelerometer, which acts as a transducer converting the mechanical motion into corresponding voltage levels. The accelerometer used here is the MMA7361.
5.1.2 Microcontroller

The analog signal from the accelerometer has to be quantized and converted to a digital signal. The data so obtained is sampled at regular instants and sent to the computer for processing. For this an Arduino Uno board is used, which contains an ATmega328 IC. Its built-in ADC performs the analog-to-digital conversion, and the sampled data is passed on to the computer over serial communication.
5.1.3 Computer Interface

The data sent by the microcontroller is received by the computer, which passes the values on to the software used for processing that data.
5.1.4 Gesture Recognition Algorithm

The sampled set of values from the accelerometer, which varies according to the user's gesture, is processed by a gesture recognition algorithm that identifies the gesture that was performed. We implemented this part in MATLAB using discrete hidden Markov models.
Figure 5.1: Block Diagram
5.1.5 Display

The gesture identified by the gesture recognition algorithm is displayed on the computer screen.
5.2 Accelerometer Selection

The accelerometer used is the MMA7361. The MMA7361L is a low-power, low-profile capacitive micromachined accelerometer featuring signal conditioning, a 1-pole low-pass filter, temperature compensation, self test, 0g-Detect (which detects linear freefall), and g-Select (which allows selection between two sensitivities). Zero-g offset and sensitivity are factory set and require no external devices. The MMA7361L includes a sleep mode that makes it ideal for handheld battery-powered electronics.
5.2.1 Features

• 3mm × 5mm × 1.0mm LGA-14 Package
• Low Current Consumption: 400 µA
• Sleep Mode: 3 µA
• Low Voltage Operation: 2.2 V - 3.6 V
• High Sensitivity (800 mV/g @ 1.5g)
• Selectable Sensitivity (±1.5g, ±6g)
• Fast Turn-on Time (0.5 ms Enable Response Time)
Figure 5.2: MMA7361

• Self Test for Freefall Detect Diagnosis
• 0g-Detect for Freefall Protection
• Signal Conditioning with Low-Pass Filter
• Robust Design, High Shock Survivability
• Environmentally Preferred Product
• Low Cost
5.2.2 Principle of Operation

The device consists of a surface-micromachined capacitive sensing cell (g-cell) and a signal-conditioning ASIC contained in a single package. The sensing element is sealed hermetically at the wafer level using a bulk-micromachined cap wafer. The g-cell is a mechanical structure formed from semiconductor materials (polysilicon) using semiconductor processes (masking and etching). It can be modeled as a set of beams attached to a movable central mass that moves between fixed beams. The movable beams can be deflected from their rest position by subjecting the system to acceleration. As the beams attached to the central mass move, the distance from them to the fixed beams on one side increases by the same amount that the distance to the fixed beams on the other side decreases. The change in distance is a measure of acceleration.

The g-cell beams form two back-to-back capacitors. As the center beam moves with acceleration, the distance between the beams changes and each capacitor's value changes according to C = Aε/D, where A is the area of the beam, ε is the dielectric constant, and D is the distance between the beams. The ASIC uses switched-capacitor techniques to measure the g-cell capacitors and to extract the acceleration data from the difference between the two capacitors.

Figure 5.3: Simplified transducer physical model of a capacitive accelerometer

The ASIC also signal-conditions and filters (switched capacitor) the signal, providing a high-level output voltage that is ratiometric and proportional to acceleration.
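To make the last step explicit, the small-displacement relation for such a differential capacitive sensing cell can be written out. This is a generic textbook approximation, not a formula taken from the MMA7361 datasheet:

C_1 = εA/(D - x),   C_2 = εA/(D + x),   ΔC = C_1 - C_2 ≈ (2εA/D²)·x,   with x = (m/k)·a,

where x is the deflection of the proof mass of mass m suspended with spring constant k. To first order, the capacitance difference ΔC measured by the ASIC is therefore proportional to the acceleration a.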
5.3 Interfacing Accelerometer With Microcontroller

The AREF voltage, which acts as the reference voltage for the ADC pins of the microcontroller, was set to 3 V. The accelerometer was then interfaced to the microcontroller, and the analog voltage from the accelerometer was converted to a digital value in the range 0 to 1023. This digital range was divided into 4 sections of 256 counts each. The X pin of the accelerometer was connected to PA0. When the x-axis voltage changes with the tilt of the accelerometer, the corresponding LED connected to PORTD is lit; i.e., if the output voltage of the X pin is in the range 0 to 0.74 V, the LED connected to PD0 is lit. Similarly, for voltages in each of the other ranges the corresponding output pin (PD0, PD1, PD2, PD3) is driven high. The voltages in the y and z directions were checked separately in the same way. Then all three axes were connected and programmed together and the output was verified. The following table shows the voltage ranges, the corresponding ADC values, and the pins whose LEDs were lit; a minimal illustrative sketch of this mapping follows the table.
Accelerometer voltage (V)   ADC value    Pin
0 - 0.74                    0 - 255      PD0
0.75 - 1.49                 256 - 511    PD1
1.50 - 2.24                 512 - 767    PD2
2.25 - 3.00                 768 - 1023   PD3
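As a sketch of the same four-way mapping, the following Arduino-style code reads one axis and lights one of four LEDs depending on which quarter of the ADC range the reading falls into. The pin numbers are placeholders chosen for illustration; the test described above drove PD0-PD3 on the AVR port registers directly.

// Four-way ADC-to-LED mapping, one LED per 256-count band.
const int xPin = A0;                      // accelerometer X output (illustrative pin)
const int ledPins[4] = {2, 3, 4, 5};      // illustrative LED pins, one per band

void setup() {
  analogReference(EXTERNAL);              // AREF tied to 3 V as described above
  for (int i = 0; i < 4; ++i) pinMode(ledPins[i], OUTPUT);
}

void loop() {
  int adc = analogRead(xPin);             // 0..1023 for 0..3 V
  int band = adc / 256;                   // 0..3: which quarter of the range
  for (int i = 0; i < 4; ++i)
    digitalWrite(ledPins[i], i == band ? HIGH : LOW);
  delay(50);                              // small pause between readings
}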
Figure 5.4: LED working on Voltage Source
Figure 5.5: LED working on Accelerometer
Figure 5.6: Static Acceleration
Figure 5.7: Dynamic Acceleration
Figure 5.8: Arduino UNO R3
5.4 Interfacing With The Computer

Various values were obtained from the three axes of the accelerometer, and this data needed to be transmitted serially to the PC. We first tried to use a female DB9 connector, with a MAX232 IC for level conversion, to send the data from the microcontroller to the PC, but the accelerometer data was not received successfully. A direct RS232 connection was equally unsuccessful, so we opted for the Arduino.
5.4.1 Serial Communication Using Arduino

The Arduino Uno is a microcontroller board based on the ATmega328. It has 14 digital input/output pins (of which 6 can be used as PWM outputs), 6 analog inputs, a 16 MHz ceramic resonator, a USB connection, a power jack, an ICSP header, and a reset button. It contains everything needed to support the microcontroller; simply connect it to a computer with a USB cable, or power it with an AC-to-DC adapter or battery, to get started. It features the ATmega16U2 (ATmega8U2 up to board revision R2) programmed as a USB-to-serial converter. Revision 2 of the Uno board has a resistor pulling the 8U2 HWB line to ground, making it easier to put into DFU mode. Revision 3 of the board has the following new features:

1. 1.0 pinout: added SDA and SCL pins near the AREF pin, and two other new pins placed near the RESET pin. The first is the IOREF pin, which allows shields to adapt to the voltage provided by the board, so that shields will be compatible both with AVR-based boards that operate at 5 V and with the Arduino Due that operates at 3.3 V. The second is a not-connected pin, reserved for future purposes.
2. A stronger RESET circuit.
3. The ATmega16U2 replaces the 8U2.
Features of the Arduino Uno:

1. An easy USB interface. The chip on the board plugs straight into your USB port and registers on your computer as a virtual serial port, so you can interface with it as though it were a serial device. The benefit of this setup is that serial communication is an extremely simple (and time-tested) protocol, and USB makes connecting it to modern computers really convenient.
2. Very convenient power management and built-in voltage regulation. You can connect an external power source of up to 12 V and it will regulate it to both 5 V and 3.3 V. It can also be powered directly off a USB port without any external power.
3. A 16 MHz clock. This does not make it the speediest microcontroller around, but it is fast enough for most applications.
4. 32 KB of flash memory for storing your code.
5. An on-board LED attached to digital pin 13 for fast and easy debugging of code.
6. A button to reset the program on the chip.
5.4.2 Reading the data

The data from the accelerometer was read using the Arduino Uno board. Its ATmega328 microcontroller was programmed for serial communication, which makes the board reliable and easy to use and program; hence we opted for the Arduino. The program for serial communication on the Arduino follows.
5.4.2.1 Program
/*
  ADXL3xx
  Reads an Analog Devices ADXL3xx accelerometer and communicates the
  acceleration to the computer.
  The circuit:
    analog 0: vcc
    analog 1: ground
    analog 2: z-axis
    analog 3: y-axis
    analog 4: x-axis
*/

// these constants describe the pins. They won't change:
const int groundpin = A1;   // analog input pin 1: ground
const int powerpin  = A0;   // analog input pin 0: voltage
const int xpin      = A4;   // x-axis of the accelerometer
const int ypin      = A3;   // y-axis
const int zpin      = A2;   // z-axis (only on 3-axis models)

void setup() {
  // initialize the serial communications:
  Serial.begin(9600);

  // Provide ground and power by using the analog inputs as normal
  // digital pins. This makes it possible to directly connect the
  // breakout board to the Arduino. If you use the normal 5V and
  // GND pins on the Arduino, you can remove these lines.
  pinMode(groundpin, OUTPUT);
  pinMode(powerpin, OUTPUT);
  digitalWrite(groundpin, LOW);
  digitalWrite(powerpin, HIGH);
  analogReference(EXTERNAL);
}

void loop() {
  // print the sensor values:
  Serial.print(analogRead(xpin));
  // print a tab between values:
  Serial.print("\t");
  Serial.print(analogRead(ypin));
  // print a tab between values:
  Serial.print("\t");
  Serial.print(analogRead(zpin));
  Serial.println();
  // delay before next reading:
  delay(100);
}
The serially received data lies in the range 0 to 1023. The corresponding values for the x, y, and z axes are obtained continuously in real time. The data was logged using a software tool called CoolTerm.
5.4.3 Serial Communication With Matlab

Using MATLAB, the data was read from the serial port. The values arrived as ASCII character arrays and were converted into integer values. A number of samples were taken for each gesture in order to train the HMMs. After training the HMMs we programmed the HMM recogniser; with the help of these programs a given gesture is classified as one of the predefined gestures.
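The recognition step itself is small: evaluate P(O|λ) with the forward algorithm for every trained gesture model and report the gesture whose model explains the observation sequence best. The following C++ sketch shows this logic on the negative-logarithmic scale; it is only an illustration of the idea, since the project's recogniser was written in MATLAB, and the type and function names are invented for this sketch.

#include <algorithm>
#include <cmath>
#include <vector>

// One trained discrete HMM; all entries are negative-log probabilities.
struct NegLogHmm {
    std::vector<std::vector<double>> A;  // N x N transitions
    std::vector<std::vector<double>> B;  // N x K emissions
    std::vector<double> pi;              // N start probabilities
};

// Kingsbury-Rayner addition of two negative-log probabilities (see Chapter 4).
static double negLogAdd(double a, double b) {
    if (a > b) std::swap(a, b);
    return a - std::log1p(std::exp(-(b - a)));
}

// Negative log of P(O|lambda) computed with the forward algorithm.
double forwardNegLog(const NegLogHmm& m, const std::vector<int>& O) {
    const size_t N = m.pi.size();
    std::vector<double> alpha(N);
    for (size_t i = 0; i < N; ++i)                    // initialisation, t = 1
        alpha[i] = m.pi[i] + m.B[i][O[0]];
    for (size_t t = 1; t < O.size(); ++t) {           // induction over the sequence
        std::vector<double> next(N);
        for (size_t j = 0; j < N; ++j) {
            double s = alpha[0] + m.A[0][j];
            for (size_t i = 1; i < N; ++i)
                s = negLogAdd(s, alpha[i] + m.A[i][j]);
            next[j] = s + m.B[j][O[t]];
        }
        alpha = next;
    }
    double total = alpha[0];                          // termination: sum over all states
    for (size_t i = 1; i < N; ++i) total = negLogAdd(total, alpha[i]);
    return total;
}

// The recognised gesture is the model with the smallest negative-log likelihood.
int classify(const std::vector<NegLogHmm>& models, const std::vector<int>& O) {
    int best = 0;
    for (size_t g = 1; g < models.size(); ++g)
        if (forwardNegLog(models[g], O) < forwardNegLog(models[best], O))
            best = static_cast<int>(g);
    return best;
}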
Chapter 6 Results

The experiment was done by moving the accelerometer along various paths. The accelerometer data was read by the microcontroller and transmitted serially to the PC via the Arduino. Readings were obtained with the accelerometer held in various positions, and the graphs of the accelerometer output values are shown below. The gestures implemented were triangle, Z, bowling, and circle. Samples of these gestures were recorded and the models were trained with them; a gesture performed in real time is then mapped onto the closest matching predefined gesture. The following are the results obtained.
Figure 6.1: Accelerometer Kept Stationary
Figure 6.2: Acceleration at a slower rate
Figure 6.3: Acceleration at a faster rate
Figure 6.4: Z-Gesture
Figure 6.5: Circle-Gesture
Figure 6.6: Bowling-Gesture
Figure 6.7: Triangle-Gesture
Chapter 7 Issues faced

We first tried to read the data from the accelerometer through a DB9 connector and a MAX232 IC connected to the PC, but this was unsuccessful. Hence we decided to communicate serially with the PC using the Arduino Uno R3 board.

The calculation of probability values for hidden Markov models on real-world computing systems may seem to be a trivial problem, since all the probability values involved lie in the interval [0, 1]. Problems occurred, however, in longer computational procedures, when extremely small values close to zero had to be represented: products of such probabilities underflowed to zero. Hence the probabilities were represented on a negative logarithmic scale.
Appendix
Datasheets