EE586L/Projects 2010
Better than Helium
Authors: Randy Lee, Yaniv Tal, Michael Webb
Abstract: The purpose of our project was to perform a pitch (frequency) shift and a time shift on an audio input. Only the pitch shift is audible because the time shift is a corrective measure. Pitch shifting naturally changes the duration of the output relative to the input by the shifting factor; the time shift corrects this change. The pitch shift itself is implemented with the phase vocoder method, which alters the phase of the FFT of an input frame to shift the pitch. After the corrective time shift, the output frame is a pitch-shifted version of the input with the same duration.
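To see why a corrective time shift is needed, the following minimal sketch raises the pitch by plain resampling and shows the duration shrinking by the same factor. It is not the authors' phase vocoder; the function name `resample_linear`, the 8000 Hz sample rate, and the 220 Hz test tone are assumptions chosen for illustration.

```python
import math

def resample_linear(signal, factor):
    """Read `signal` `factor` times faster using linear interpolation.

    Played back at the original sample rate, the result sounds `factor`
    times higher in pitch and lasts 1/`factor` as long.
    """
    out_len = int(len(signal) / factor)
    out = []
    for n in range(out_len):
        pos = n * factor              # fractional read position in the input
        i = int(pos)
        frac = pos - i
        nxt = signal[i + 1] if i + 1 < len(signal) else signal[i]
        out.append((1.0 - frac) * signal[i] + frac * nxt)
    return out

sample_rate = 8000                    # assumed sample rate in Hz
tone = [math.sin(2 * math.pi * 220 * n / sample_rate)
        for n in range(sample_rate)]  # one second of a 220 Hz tone
shifted = resample_linear(tone, 2.0)  # one octave up

print(len(tone) / sample_rate, "s in,", len(shifted) / sample_rate, "s out")
```

A one-octave shift halves the duration here, which is the natural time shift that the corrective step has to undo.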
Poster:
Video:
Eye Catchers
Authors: Ashvin Deodhar, Trevor McGuire, Jason Tokayer
Abstract: Automatic video gaze tracking finds applications in handicapped accessibility and behavioral analysis. Many challenges exist in recognizing and tracking the pupil of the eye. We adapt a version of the Starburst algorithm, originally developed to track the eye with an infrared illumination source, to track the pupil in the visible spectrum. The adapted algorithm runs on a TI DSP and tracks the pupil with a camera mounted in front of a monitor. We demonstrate functionality by detection of menu options on the monitor using the gaze of the eyes.
Video: YouTube Video
Ewind
Authors: Sheng Guo, Yinjun Pan, Weiwei Wang
Abstract: Our project is emotion detection. We used pitch, energy, and MFB as features to estimate the emotion of the speaker. We assigned a weight to each feature based on its importance. The EMA database was used for both training data and test data. During the demo, we will use both this database (excluding the training data) and some speech from YouTube to test our system.
Poster:
Interactive Object Tracking
Authors: David Martin, Ming-Chun Huang, Yu-Jen Huang
Abstract: In this project, a camera-based object tracking system will be demonstrated along with virtual reality effects. Our goal is to replace a real-world target with a pre-stored image. To make the replacement appear realistic, several image enhancements, such as color blending, image scaling, shadow insertion and additive noise, are applied to the pre-stored image before it is superimposed on the live video feed. In contrast with simple static image replacement, the object is dynamically adjusted to mimic realistic shadows and rotations, and then displayed on the LCD screen. The project aims to provide an entertainment and education platform for kids and the elderly in an interactive and animated fashion, for example through storytelling and rehabilitation in video games.
Poster:
Video: YouTube Video
LAPD
Authors: Somanath Krishnaswamy, Tyler Miller, Russell Stradling, Kyle White
Abstract: The LAPD project is based on a need to help remotely located listeners in a teleconference identify speakers in a room. In the group's experience, a teleconference pod tends to attract a significant amount of visual attention, yet very little if any information is gained by looking at it. The idea is to place a simple display on top of the pod that shows remote participants where the speaker is located in the room. Speech over telephony differs from speech in person, and some speakers may not be well known to all participants, so additional information on speaker identity may help participation and conversation cohesion. The source location solution offers a way to provide some speaker identity information without introducing significant new demands on available processing power or cost. The initial system is built on top of voice activity detection and GCC-PHAT based speaker localization components.
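As an illustration of the GCC-PHAT component named in the abstract, here is a minimal sketch that estimates the delay between two microphone signals. The function names, the naive DFT (used only to stay dependency-free), the 5-sample delay, and the synthetic noise source are all assumptions for this example, not details of the LAPD implementation.

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; fine for short test signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def gcc_phat_delay(sig, ref, max_lag):
    """Return the lag (in samples) by which `sig` trails `ref`."""
    n = len(sig) + len(ref)           # zero-pad to avoid circular wrap-around
    S = dft(list(sig) + [0.0] * (n - len(sig)))
    R = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = [s * r.conjugate() for s, r in zip(S, R)]
    # PHAT weighting: discard magnitude, keep only the phase of each bin.
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    cc = [v.real for v in dft(whitened, inverse=True)]
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cc[lag % n])

random.seed(1)
mic_a = [random.gauss(0.0, 1.0) for _ in range(64)]  # synthetic source at mic A
mic_b = [0.0] * 5 + mic_a                            # same source, 5 samples later

print(gcc_phat_delay(mic_b, mic_a, max_lag=10))
```

With a known microphone spacing and sample rate, a delay like this could be converted to an arrival angle, which is the kind of location cue the display would show.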
Poster:
Modem
Authors: Mithun Baphana, Mark Lyubarev, Feisal Rasras, Suneesh Sasikumar
Abstract: We implemented a baseband acoustical modem utilizing adaptive equalization. A speaker is used as a transmitter and a mic as a receiver. The modem supports a range of data rates. A voice message can be recorded at the transmit side and then transmitted over the air to the receive side for playback. Receiver symbol timing synchronization is derived from the equalizer weight distribution.
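The adaptive equalization mentioned above can be sketched with a least-mean-squares (LMS) update. The abstract does not say which adaptation rule the modem uses, so LMS is an assumption here, as are the two-tap channel, the tap count, and the step size.

```python
import random

def lms_equalize(received, training, num_taps=8, step=0.05):
    """Adapt FIR equalizer weights so the output tracks the training symbols."""
    weights = [0.0] * num_taps
    delay_line = [0.0] * num_taps
    errors = []
    for x, d in zip(received, training):
        delay_line = [x] + delay_line[:-1]           # shift in the newest sample
        y = sum(w * v for w, v in zip(weights, delay_line))
        e = d - y                                    # error against known symbol
        weights = [w + step * e * v for w, v in zip(weights, delay_line)]
        errors.append(e)
    return weights, errors

random.seed(0)
symbols = [random.choice((-1.0, 1.0)) for _ in range(2000)]
channel = [1.0, 0.4]  # hypothetical channel: direct path plus one echo
received = [
    sum(h * symbols[n - k] for k, h in enumerate(channel) if n - k >= 0)
    for n in range(len(symbols))
]

weights, errors = lms_equalize(received, symbols)
tail_mse = sum(e * e for e in errors[-200:]) / 200
print("largest tap at index", max(range(len(weights)), key=lambda i: abs(weights[i])))
```

The position of the largest tap is one simple summary of the weight distribution; the abstract notes that the receiver derives its symbol timing from that distribution, though the exact method is not described.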
Poster:
Video: YouTube Video
Mosaic
Authors: Pengkui Chu, Shiyu Xu, Ying Zheng
Abstract: In the virtual drums system, we use a digital camera to capture the movement and position of the user’s palms, then use several features to match each palm movement to a particular sound and output that sound in real time. The camera recognizes hand movements using a motion detection algorithm and an edge detection algorithm. After these detection steps, we extract several key features of the movement (hit), then use a state machine to control the sound output (sound type and sound frequency) so that the hits play smoothly. In the current version, the user can play 6 sounds, and each sound has a single-hit mode and two high-frequency modes.
Video:
Project Natal - The Beginning
Authors: Talha Gorsi, Arjun Gupta, Vikash Khatri
Abstract: Project Natal combines two classic arcade games, Snake and Vertical Scrolling Plane. The games are controlled by human gestures, which are identified by converting the camera's video sequence to background-subtracted binary frames and analyzing motion in the resulting frames. The classic Snake game is an example of discrete human action recognition: the snake is steered by moving the hand to the right, left or up. The snake can eat eggs, grow in size and hit the maze. Vertical Scrolling Plane is inspired by the game "1942"; the plane follows the position of your hands in the right-left or up-down directions through continuous tracking of hand position. Obstacles appear in random order, and the goal is to keep the plane from hitting any obstacle. The user selects a game interactively on the home page by pointing a hand to the right or left. The home page appears at the start and at the end of each game.
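The gesture pipeline described above (background subtraction, binary frames, motion analysis) can be sketched roughly as follows. The threshold, the centroid-based motion test, and all function names are assumptions made for illustration; the abstract does not specify how motion is analyzed.

```python
def subtract_background(frame, background, threshold=30):
    """Binary mask: 1 where the frame differs enough from the background."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]

def centroid(mask):
    """Mean (x, y) position of the foreground pixels, or None if there are none."""
    total = sum(sum(row) for row in mask)
    if total == 0:
        return None
    cx = sum(x for row in mask for x, v in enumerate(row) if v) / total
    cy = sum(y for y, row in enumerate(mask) for v in row if v) / total
    return cx, cy

def horizontal_gesture(prev_c, curr_c, min_shift=1.0):
    """Classify the centroid movement between two frames as left, right or none."""
    if prev_c is None or curr_c is None:
        return "none"
    dx = curr_c[0] - prev_c[0]
    if dx > min_shift:
        return "right"
    if dx < -min_shift:
        return "left"
    return "none"

# Toy 4x8 grayscale frames: a bright 2x2 "hand" moves from the left to the right.
background = [[10] * 8 for _ in range(4)]

def with_blob(col):
    frame = [row[:] for row in background]
    for y in (1, 2):
        for x in (col, col + 1):
            frame[y][x] = 200
    return frame

first = centroid(subtract_background(with_blob(1), background))
second = centroid(subtract_background(with_blob(5), background))
print(horizontal_gesture(first, second))
```

A discrete decision like this suits the Snake controls, while the centroid itself could serve as the continuously tracked hand position for the plane game.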
Poster:
Video: YouTube Video
Sixth Sensors
Authors: Chong Liu, Daru Xu, Yuanhang Su, Yashodhar Narvaneni
Abstract: The project is to design and develop an interactive drawing and presentation tool that allows free hand movement and gestures. The system uses a CCD camera to capture the user's hand movements and gestures. Finger tracking and gesture recognition algorithms are implemented on the DSP board, and the output is fed to a projector. The fingers are tracked by the colored caps worn on them, and gestures are detected from the direction and speed of motion. The tool serves as both a drawing tool and an interactive presentation tool. It can be extended into a user interface for lectures, teaching, television, computers or many other devices with which a user can interact.
Poster:
Video:
SmartVoice
Authors: Yu Si, Zhiyang Wang, Yuyu Xu
Abstract: Automatic speech recognition using an MFCC model and VQ technology to extract voice features and create a personal ID. Two modes are provided: in training mode, a password is entered; in testing mode, the actual identification and access control are performed.
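The VQ matching step might look like the sketch below, which assumes the feature vectors have already been extracted and each enrolled user already has a trained codebook. Toy 2-D vectors stand in for real MFCC frames, and the user names and codebook values are hypothetical.

```python
def distortion(features, codebook):
    """Average squared distance from each feature vector to its nearest codeword."""
    total = 0.0
    for vec in features:
        total += min(
            sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook
        )
    return total / len(features)

def identify(features, codebooks):
    """Pick the enrolled ID whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda name: distortion(features, codebooks[name]))

# Hypothetical codebooks, as might be produced in training mode.
codebooks = {
    "user_a": [[1.0, 1.0], [2.0, 2.0]],
    "user_b": [[5.0, 5.0], [6.0, 4.0]],
}

test_features = [[0.9, 1.1], [1.2, 0.8], [2.1, 1.9]]
print(identify(test_features, codebooks))
```

For access control, a real system would likely also reject inputs whose best distortion exceeds some threshold, rather than always accepting the closest match.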
Poster:
Tommy Says
Authors: Mohamed Alkaabi, Jeannette Chang, Tina Chou
Abstract: "Tommy Says" is a vision-based memory game that challenges players to reproduce random sequences by positioning their hands in specified locations. The game uses a video camera and LCD interface and provides immediate audio feedback.
Poster:
Video: YouTube Video