AI-powered ‘sonar’ on smartglasses tracks gaze and facial expressions 

Cornell researchers have developed two technologies that track a person’s gaze and facial expressions through sonar-like sensing.  

Both technologies are small enough to fit on commercial smartglasses or virtual reality (VR) and augmented reality (AR) headsets, yet consume significantly less power than similar camera-based tools.

Both use speakers and microphones mounted on an eyeglass frame to bounce inaudible soundwaves off the face and pick up reflected signals caused by face and eye movements. One device, GazeTrak, is the first eye-tracking system that relies on acoustic signals. The second, EyeEcho, is the first eyeglass-based system to continuously and accurately detect facial expressions and recreate them through an avatar in real time.


The devices can last for several hours on a smartglass battery and more than a day on a VR headset. 

“It’s small, it’s cheap and super low-powered, so you can wear it on smartglasses every day – it won’t kill your battery,” said Cheng Zhang, assistant professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science. Zhang directs the Smart Computer Interfaces for Future Interactions (SciFi) Lab that created the new devices.

“In a VR environment, you want to recreate detailed facial expressions and gaze movements so that you can have better interactions with other users,” said Ke Li, a doctoral student in the field of information science who led the GazeTrak and EyeEcho development. 

Using sound signals instead of video also presents fewer privacy concerns, Li said. “There are many camera-based systems in this area of research or even on commercial products to track facial expressions or gaze movements, like Vision Pro or Oculus,” he said. “But not everyone wants cameras on wearables to capture you and your surroundings all the time.” 

Li will present “GazeTrak: Exploring Acoustic-based Eye Tracking on a Glass Frame” at the Annual International Conference on Mobile Computing and Networking (MobiCom’24), Sept. 30 to Oct. 4.

“The privacy concerns associated with systems that use video will become more and more important as VR/AR headsets become much smaller and, ultimately, similar to today’s smartglasses,” said co-author François Guimbretière, professor of information science in Cornell Bowers CIS and the multicollege Department of Design Tech. “Because both technologies are so small and power-efficient, they will be a perfect match for lightweight, smart AR glasses.”

For GazeTrak, researchers positioned one speaker and four microphones around the inside of each eye frame of a pair of glasses, to bounce and pick up soundwaves from the eyeball and the area around the eyes. The resulting sound signals are fed into a customized deep learning pipeline that uses artificial intelligence to continuously infer the direction of the person’s gaze. 
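The paper’s exact signal processing is not described here, but the core sensing idea – probing with an inaudible chirp and reading off an echo profile whose shape shifts as the eye moves – can be sketched in a few lines. Everything below (the sample rate, chirp frequencies and the cross-correlation step) is an illustrative assumption, not the authors’ implementation:

```python
import numpy as np

FS = 50_000             # sample rate in Hz -- illustrative; real hardware differs
CHIRP_LEN = 600         # samples in the transmitted chirp
SPEED_OF_SOUND = 343.0  # m/s

def make_chirp(n=CHIRP_LEN, f0=18_000, f1=21_000, fs=FS):
    """Inaudible linear frequency sweep used as the probe signal."""
    t = np.arange(n) / fs
    return np.sin(2 * np.pi * (f0 * t + (f1 - f0) * t**2 / (2 * t[-1])))

def echo_profile(tx, rx):
    """Cross-correlate received audio with the transmitted chirp.
    Peaks mark reflections; the profile's shape changes as the eye moves."""
    return np.abs(np.correlate(rx, tx, mode="valid"))

def reflection_delay(tx, rx):
    """Sample delay of the strongest reflection."""
    return int(np.argmax(echo_profile(tx, rx)))

# Simulate a reflection off a surface ~3 cm away (6 cm round trip).
tx = make_chirp()
delay = int(round(2 * 0.03 / SPEED_OF_SOUND * FS))   # round-trip delay in samples
rx = np.zeros(CHIRP_LEN + delay + 200)
rx[delay:delay + CHIRP_LEN] += 0.4 * tx              # attenuated echo
rx += 0.01 * np.random.default_rng(0).standard_normal(rx.size)

print(reflection_delay(tx, rx))  # prints 9, the simulated round-trip delay
```

In GazeTrak, echo profiles like this – gathered from four microphones around each eye – are what the deep learning pipeline consumes to infer gaze direction; the sketch above covers only the sensing step, not the model.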

GazeTrak does not yet match the accuracy of the leading camera-based eye trackers, but the new device is a proof of concept that acoustic signals are also effective. With further optimization, the researchers believe they can reach comparable accuracy while reducing the number of speakers and microphones required.

For EyeEcho, one speaker and one microphone are located next to the glasses’ hinges, pointing down to catch skin movement as facial expressions change. The reflected signals are likewise interpreted using AI.
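Continuous tracking means isolating what moved between one instant and the next. One simple way to do that – purely a hypothetical sketch, not the authors’ pipeline – is to difference consecutive echo profiles, so static reflections cancel and only motion-induced changes remain:

```python
import numpy as np

def differential_profile(prev, curr):
    """Static reflections cancel; only echo changes caused by motion remain."""
    return np.abs(curr - prev)

rng = np.random.default_rng(1)
static = rng.random(128)       # echo profile of a motionless face
moved = static.copy()
moved[40:44] += 0.5            # a skin shift alters a few range bins

diff = differential_profile(static, moved)
print(int(np.argmax(diff)))    # prints 40, the first bin where the echo changed
```

A model consuming these differential frames over time would see a signal dominated by facial motion rather than by the wearer’s fixed facial geometry.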

With this technology, users can have hands-free video calls through an avatar, even in a noisy café or on the street. While some smartglasses can recognize faces or distinguish among a few specific expressions, none currently track expressions continuously the way EyeEcho does.

Li will present this work, “EyeEcho: Continuous and Low-power Facial Expression Tracking on Glasses,” at the Association for Computing Machinery’s Conference on Human Factors in Computing Systems (CHI’24), held May 11-16.

These two advances have applications beyond enhancing a person’s VR experience. GazeTrak could be used with screen readers to read out portions of text for people with low vision as they peruse a website. 

GazeTrak and EyeEcho could also potentially help diagnose or monitor neurodegenerative diseases, such as Alzheimer’s and Parkinson’s. Patients with these conditions often have abnormal eye movements and less expressive faces, and this type of technology could track the progression of the disease from the comfort of a patient’s home.

Multiple Cornell researchers also contributed to this work, including Ruidong Zhang, Mose Sakashita and Saif Mahmud, all doctoral students in the field of information science; James Chen ’24, Shawn Chen ’24 and Kenny Liang ’24; and Sicheng Yin, a master’s student at the University of Edinburgh. 

This research is supported by the National Science Foundation and the IGNITE Innovation Acceleration Program. 

Patricia Waldron is a writer for the Cornell Ann S. Bowers College of Computing and Information Science. 

Media Contact

Becka Bowyer