Saif Mahmud, a doctoral student in the field of information science, with PoseSonic glasses.

Glasses use sonar, AI to interpret upper body poses in 3D

Throughout history, sonar’s distinctive “ping” has been used to map oceans, spot enemy submarines and find sunken ships. Today, a variation of that technology – in miniature form, developed by Cornell researchers – is proving a game-changer in wearable body-sensing technology.

PoseSonic is the latest sonar-equipped wearable from Cornell’s Smart Computer Interfaces for Future Interactions (SciFi) Lab. It consists of off-the-shelf eyeglasses outfitted with micro sonar that can track the wearer’s upper body movements in 3D through a combination of inaudible soundwaves and artificial intelligence (AI).

With further development, PoseSonic could enhance augmented reality and virtual reality, and track detailed physical and behavioral data for personal health, the researchers said.

“What’s exciting to me about PoseSonic is the potential for its use in detecting fine-grained human activities in the wild,” said Saif Mahmud, a doctoral student in the field of information science. “When we have lots of data through body-sensing technology like PoseSonic, it can help us be more mindful of ourselves and our behaviors.”

Mahmud is the lead author of “PoseSonic: 3D Upper Body Pose Estimation Through Egocentric Acoustic Sensing on Smartglasses,” which was presented Oct. 10 at the joint Pervasive and Ubiquitous Computing (Ubicomp) and International Symposium on Wearable Computing (ISWC) conference, in Cancun, Mexico.

“We’re the first research group using inaudible acoustics and AI to track body poses through a wearable device,” said senior author Cheng Zhang, assistant professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science, and director of the SciFi Lab. “By integrating cutting-edge AI into low-power, low-cost and privacy-conscious acoustic sensing systems, we use less instrumentation on the body, which is more practical, and battery performance is significantly better for everyday use.”

PoseSonic has two pairs of tiny microphones and speakers – each about the diameter of a pencil – attached to the hinges of eyeglasses. The speakers emit inaudible soundwaves that bounce off the upper body and back up to the microphones, generating an echo profile image. This image is then fed into PoseSonic’s machine learning algorithm, which estimates the body pose with near-perfect accuracy. And unlike other data-driven, wearable pose-tracking systems, PoseSonic functions well without an initial training session with the user, the researchers said.
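To make the sensing step concrete, here is a minimal sketch of how a single echo profile could be computed. The article does not say how PoseSonic builds its echo profiles, so the cross-correlation approach, the chirp frequencies, the sample rate and every function name below are illustrative assumptions, not the researchers’ implementation.

```python
import math

# All constants below are illustrative assumptions, not values from the PoseSonic paper.
SAMPLE_RATE = 48_000      # samples per second
SPEED_OF_SOUND = 343.0    # meters per second in air at room temperature


def make_chirp(f_start, f_end, duration, sample_rate):
    """Return a linear frequency sweep from f_start to f_end (in Hz)."""
    n = round(duration * sample_rate)
    k = (f_end - f_start) / duration  # sweep rate in Hz per second
    return [
        math.sin(2 * math.pi * (f_start * t + 0.5 * k * t * t))
        for t in (i / sample_rate for i in range(n))
    ]


def simulate_echo(transmitted, delay_samples, gain, frame_len):
    """Build a received frame holding one delayed, attenuated copy of the signal."""
    frame = [0.0] * frame_len
    for i, sample in enumerate(transmitted):
        frame[delay_samples + i] += gain * sample
    return frame


def echo_profile(transmitted, received):
    """Cross-correlate the received frame with the transmitted signal.

    Index `lag` of the result measures how strongly the frame resembles the
    transmitted signal when delayed by `lag` samples.
    """
    n_lags = len(received) - len(transmitted) + 1
    return [
        sum(tx * received[lag + i] for i, tx in enumerate(transmitted))
        for lag in range(n_lags)
    ]


# A 5 ms sweep in a band most adults cannot hear (hypothetical choice).
chirp = make_chirp(18_000, 21_000, 0.005, SAMPLE_RATE)

# Pretend a single surface reflects the sweep back after 84 samples.
frame = simulate_echo(chirp, delay_samples=84, gain=0.3, frame_len=600)

profile = echo_profile(chirp, frame)
peak_lag = max(range(len(profile)), key=lambda lag: abs(profile[lag]))

# Round-trip delay converted to a one-way distance.
distance_m = peak_lag / SAMPLE_RATE * SPEED_OF_SOUND / 2
print(f"strongest echo at lag {peak_lag}, about {distance_m:.2f} m away")
```

In this toy setup the strongest correlation falls at the simulated delay, which corresponds to a reflector roughly 0.3 meters away. Stacking such profiles from successive frames would give a two-dimensional array of delay versus time, which is one plausible reading of the “echo profile image” that is passed to the machine learning model.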

The system can estimate movement at nine body joints, including the shoulders, elbows, wrists and hips, as well as the nose, which helps estimate head position, the researchers said.

The technology is a major step up from existing wearable devices that often require a mini video camera, which isn’t always practical. Current wearables with video cameras also require significant battery power and pose privacy concerns, the researchers said. Acoustic sensing requires minimal power – 10 times less than a wearable camera. Because of this, the technology makes for a much smaller and less obtrusive wearable, the researchers said.

Further, they added, there’s much less privacy risk with sonar.

“A wearable video camera poses privacy risks for anyone in the wearer’s vicinity,” Mahmud said. “Our solution: Let’s put a little inaudible acoustic field around us that can track our body’s movement while also respecting other people’s privacy.”

PoseSonic is one of three sonar-equipped wearable devices from the SciFi Lab that members presented at Ubicomp/ISWC this fall. The other two are:

  • EchoNose is an eyeglass-mounted sensor that emits inaudible acoustic signals into the nasal and oral cavities to read mouth, breathing and tongue gestures.
  • HPSpeech expands on the SciFi Lab’s study of silent speech by transforming off-the-shelf headphones into a silent speech reader – think of mouthing the words “volume up” to adjust music volume on a nearby smartphone. HPSpeech received an Honorable Mention award.

The lab’s fourth Ubicomp/ISWC device and paper, called C-Auth, is a user authentication method for smart glasses that uses a mini camera to read the wearer’s facial contours.

Along with Mahmud and Zhang, co-authors of PoseSonic are: Ke Li and Ruidong Zhang, doctoral students in the field of information science; Guilin Hu ’24; Hao Chen ’24; Richard Jin ’24; and François Guimbretière, professor of information science in Cornell Bowers CIS.

This research is supported by the National Science Foundation.

Louis DiPietro is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.
