SANTA CLARA, Calif. — In a lab on the fourth floor at Intel’s corporate headquarters, a small team of engineers tinkers with computers and cameras they believe will help usher in the future of computing.
One holds a prototype tablet up to a visitor’s face, captures the image in 3D and offers to print it as the head on a plastic superhero action figure.
Another steps in front of a PC and starts smiling and grimacing at it, showing how the computer interprets his emotions, shifting from joy to disgust. On the screen, a series of avatars mimic his expressions.
The technology the engineers are demonstrating is a camera system that can see like a person and yet is small enough and inexpensive enough to fit in a standard laptop or mobile device. Called RealSense, it’s the result of three-plus years of research and development, initially as a skunkworks project and later as a full-fledged product push involving hundreds of people.
For Achin Bhowmik, the Intel executive who helped spearhead the development effort, the project was rooted in a desire to make machines more understanding. That meant finding a way for computers to see in 3D.
“If you look at almost all things biological, like humans, monkeys, dogs, eagles, snakes, everything has two eyes and three-dimensional visual perception capability,” said Bhowmik. “They use these to understand and navigate the world around them and communicate with each other.”
Giving this ability to computers could enable a great leap forward in artificial intelligence, while creating a new way — through gestures — to control a computer. Bhowmik imagines many possibilities.
“You can think of humanoid robots that see like us and walk like us and understand the world like us,” he said. “You can build flying drones that can avoid running into humans and other obstacles. You can pretty much transform computing.”
Intel’s RealSense 3D camera technology is an early step in this direction. Bringing it to market required several engineering breakthroughs along the way.
Shrinking lenses and lasers
The RealSense 3D camera is a small module equipped with infrared lasers, multiple imaging devices, and special processing chips. These components work together to let the camera capture not only color pixels but also the distance to each point in its field of view, preserving the full 3D information of the scene.
The whole unit, including the connector that transmits the information, is about the size of a stick of gum.
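The payoff of capturing distance per pixel is that any pixel can be turned back into a point in 3D space using the camera's calibration. A minimal sketch of that deprojection step, using a standard pinhole model — the focal lengths and principal point below are invented illustrative values, not RealSense's actual calibration:

```python
# Sketch: turning a color-plus-depth pixel into a 3D point.
# fx, fy, cx, cy are assumed pinhole intrinsics (illustrative only).

def deproject(u, v, depth_m, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """Map pixel (u, v) with depth in meters to camera-space (X, Y, Z)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# A pixel at the image center lands on the optical axis:
print(deproject(320, 240, 1.5))  # (0.0, 0.0, 1.5)
```

Repeating this for every pixel yields a point cloud, which is the raw material for the gesture, scanning, and measurement features described below.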
How did it get that small?
At the beginning of 2012, Intel had a working proof of concept that was about the size of a toaster, much of the bulk coming from the laser projectors and processing boards. This device was impractical. But it made clear to Mooly Eden, the Intel senior vice president who oversaw the project, that there was a path to making it commercially viable.
Intel assembled a team of experts in Israel and Santa Clara who set about miniaturizing the components and, at the same time, figuring out how to lower the cost to build them. The team designed custom lasers, optical components, imaging devices, purpose-built processors, and put them all together in modules that could be integrated in the thin lid of a laptop or inside a tablet.
Meanwhile, work got underway to develop camera software precise and responsive enough to recognize fine-motor gestures. It’s comparatively easy to write a program that can identify a specific set of hand positions, like a fist. But such a system didn’t lend itself to the style of interaction Bhowmik and Eden envisioned.
In 2011, Intel invested in and started working with Omek Interactive, an Israeli startup that wrote software for 3D cameras; Intel later acquired the company. From the outset, everyone agreed that to get the level of gesture control RealSense required, the software needed to be able to detect the 3D positions of 22 joints in the human hand.
That’s a problem that hadn’t been solved before, according to Gershom Kutliroff, Omek’s cofounder.
The initial efforts focused on identifying computer-vision techniques that could help. For instance, one method identifies and tracks fingertips, but it stops working the moment someone folds his fingers into his palm.
Eventually, the team chose an approach that combined elements of several different techniques. The software makes a model of the hand it’s trying to track, then compares its model to what the camera sees, recalibrating if it isn’t an exact match.
It does this about 200 times every 0.02 seconds.
“We were able to move the joints of this model in a way that’s very similar to the way that the human hand can move,” Kutliroff said.
Among the benefits of this approach is that the software can guess what a hand is doing even if it can’t see some of the fingers.
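The fit-compare-recalibrate loop described above can be sketched in miniature. In this toy version, a single "joint angle" stands in for the full 22-joint hand model, and the update rule is a simple proportional correction — an illustration of the iterative idea, not Omek's actual algorithm:

```python
# Toy sketch of model-based tracking: keep a parametric model,
# compare it against the observation, and nudge the parameters
# until the model explains what the camera sees.

def fit_model(observed_angle, model_angle=0.0, step=0.5, iters=200):
    """Iteratively pull a one-parameter model toward the observation."""
    for _ in range(iters):
        error = observed_angle - model_angle
        if abs(error) < 1e-6:        # model matches the view closely enough
            break
        model_angle += step * error  # recalibrate toward the observation
    return model_angle

print(round(fit_model(42.0), 3))  # converges to 42.0
```

Because the model itself encodes how a hand can plausibly move, the real system can infer occluded fingers from the joints it does see — something a pure fingertip tracker cannot do.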
A tablet for depth photography
Simultaneously, Intel’s mobile unit was trying to develop a tablet capable of taking photos with an extra dimension. The result is Intel’s RealSense snapshot technology.
The motivation was straightforward: “We just wanted to help people take a better picture,” said Peter Winer, the Intel executive who oversaw the effort.
A 3D camera would allow people to take a photo, then change the color of some parts of an image, measure the distance between two objects in it, and change a picture’s focal point, even after it was taken. Winer and his team believed these features would work well on social media, among other uses.
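Those editing features all reduce to having a depth value per pixel alongside the color. A minimal sketch of two of them — desaturating everything beyond a cutoff distance, and measuring between two deprojected 3D points — where the tiny images, the cutoff, and the sample points are all invented for illustration:

```python
# Sketch of depth-aware photo edits, assuming a color image plus a
# per-pixel depth map in meters (both hypothetical sample data).
import math

def desaturate_background(color, depth, cutoff_m):
    """Gray out every pixel farther than cutoff_m; keep the foreground."""
    out = []
    for row_c, row_d in zip(color, depth):
        out_row = []
        for (r, g, b), z in zip(row_c, row_d):
            if z > cutoff_m:
                gray = (r + g + b) // 3
                out_row.append((gray, gray, gray))
            else:
                out_row.append((r, g, b))
        out.append(out_row)
    return out

def distance_between(p1, p2):
    """Euclidean distance in meters between two 3D points."""
    return math.dist(p1, p2)

color = [[(255, 0, 0), (0, 0, 90)]]   # one red pixel, one blue pixel
depth = [[0.5, 3.0]]                  # foreground at 0.5 m, wall at 3 m
print(desaturate_background(color, depth, cutoff_m=2.0))
print(distance_between((0.0, 0.0, 1.0), (0.0, 0.0, 2.5)))  # 1.5
```

Changing the focal point after the fact works the same way: blur each pixel by an amount proportional to how far its depth sits from the chosen focus distance.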
The first prototype required nine cameras, with components that cost around $50. That was too expensive, and it used too much processing power to be practical, so Winer and his team figured out how to simplify the design and bring down the cost.
That led them to the current three-camera model, which uses off-the-shelf components and can be manufactured inexpensively enough to not have a big impact on the overall cost of the tablets that use it.
By August 2013, the team was at work on the industrial design. Because the device didn't yet exist, Winer and his team rigged up a piece of plexiglass with GoPro cameras attached in order to write the software that combined images from the different cameras.
The rig held six GoPro cameras in a variety of layouts and at varying distances from one another. This let the team test how accurately different three-camera configurations captured depth, which helped determine the final layout.
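What the rig experiments probe is the basic stereo relation: depth equals focal length times baseline divided by disparity, so wider spacing between cameras produces larger disparities and finer depth resolution at range. A back-of-the-envelope sketch, with illustrative numbers rather than the tablet's real calibration:

```python
# Stereo depth relation: depth = focal_length * baseline / disparity.
# focal_px and baseline_m below are assumed illustrative values.

def depth_from_disparity(disparity_px, focal_px=700.0, baseline_m=0.06):
    """Estimate depth in meters from pixel disparity for a stereo pair."""
    return focal_px * baseline_m / disparity_px

print(depth_from_disparity(42.0))  # → 1.0 (meters)
```

The trade-off the team was navigating: a wider baseline improves depth accuracy but makes the module larger and harder to fit in a thin tablet.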
The finish line
Now that RealSense-equipped devices are about to hit shelves, there’s a widespread sense of accomplishment within Intel.
“We’ve taken something from science fiction and removed the fiction,” said Eden.
Back in the lab, Bhowmik gets reflective.
“We are at the beginning of a journey that will have a profound effect on how we use computing devices and how they enhance our daily lives,” he said.