How filming your household chores could train the robot butlers of the future
Humanoid robots doing household chores sounds like something straight out of a sci-fi movie, right? But that reality is closer than most people think, and the path to getting there involves a pretty unexpected detail: you filming your own daily routine at home.
Yep, you read that right.
Washing the dishes, sweeping the floor, cooking lunch — these everyday moments that seem absolutely mundane are becoming incredibly valuable raw material for training the next generation of intelligent robots. And to get in on this emerging market, you don’t need much: a head strap, a smartphone, and a list of household tasks are enough to get started.
The reason is simple but powerful: the artificial intelligence behind modern robots needs to learn how humans move, interact with objects, and navigate real-world environments to evolve beyond factories and actually make it into our homes. And the data available on the internet, which was enough to create amazing chatbots like ChatGPT, simply doesn’t cut it for physical robotics. 🤖
With the rapid evolution of artificial intelligence, humanoid robots have become the newest frontier in the race to master advanced technology. Manufacturers are rolling out a wave of new models capable of walking, dancing, and even fighting with increasing agility. But the industry’s true goal — a general-purpose robot that can work in stores, offices, and homes — requires a massive amount of data to learn how to replace humans safely and effectively.
Below, you’ll learn how this ecosystem is taking shape, who’s behind it, and what still needs to happen before a robot finally shows up at your door, ready to help.
Why home videos are worth gold for robotics
There’s a fundamental difference between teaching an artificial intelligence to have a conversation and teaching a robot to function in the physical world. When we talk about language models like ChatGPT, the fuel for learning is text — billions of pages, articles, books, and digitized conversations that already existed on the internet. Trained on hundreds of billions of words scraped from the web, ChatGPT uses what it learned about textual patterns to generate the most likely responses to user questions.
The problem is that this type of data is completely useless when the goal is getting a robot to pick up a glass without knocking it over, fold a shirt, or open a drawer without destroying everything around it. For that, AI needs something entirely different: it needs to see how a human being does these things, in real time, inside a real environment.
After text, AI models evolved to produce images and videos on demand, leveraging content available on the internet. But robot developers need a much more specific set of training data and don’t have access to the same instant library that the web offered for other AI applications.
That’s where egocentric data, also called human data, comes in: videos captured from the point of view of the person performing the task. Unlike a security camera mounted in the corner of a ceiling, an egocentric video shows exactly what a person’s eyes see while they’re washing dishes, making the bed, or peeling a potato. This perspective is far more useful for training robots because it closely matches the view the robot itself will have when performing the same task. The camera sits on the head of the person recording, capturing every movement, every grip adjustment, every glance at an object before touching it.
This need has created a voracious appetite for first-person footage, and over the past few months several startups have entered this market to meet the demand, collecting and annotating videos from thousands of contracted workers around the world.
Who’s building this future right now
The movement around collecting egocentric data for robotics is no longer some futuristic lab idea. One of the companies leading this race is Micro1, based in Palo Alto, California, which started recruiting its own army of remote videographers last year.
According to Arian Sadeghi, vice president of robotics data at Micro1, the demand for this type of content spans practically every sector imaginable.
“Manufacturing, factories, warehouses, retail, nursing homes, hospitals: you’re going to need this type of data in basically every environment, because the movements are all different,” Sadeghi explained.
Each person who participates in the program receives a headset to mount the camera, filming instructions, and a list of tasks like cooking, cleaning, gardening, and taking care of pets. Workers need to rotate between different activities and submit at least 10 hours of video per week.
Although the videos currently revolve around household tasks, Sadeghi said the company encourages contractors to experiment with what they film, in case it could eventually help robots adapt faster to new environments and responsibilities.
“What we tell them is: if you think you’d like a robot to do this for you, go ahead and record it,” Sadeghi said.
Billions of hours of video are still needed
Micro1 already has about 4,000 robotics generalists spread across homes in 71 countries, sending the company more than 160,000 hours of video per month. But according to Sadeghi, that’s far from enough.
“You probably need billions of hours,” he said. “We haven’t even gotten to human interactions yet. This is just simple household tasks.”
He said the growing demand for data in robotics mirrors the early trajectory of ChatGPT and other AI chatbots. And just as text was the fuel that powered the language model revolution, first-person videos will be the fuel that powers the physical robot revolution.
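For a rough sense of the gap Sadeghi describes, the figures above can be run through some back-of-the-envelope arithmetic. The sketch below is only illustrative: it assumes "billions of hours" means exactly one billion (the low end) and that the monthly volume stays constant, neither of which the company has stated. The function names are made up for this example.

```python
# Figures quoted in the article.
HOURS_PER_MONTH = 160_000   # video Micro1 says it receives each month
CONTRACTORS = 4_000         # approximate number of contractors

# Assumption: "billions of hours" is read as one billion, the low end.
TARGET_HOURS = 1_000_000_000


def years_to_reach(target_hours: float, hours_per_month: float) -> float:
    """Years needed to collect target_hours at a constant monthly rate."""
    return target_hours / hours_per_month / 12


def hours_per_contractor(hours_per_month: float, contractors: int) -> float:
    """Average hours of video each contractor submits per month."""
    return hours_per_month / contractors


years = years_to_reach(TARGET_HOURS, HOURS_PER_MONTH)
per_person = hours_per_contractor(HOURS_PER_MONTH, CONTRACTORS)

print(f"About {years:.0f} years to reach one billion hours at the current rate")
print(f"About {per_person:.0f} hours per contractor per month")
```

Under those assumptions, a single billion hours would take roughly five centuries at today's pace. The per-contractor average of about 40 hours a month is in line with the 10-hour weekly minimum mentioned earlier.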
This data shortage has turned into a billion-dollar market opportunity for startups like Micro1, which also handle annotating the videos so that robots can differentiate objects, distances, and physical movements. Market research firms estimate that the data collection and labeling industry will grow at an average of about 30% per year, driven by growth in Asia, and is expected to reach at least 10 billion dollars by 2030.
Not every recorded video is useful for training
Ravi Rajalingam, founder of the data annotation company Objectways, was providing audio and video data to train AI virtual assistants and self-driving cars before shifting his focus to robotics last year. Since he started hiring people to collect human data, he’s found that only about half of the material submitted is actually usable.
Still, with 90% of his clients based in the United States and the assumption that American consumers will have the purchasing power to adopt humanoid robots first, some clients are willing to pay more for data collected in American homes — even though the hourly cost can be up to three times what a worker in Vietnam or India would earn.
“The kitchen in India is very different from the kitchen in the U.S. A broom in India is very different from a broom in the U.S. So the variety is important, but it depends on where you’re going to deploy your robots first,” Rajalingam explained. “That’s why we’re collecting data all over the world.”
The different training methods for robots
For decades, robots were trained primarily by humans using remote controls. But that requires expensive, dedicated hardware. More recently, a cheaper option emerged with the use of simulation software to create virtual scenarios, although this approach is generally less effective for interactions with physical objects, like picking up a cup.
“With data, it’s always a trade-off between quality and quantity,” said Alicia Veneziani, vice president of market expansion at Sharpa, a Singapore-based android startup specializing in robotic hands.
China, which is pouring state investment into high-tech industries, has announced plans for at least 60 robot training centers across the country. Most of the mass-produced humanoid robots in China so far have been acquired for training and research purposes, according to Marco Wang, a Shanghai-based analyst at Interact Analysis, a technology research firm.
But late last year, the industry began embracing the use of human data as a middle-ground solution, since the only costs involved are a recording device like a GoPro, Meta glasses, or smartphone, and hourly wages ranging from 5 to 20 dollars depending on the region.
“The idea here is: I don’t want the robot performing the task. I want people performing the task,” Wang explained. “That way, you don’t have to pay for the robots, you just have to pay for the equipment and the people.”
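The wage range above can be combined with Rajalingam's observation that only about half of submitted footage is usable to estimate what a usable hour really costs. This is a minimal sketch with two assumptions that are not from the article: that rejected footage is still paid for, and that the 50% figure from one company applies across the whole wage range. Equipment costs are ignored, and the function name is hypothetical.

```python
def cost_per_usable_hour(hourly_wage: float, usable_fraction: float) -> float:
    """Effective labor cost of one usable hour of video.

    Assumes every recorded hour is paid, whether or not it is usable.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in the range (0, 1]")
    return hourly_wage / usable_fraction


# About half of submitted material is usable, per Rajalingam.
USABLE_FRACTION = 0.5

# Low and high ends of the hourly wage range quoted in the article, in dollars.
for wage in (5, 20):
    effective = cost_per_usable_hour(wage, USABLE_FRACTION)
    print(f"${wage}/hour recorded -> ${effective:.0f} per usable hour")
```

On these assumptions, the effective cost doubles to somewhere between 10 and 40 dollars per usable hour, before annotation.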
Different models in different parts of the world
Wang said he’s already seen business models in Japan and South Korea similar to the data collection centers in China, but with bases in Southeast Asia to take advantage of cheaper labor. Tesla has been training its Optimus humanoid robot at its own facilities in Fremont, California, and plans to expand to Austin, Texas. Wang noted that the United States and Europe tend to favor simulation-based training, an approach championed by Nvidia, which designs the most advanced computer chips in the world.
However, in a February report, Nvidia revealed that incorporating more than 20,000 hours of first-person videos into robot training improved the success rate by more than 50% on tasks like rolling up T-shirts, sorting playing cards, unscrewing bottle caps, and using syringes.
“If you rely on just one form of data collection, it’s probably not the best approach,” Wang said, adding that he expects companies to increasingly combine different strategies. “In the future, it will be a mix of different approaches.”
The last mile of automation
The tipping point for autonomous robots came three years ago, when the large language models behind ChatGPT gave rise to a new type of algorithm capable of translating visual signals into physical action, according to Puneet Jindal, co-founder of the data annotation company Labellerr AI. Robots that were previously programmed only for repetitive tasks began to perceive and navigate the world around them.
His company started collecting its own first-person videos this year, recorded by workers at manufacturing facilities in India. Jindal said prioritizing human data is a no-brainer for the next three years. But this boom might not last forever. Before long, this footage could make simulation-based training good enough to stand on its own, and if AI manages to convert ordinary YouTube videos into a first-person perspective, those could become a viable substitute.
“Even the robotics labs feel like they don’t know what data they’re going to need 12 months from now,” he said.
The challenge of domestic unpredictability
Part of the reason general-purpose robots need so much training is the extreme unpredictability of home environments. Furniture, appliances, and people are constantly moving, and no two homes are alike. According to Rutav Shah, a robotics researcher at the University of Texas at Austin, the biggest hurdle remains the lack of intuition.
“What’s really missing is a human-like intuition about forces, friction, and uncertainty that people acquire over the course of their entire lives,” Shah said. “Making robots that are generally useful for everyday household tasks like cooking and cleaning, that’s going to be the last mile of automation.”
So far, humanoid robots have been deployed mainly in controlled environments like factories, where they can complete their tasks successfully 99.9% of the time, according to Alexander Verl, research president of the International Federation of Robotics. Even for something as seemingly simple as folding T-shirts, the current success rate is still too low to be commercially viable.
“The probability that it will work is generally around 70 or 80%. Coming from manufacturing, that’s really not something our industry partners want to use,” Verl said.
Safety and the risks of having a robot at home
Rajalingam from Objectways also highlighted the safety risks that come with bringing robots into home environments. If a robot is cleaning a toy room but can’t tell the difference between a doll and a real baby, the results could be catastrophic.
“If the robot picks up my baby and puts it in a trash can, that’s a million-dollar lawsuit right there,” he said.
Testing robots around babies is still a long way off, according to Rajalingam. However, he added that testing has already begun with dogs. 🐕
Beyond the physical risks, there’s the issue of privacy and trust. Having a robot operating inside your home essentially means having a device with active cameras and sensors capturing everything that happens in the most intimate space people have. Companies in this space will need to provide clear and transparent answers about how that data will be used, stored, and protected — and that conversation with the public is just getting started.
The success of humanoid robots in homes will depend just as much on technological evolution as it will on building a genuine relationship of trust with end users. And that, perhaps, is the most complex challenge of all. 🏠🤖
In the meantime, with every video of someone sweeping the kitchen or folding laundry that gets uploaded to one of these platforms, robot AI gets a tiny bit smarter. The future of robot butlers is being built one home video at a time. Who would have thought that the key to advanced robotics would simply be recording someone doing the dishes?
