In a promotional video, Amazon.com Inc. says its Cloud Cam home security camera provides “everything you need to monitor your home, day or night.” In fact, the artificially intelligent device requires help from a squad of invisible employees.
Dozens of Amazon workers based in India and Romania review select clips captured by Cloud Cam, according to five people who have worked on the program or have direct knowledge of it. Those video snippets are then used to train the AI algorithms to do a better job distinguishing between a real threat (a home invader) and a false alarm (the cat jumping on the sofa).
An Amazon team also transcribes and annotates commands recorded in customers’ homes by the company’s Alexa digital assistant, Bloomberg reported in April.
AI has made it possible to talk to your phone. It’s helping investors predict shifts in market sentiment. But the technology is far from infallible. Cloud Cam sends out alerts when it’s just paper rustling in a breeze. Apple Inc.’s Siri and Amazon’s Alexa still occasionally mishear commands. One day, engineers may overcome these shortfalls, but for now AI needs human assistance. Lots of it.
At one point, on a typical day, some Amazon auditors were each annotating about 150 video recordings, which were typically 20 to 30 seconds long, according to the people, who requested anonymity to talk about an internal program.
The clips sent for review come from employee testers, an Amazon spokeswoman said, as well as Cloud Cam owners who submit clips to troubleshoot such issues as inaccurate notifications and video quality. “We take privacy seriously and put Cloud Cam customers in control of their video clips,” she said, adding that unless the clips are submitted for troubleshooting purposes, “only customers can view their clips.”
Nowhere in the Cloud Cam user terms and conditions does Amazon explicitly tell customers that human beings are training the algorithms behind their motion detection software.
Despite Amazon’s insistence that all the clips are provided voluntarily, the teams have picked up activity homeowners are unlikely to want shared, including rare instances of people having sex, according to two of the people.
Clips containing inappropriate content are flagged as such, then discarded so they aren’t accidentally used to train the AI, the people said. Amazon's spokeswoman said such clips are scrapped to improve the experience of the company's human reviewers, but she didn't say why unsuitable activity would appear in voluntarily submitted video clips.
The workers said Amazon has imposed tight security on the Cloud Cam annotation operation. In India, dozens of reviewers work on a restricted floor, where employees aren’t allowed to use their mobile phones, according to two of the people. But that hasn’t stopped other employees from passing footage to non-team members, another person said.
The Cloud Cam debuted in 2017 and, along with the Alexa-powered line of Echo speakers, is one of several gadgets Amazon hopes will give it an edge in the emerging smart-home market.
The $120 device detects activity in users’ homes, alerts them to it and offers free access to the footage for 24 hours. Users willing to pay about $7 to $20 for a monthly subscription can extend that access for as long as one month and receive tailored alerts, such as for a crying baby or a smoke alarm. Amazon doesn’t reveal how many Cloud Cams it sells, but the device is just one of many home security cams on the market, from Google’s Nest to Amazon-owned Ring.
While AI algorithms are getting better at teaching themselves, Amazon—like many companies—deploys human trainers across its businesses; they help Alexa understand voice commands, teach the company’s automated Amazon Go convenience stores to distinguish one shopper from another and are even working on experimental voice software designed to detect human emotions.
Using humans to train the artificial intelligence inside consumer products is controversial among privacy advocates because of concerns that the practice can expose personal information. The revelation that an Amazon team listens to Alexa voice commands and subsequent disclosures about similar review programs at Google and Apple prompted attention from European and American regulators and lawmakers. The uproar even spurred some Echo owners to unplug their devices.
Amid the backlash, both Apple and Google paused their own human review programs. For its part, Amazon began letting Alexa users exclude their voice recordings from manual review and changed its privacy policies to include an explanation that humans may listen to their recordings.
Reports by the Information and the Intercept technology websites in the last year examined the human role in training the software behind security cameras built by Ring. The sites reported that employees used clips customers had shared through a Ring app to train computer vision algorithms, and, in some cases, shared unencrypted customer videos with each other.
Amazon doesn’t tell customers much about its troubleshooting process for Cloud Cam. In its terms and conditions, the company reserves the right to process images, audio and video captured by devices to improve its products and services.
In a Q&A about Cloud Cam on its website, Amazon says “only you or people you have shared your account information with can view your clips, unless you choose to submit a clip to us directly for troubleshooting. Customers can also choose to share clips via email or social media.”
The Cloud Cam teams in India and Romania don’t know how the company selects clips to be annotated, according to three of the people, but they said there were no obvious technical glitches in the footage that would require submitting it for troubleshooting purposes.
At an industry event this week, David Limp, who runs Amazon’s Alexa and hardware teams, acknowledged that the company could have been more forthcoming about using people to audit AI. “If I could go back in time, that would be the thing I would do better,” he said. “I would have been more transparent about why and when we are using human annotation.”