Every day, tens of millions of people around the world depend on smart speakers and their voice recognition software to find music, interact with the internet of things and play games. When we think about Amazon’s Echo speakers capturing commands for Alexa, its digital assistant in the cloud, we often overlook the role that humans play in the process.

Belinda Henwood: Who is listening to our Alexa conversations? 

David Vaile: Amazon's Alexa voice recognition system generates sound files or recordings that, apparently, are routinely listened to by large numbers of people within Amazon, as well as contractors outside the company. 

These ‘listeners’ aren’t just in the US as you might expect, but are also based in countries including Costa Rica, India, Romania, and possibly others. So, thousands of people around the world – both those directly employed by Amazon and subcontractors – are listening to our conversations. 

BH: Why are they listening in to what the speakers hear? 

DV: Amazon offers a number of explanations, but the core reason seems to be to help improve its voice recognition system by correcting it where it gets lost. A major part of that process is based on the teams of Amazon ‘listeners’ transcribing and then annotating voice recordings; these are then fed back into the system. It’s a form of human-assisted error correction and machine learning development. 

David Vaile, Stream Lead for Data Protection and Surveillance, Allens Hub for Technology, Law & Innovation

BH: How transparent is this process of voice recording? 

DV: There seems to be minimal transparency in what Amazon’s contract binds you to when you click ‘OK’ to the terms of service. There’s the usual button at the end where you agree that you have read and understood the terms (even if you haven’t), and everybody clicks that button. But, in fact, almost no one reads the terms and conditions – in part because they are long and written in legalese, and in part because they seem so dull. 

Often there appears to be little point in reading these terms, because companies frequently leave out the most significant things – the specific details that might help you understand what it means in reality. For example, Amazon does mention that your recordings may be used for quality control and recognition-improvement purposes. But it doesn't say it has thousands of people around the world listening to you, transcribing, and passing your information between their teams when they need help, or maybe just for entertainment. 

Apparently, there is a control setting in the Amazon software interface that says, in effect, you can opt out of your recording being used for the development of new features. This may give the impression of a full opt out, but it doesn't mention that Amazon may still analyse your recording for other purposes. 

BH: How long are the recordings kept?

DV: Amazon is operating in a zone that is pretty hard to regulate or to investigate. So, we don't know – the recordings could be kept for quite some time or they could be deleted rather quickly. 

If you read the various terms of service, there is no indication of this or many other important aspects of how they store your data. You do apparently have the option to delete recordings (which presumably works), but most people will be reluctant to take on yet another ongoing task. Also, it is unclear if you have automated options like ‘delete everything after a week’. Amazon does warn that if you delete anything, your service quality may be adversely affected, which could discourage you from using this functionality.

BH: What happens to the information after it has been used to improve speech recognition?

DV: We can’t be sure what happens to the information. One of the problems is that Amazon makes a lot of fuss reassuring people that their privacy is really important and it takes security seriously – “Amazon knows that you care how information about you is used, and we appreciate your trust that we will do so carefully and sensibly.” But on a number of these issues, it appears to be much less forthcoming than even Facebook and Google.

Facebook and Google at least make relatively full disclosure of the nature of information they hold and critical aspects like the number and type of government access requests for information they receive. Amazon doesn’t break out Alexa requests in its very limited Transparency Report, and it was the last major company to start publishing such a report.

Controversially, Amazon has also been quite happy to sell commercial access to another form of surveillance tool – its face recognition services – to government, including for law enforcement and intelligence purposes. There's nothing to say this voice data might not end up going in the same direction.

In the US, there's a range of mechanisms through which the government can get access to the information held in the cloud there. And there are ‘National Security Letters’ and other restraints which can gag Amazon, preventing it from telling people why the government may require access. 

One thing worth noting for Australians – and anyone else outside the US – is that for the purposes of US law we are all ‘non-US persons’, and many of the limited protections from excessive government access to all sorts of commercial, proprietary and private information about US citizens don’t apply to us. Basically, we have no comeback if our information goes through to the US government via Alexa.

Amazon employs people around the world in places such as Costa Rica, India and Romania to listen to audio files collected from interactions with Alexa. Image from Shutterstock

BH: So, should we be careful about what we say to our smart speakers? 

DV: Yes, absolutely. We need to be aware of and understand what's actually going on with the speakers’ listening function. After the most recent revelations, you may want to reconsider whether you really want a smart speaker in your home. 

Think about teams of thousands of people around the world listening in, without notice, and making transcripts and analysing the characteristics in your speech, such as your accent. It may not happen often, but it may end up being used against you, or perhaps those in your community. 

Or perhaps the company is just trying to work out how it can market to you more effectively. Could it get under your radar by personalising messages based on words from your transcript, and what it’s learnt about you from your voice or what you have been talking about?

Also, you may be exposed to unexpected use of this information, in particular by the US government – because Amazon is based there – but perhaps by others as well. 

Some people will happily make the tradeoff, but, in the absence of full and frank disclosure, and without any law in either Australia or the US that would enable us to protect our rights if something goes wrong, we should increasingly be thinking twice about having smart speakers in our homes.

It's buyer beware. There are precious few options here if your data is abused, and you will probably never find out if it is.

Belinda Henwood