The Radio Archive Search Service lets people search English-language talk and news radio stations using transcripts created by automatic speech recognition technology. Our goals are to:
Make radio an easily citable medium of record
Help journalists find what was said on the radio and include the evidence in their articles
Give Wikipedia editors primary sources for their research
Allow researchers to programmatically look at thousands of hours of radio to better understand the media landscape as a whole
Enable people everywhere to independently verify what they hear and read
This is an experimental service. Automatic speech recognition (ASR) technology is amazing, but it’s also not always correct or current. And the search can only be as accurate as the ASR.
For example, while we see plenty of mentions of “coronavirus” or “corona virus” in these broadcasts, there are currently zero mentions of “COVID.” But of course, that’s not right: the word “COVID” is spoken on the radio all the time. The ASR technology just doesn’t recognize it. So instead of “COVID 19” the searchable transcript will say “covered 19,” or “covet 19,” or “coated 19,” or “coded 19,” etc.
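One way to work around mis-hearings like these is to search for the misheard forms as well as the word you actually want. The sketch below is only an illustration of that idea, not part of the service: the ASR_VARIANTS table and the expand_query function are hypothetical names, and the table holds only the four "COVID" mis-hearings listed above. You would still run each resulting phrase through the search yourself.

```python
# Hypothetical table of known ASR mis-hearings (an assumption for illustration).
# The "covid" entries are the examples mentioned in the text above.
ASR_VARIANTS = {
    "covid": ["covered", "covet", "coated", "coded"],
}


def expand_query(query, variants=ASR_VARIANTS):
    """Return the query plus copies with each known mis-hearing substituted."""
    words = query.lower().split()
    expanded = [" ".join(words)]  # always keep the original phrase first
    for i, word in enumerate(words):
        for alt in variants.get(word, []):
            # Swap in one misheard form at position i, leave other words alone.
            candidate = " ".join(words[:i] + [alt] + words[i + 1:])
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    for phrase in expand_query("COVID 19"):
        print(phrase)
```

Searching for each printed phrase in turn may surface broadcasts where “COVID 19” was spoken but transcribed as something else. It will also return some segments where the misheard word really was spoken, so listening to the audio is still the way to confirm a match.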
In other cases, the ASR program knows one proper name, like Brewster Kahle, but doesn’t recognize another proper name that sounds somewhat similar, like Musa Qala (a city in Afghanistan). So a search for Brewster Kahle finds mentions of him AND mentions of Musa Qala that have been misinterpreted.
We see problems with how spoken words are turned into transcripts when there is a lot of background noise, when people have heavier accents, when people speak over each other, etc. And of course, the lyrics in musical sections can present problems.
On the other hand, many common words and phrases are much easier to find, whether that’s “gun control,” “unemployment,” “renewable energy,” “space force,” “tiger king,” “Dolly Parton,” or “Cirque du Soleil.”
All of this is to say, don’t expect your searches in the Radio Archive Search Service to be as easy as using Google. The technology just isn’t there yet. But of course, even if the transcript you searched isn’t accurate, the audio itself is available for you to listen to for verification.
We are just beginning to dip our toes into the world of radio. The good news is that all of this radio is being preserved for the future. And as ASR improves over time, our ability to mine the radio of the present and past will also improve.
As we wend our way through the current health crisis and the 2020 U.S. political campaign, we decided that an imperfect service is better than no service at all. And as far as we know, this is the only keyword-searchable archive of radio news from multiple stations that is available to the public.
We welcome your feedback. What worked for you? What didn’t? Do you have suggestions for how to improve the service? Email us and let us know.