Speech-to-Text Conversion for Podcast
Today you don’t need a professional recording studio with expensive equipment or a large following to make a podcast. You just need an idea, a smartphone and a YouTube account (which, by the way, is free). Tools for online content production are inexpensive: many streaming platforms are free, and there are plenty of video and audio editors with impressive features. For instance, YouTube lets you upload videos or stream live, upload a thumbnail, add subtitles and add calls to action.
We got a request from a podcast author to convert all of the already recorded episodes to text. There were 200 audio files, each about 1-2 hours long, and the customer wanted a cheap and fast solution.
We decided to use the Google Speech-to-Text API, which is fast and accurate. Processing a file takes about half of its duration (e.g. a 1-hour file takes about 30 minutes).
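As a rough back-of-the-envelope check, the figures above give an idea of the total processing time for the whole archive. This is only a sketch: it assumes the files are processed one after another and that the half-of-duration ratio holds for every file. The function name is purely illustrative.

```python
def estimate_processing_hours(num_files, hours_per_file, ratio=0.5):
    """Estimate total sequential processing time in hours.

    ratio is the assumed processing time per hour of audio (about one half).
    """
    return num_files * hours_per_file * ratio


# 200 episodes, each between 1 and 2 hours long.
low = estimate_processing_hours(200, 1)
high = estimate_processing_hours(200, 2)
print(f"{low:.0f}-{high:.0f} hours")  # prints "100-200 hours"
```

In other words, a single sequential run over the archive would take somewhere between four and eight days of machine time under these assumptions.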
The Google Speech-to-Text API has one peculiarity: audio longer than about 1 minute cannot be sent directly in the request. It first has to be uploaded to Google Cloud Storage, and the uploaded file can then be converted to text.
First we set up a Cloud Storage bucket. Then a script uploaded each file to the bucket, converted it to text using the Speech-to-Text API and saved the resulting text file to a local folder.
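The steps above can be sketched roughly as follows. This is not the original script, just a minimal illustration using the official google-cloud-storage and google-cloud-speech Python client libraries. The helper names, the language code, the timeout and the audio format (mono FLAC) are all assumptions, since the post does not specify them.

```python
from pathlib import Path


def gcs_uri(bucket_name, blob_name):
    """Build the gs:// URI that Speech-to-Text expects for stored audio."""
    return f"gs://{bucket_name}/{blob_name}"


def join_transcript(chunks):
    """Join per-segment transcripts into one text, skipping empty segments."""
    return "\n".join(c.strip() for c in chunks if c.strip())


def transcribe_file(audio_path, bucket_name, out_dir, language_code="en-US"):
    """Upload one audio file, transcribe it, and save the text locally."""
    # Imported here so the helpers above work without the Google libraries.
    from google.cloud import speech, storage

    audio_path = Path(audio_path)

    # 1. Upload the audio file to the Cloud Storage bucket.
    bucket = storage.Client().bucket(bucket_name)
    bucket.blob(audio_path.name).upload_from_filename(str(audio_path))

    # 2. Start asynchronous recognition on the uploaded file.
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,  # assumed format
        language_code=language_code,  # assumed language
    )
    audio = speech.RecognitionAudio(uri=gcs_uri(bucket_name, audio_path.name))
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=4 * 3600)  # assumed generous timeout

    # 3. Save the transcript as a text file in the local output folder.
    text = join_transcript(
        r.alternatives[0].transcript for r in response.results if r.alternatives
    )
    out_path = Path(out_dir) / (audio_path.stem + ".txt")
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

Calling `transcribe_file` in a loop over the episode folder would cover the whole archive. Running it requires Google Cloud credentials and an existing bucket, and files in another format or with several audio channels would need a different `RecognitionConfig`.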
To keep costs down, we set up a virtual machine to run this script on the customer's own device. It ran successfully and saved all the transcripts in a separate folder.