Heidi asks, “Which is the best and cheapest way to transcribe an audio file into text? And which tool do you use?”
The answer to this question depends on your level of skill. For most people in most circumstances, I recommend Otter.ai. That’s the best blend of cost and performance. If you really, really care about performance and you have the technical skill, I recommend building a custom model trained on your voice specifically with IBM Watson or an open-source deep learning network. That’s for a very select group of people, though – most people will be just fine with Otter.ai.
What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.
In today’s episode, Heidi asks, what is the best and cheapest way to transcribe audio files into text? And which tool do I use? Good question. The answer is going to depend very heavily on your level of skill and your budget.
For most people, in most circumstances, including myself, I recommend and use Otter.ai. Let me pull it up on my phone here.
Otter.ai is very, very straightforward.
You either load an audio file into it or you record live.
So in fact, I’m going to turn on the live recording now.
And what it’s doing is it’s listening to me.
And as it listens to me, it is starting to transcribe in real time or near real time, which is very, very handy.
I use this a ton on my end for conference calls, because I will say something like, “Oh, yeah, we can get you that report on Tuesday.”
And later on I’m like, what did I say?
Because of the way you can split audio on a desktop computer, you can record only your end of the conversation.
If you need to record the entire conversation, remember that you have to be in compliance with your locality’s or country’s wiretapping laws, because it is technically a form of wiretapping.
It also allows you to take live audio feeds. If there’s enough good audio in the room, like if you’re at a conference and you’re sitting right in front of the speaker, you can have it running and transcribing. And Simon Lau, one of the head folks there, showed me that you can actually highlight a snippet and share it along with the audio to social media, which is pretty cool.
Pricing on this is terrific. It’s about $10 a month for, I believe, 100 hours of recorded audio, which, unless you are literally having somebody follow you around all day long and record everything, should be more than enough for almost everybody who is attending meetings, going to conferences, and even doing daily videos and podcasts.
So that’s the best solution, I think, for most people in most circumstances.
Now, there are exceptions. If you really, really, really care about performance, and you have the technical skill to do so, you may want to build a custom model that is trained on your voice specifically.
You would do that with something like IBM Watson Speech to Text; you can do it with Google Cloud Speech-to-Text; you can do it with Amazon Transcribe on AWS. All these services allow you to build custom-trained models on your voice.
They are actually less expensive, in some cases, than Otter, but they require you to have extensive knowledge of Python, because that is the language that you write the code in to interface with these services.
So that’s a case where not many people will have the technical skill, but those who do could take advantage of the customization.
A really good use case for that would be if you have very complex custom lexicons, and you have a speaking voice that is in some way slightly different or more unusual, for which an off-the-shelf application’s not going to perform as well.
So if you have, for example, a strong accent, you would probably want to do a custom model.
Even if you’re speaking English, you’d still want a custom model, and especially so if you speak languages other than the mainstream languages. Most of these apps are released in the American and UK markets and are tuned on the English language.
If you are speaking, say, Pashto or Swahili or Xhosa, most of these apps are not going to work for you.
And you would need to go the custom modeling route to work with that.
If you flat out don’t have the money at all, you could build an open source deep learning neural network on your laptop and use some of the freely available code out there to build that network.
And that’s an even greater leap when it comes to the technical skill that you need.
So, for most people in most circumstances: Otter.ai. I think it is the best app out there on the market for most people. For some folks who have very special needs: custom-trained voice models with IBM or Amazon or Google.
And then for those folks who have the highest level of technical skill, or a complete zero budget but a laptop that for some strange reason has a GPU that you can use, you would do the open source deep learning neural network. I would say, for the most part, stick with the vendors,
because one of the things that’s happening right now in the natural language recognition space is that the technology is evolving quickly.
And again, if this is not your core competency as a business, there’s no reason for you to be building your own and then trying to keep up with the software. I don’t even do that,
and AI and machine learning is the core of our business.
For the most part, everything off the shelf really is good enough.
I also like the fact that otter allows you to share transcripts.
It performs very similarly to Google Docs and Google Drive.
If you’ve done a meeting with somebody, you can send them a link. I did this recently in an interview: I said to the interviewer, “Let’s capture the audio, and then I’ll send you an automated transcript,” and I could send them the link.
And it has that sort of dictation style playback, where you can see the little words going across the screen and then replay certain sections.
And that was super helpful, especially because English is not the interviewer’s first language,
I have a tendency to speak quickly,
and we were talking about some fairly complex stuff.
So they were able to get the transcript to reference for the article.
So those are my recommendations for voice transcription services as of autumn 2019. The landscape is always changing,
and there are always services to keep an eye out for and try.
The best way to compare services on a pricing basis is price per recorded minute. Figure out, or ask as you deal with vendors, what the price per recorded minute is, because some folks will say, “Yeah, you know, for $10 a month you get this,” but then it’s an extra, you know, four cents a minute.
And then you work out all the math based on how many minutes you’re allowed.
I did this recently. Somebody was pitching this thing, like, “Oh, for podcasts, we’ve got the best transcription service, you know, highest accuracy rates and stuff.
It’s only, you know, X dollars a month.”
And on the surface, it sounded good.
I read the terms of service, though, and I was like, that is literally 128 times more in terms of cost per recorded minute than Otter.
And I don’t think their transcription is that much better.
It’s not 128 times better.
And I’m certainly not in a situation where I would need that level of accuracy.
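The price-per-recorded-minute comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor’s actual pricing: Plan A uses the rough figures quoted in the episode (about $10 a month for about 100 hours), while Plan B, its 60 included minutes, and the 600 minutes of monthly usage are hypothetical numbers made up for the example. The function name `price_per_minute` is also just an illustrative choice.

```python
def price_per_minute(monthly_fee, included_minutes, overage_rate=0.0, minutes_used=None):
    """Effective cost per recorded minute for one month of usage.

    monthly_fee      -- flat subscription price for the month
    included_minutes -- minutes covered by the flat fee
    overage_rate     -- price per minute beyond the included minutes
    minutes_used     -- minutes actually recorded (defaults to the full allowance)
    """
    if minutes_used is None:
        minutes_used = included_minutes
    if minutes_used <= 0:
        raise ValueError("minutes_used must be positive")
    extra_minutes = max(0, minutes_used - included_minutes)
    total_cost = monthly_fee + extra_minutes * overage_rate
    return total_cost / minutes_used


# Plan A: rough figures from the episode (about $10/month for about 100 hours).
plan_a = price_per_minute(10.00, included_minutes=100 * 60)

# Plan B: HYPOTHETICAL vendor, $10/month for 60 minutes, then $0.04 per extra minute,
# evaluated at an assumed 600 recorded minutes in the month.
plan_b = price_per_minute(10.00, included_minutes=60, overage_rate=0.04, minutes_used=600)

print(f"Plan A: ${plan_a:.4f} per recorded minute")
print(f"Plan B: ${plan_b:.4f} per recorded minute")
print(f"Plan B costs {plan_b / plan_a:.1f}x more per recorded minute")
```

Plugging in the numbers from a vendor’s terms of service, along with a realistic estimate of how many minutes you record each month, gives a like-for-like figure even when the plans are structured differently.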
The other thing that really messes people up when they’re comparing transcription services is that if you’re used to a human transcription service, a human is going to edit and maybe even rephrase and tweak your language. A machine will never do that.
If you don’t speak in the same way that you write, any automated service is going to be a disappointment to you and is going to require more editing, because you’re expecting the machine to edit for you, and machines are not at that point yet.
Automated services will take exactly what you give them and spit out almost exactly what they heard.
So if you don’t speak the same way you write, you’re going to do a lot of editing.
And it’s not comparable to a human.
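Some of that editing is mechanical and can be scripted as a first pass. The sketch below, using only Python’s standard `re` module, collapses words that are immediately repeated (the “in in real time” kind of stutter that automated transcripts preserve). The function name `collapse_repeats` is an illustrative choice, and the approach is an assumption about what a useful first pass looks like: it will also collapse intentional repeats such as “that that”, and it does nothing about rephrasing, so it reduces the editing load without replacing an editor.

```python
import re

# One word, followed by one or more whitespace-separated copies of the same word.
# Punctuation between the words (e.g. "very, very") prevents a match on purpose.
_REPEATED_WORD = re.compile(r"\b(\w+)(\s+\1\b)+", flags=re.IGNORECASE)


def collapse_repeats(text):
    """Collapse immediately repeated words, keeping the first occurrence."""
    return _REPEATED_WORD.sub(r"\1", text)


raw = "it is starting to transcribe in in real time, which is very, very handy"
print(collapse_repeats(raw))
```

Because the pattern requires whitespace between the copies, emphatic phrases separated by a comma are left alone, which matches how a speaker’s deliberate repetition usually shows up in a transcript.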
Personally, again, for my use cases, I don’t see the justification for the cost increase of going from something like a penny per recorded minute to $1 per recorded minute for what I do.
But other people who need to have more, I guess, bespoke transcripts may still need a human to do the editing, either as part of the transcription process or afterwards.
So keep that in mind as well.
As always, please subscribe to the YouTube channel and the newsletter, and I’ll talk to you soon. Take care.
Want help solving your company’s data analytics and digital marketing problems? Visit TrustInsights.ai and let us know how we can help you.