I have been a Natural Language Processing (NLP) fiend for a long time. I have indeed used speech-to-text to write books and blog posts for close to 20 years. I discovered it while visiting my GP and realising that he was using Dragon NaturallySpeaking to dictate to his computer. So I thought, “if he can do it, why can’t I?” Since then, things have improved dramatically. Trint, which I discovered in 2018 upon a friend’s recommendation, has been a game-changer in the NLP arena. I interviewed its CEO and founder, Jeff Koffman, to find out about the genesis of the system and to pick his brains on the future of NLP.
NLP: How Trint reshaped content production forever
I started using Natural Language Processing at the beginning of 2002, so this kind of technology is not new to me. But when I discovered London-based Trint two years ago, I was impressed with its capabilities. This was a step-change in the NLP market, no doubt about it: this tool was changing the way we produce content, entirely and forever.
The difficulty with NLP, at least in the early days, was recognising any kind of voice, and possibly two or more different voices in the same recording. It may not sound like much of an issue, but as Jeff Koffman remarks in the interview below, the difficulty does not lie with the engine itself, which was technically crafted so long ago that it hardly seems real.
The real challenge is to make this kind of tool work in a real-life environment.
The reality of content production is that you often start with a recording, because recording is a lot less strenuous than taking notes, and also much more pertinent and reliable.
Thus, even though natural language processing has existed for a long time, until now there had never been a proper tool able to detect different voices in the same recording and to transcribe any kind of voice, in any sort of accent, without any training.
Nuance, which has just abandoned support for Dragon for Mac, thereby leaving a massive gap in this market, still requires proper training and can only detect one voice at a time. Trint, on the contrary, is extraordinary.
I use it all the time, focusing fully on the quality of my content rather than merely trying to remember what the person said or, even worse, spending hours on end trying to transcribe it.
I have wanted to interview Jeff Koffman for a long time, and the interview took a long time to organise. I’m thrilled to be able to present it here today on visionarymarketing.com.
Jeff Koffman: “I call myself an accidental entrepreneur.”
I call myself an accidental entrepreneur. If you had told me ten years ago that I would be running a company with 50 employees with revenue in the millions and that I would be in business at all, I would have laughed.
I spent more than 30 years as a broadcast journalist in Canadian and American television, the last half of my career as a foreign correspondent and a war correspondent for ABC News, and before that CBS News, the big American networks, reporting from Latin America and then the Iraq war and all sorts of other conflicts, then moving to London and covering the Arab Spring, where I won my second Emmy.
There’s this concept called Product Market Fit. And I guess I found a person-job fit or something like that because I love what I did. It was incredibly hard. It was extremely demanding for more than 20 years of my career. I was on call 24/7.
And it wears you down. It’s exhausting. I was taking 100 flights a year at the peak of my travels. And it was very exotic, very challenging, sometimes incredibly moving and often very creative.
Why do I have to transcribe my interviews in the 21st century?
I couldn’t have asked for a better career, but I knew that at some point I was going to hit my expiry date on American TV. So, you know, you try and jump before you get pushed. As I explored opportunities, teaching at university while I was still the London correspondent for ABC News, I was looking at writing a book. And I met some developers who’d done some work with text, audio and software.
And I asked them, “Why do I have to transcribe my interviews, speeches and news conferences in the 21st century the way people have done since the 1960s or 70s, when tape recorders were first commercially released?” That led to a conversation about what exactly the solution would have to look like, then to collaboration and experimentation that produced an incredibly exciting result.
We thought we could invent the future.
With these people whom I didn’t really know, we just had this relationship over Skype; we thought we could invent the future. I left ABC News on November 30th, 2014. And twelve hours later we began working on this.
I flew to Florence, Italy, where I met the three developers at an Airbnb. And we began scoping out what this thing would be. And I had no business experience. I had never managed people, I had never looked at a spreadsheet, and I had no understanding of how software is developed. But I just had this growing sense that the world really needed a solution that we could produce and develop.
An NLP product could resolve the problems of content producers
The product was launched commercially in September 2016 to prosumers. We call them that because it’s a professional product, not a consumer product. But we launched it to individuals, and immediately we had the wind in our sails.
It means we were evidently solving a problem that people were hungry to see addressed. If you’re a journalist like me, an academic researcher, if you work in any kind of content production, the most significant pain point is finding the moments that matter. That means listening, playing, stopping, typing out the words, playing and stopping again, typing out more words etc.
Anybody who has done it knows that routine. If you ask people what they think when you describe it, they go, “Ah! Pain, drudgery, it’s the thing I hate most!” And when I asked, “If we could automate the worst part of that process, what would you say?”, they’d say, “That would be magical. Oh, my God, I’d send you flowers for this!”
See the amount of love we get at @TrintHQ
If you look at our Twitter feed at @TrintHQ, you’ll see the amount of love we get. In terms of usage, we released Trint Enterprise in April 2018, and we’ve focused on building Trint out for teams and collaboration, and for live transcription. We’ve added a mobile app as users have told us how they wanted to solve their specific challenges. More than three hundred thousand people have used the platform since it was launched.
We closed our Series A funding round in April 2019, and with US$ 4.5 million, we have 51 employees, 41 here in London and 10 in Toronto, where we have a North American sales office. And both offices are going to be growing significantly in 2020.
Three segments of users: individuals, small teams and enterprise users
We don’t release our daily and weekly figures, but our usage continues to grow significantly, and our users are in the many, many thousands. We now have three segments of users. We have individuals; we have what we call teams, which is two to 10 people: small production units, academic groups, marketing companies, anybody who does qualitative research or needs to find the content in the spoken word, whether audio or video. And then we have Trint Enterprise, which is 11-plus, and we are now really starting to scale it, winning substantial contracts with media organisations, governments, marketing firms, universities and beyond.
We have moved from a text-based to a voice-based economy.
We have moved from a text-based economy in the 20th century to a voice-based economy in the 21st century. And so the need for Trint is genuine on a daily basis, pretty much for anyone, because if you think about the 20th century, yes, we had radio and TV, but text was still the dominant form of communication, whether it was written reports or newspapers printed on paper, or postcards written to our family when we travelled, or, before that, our grandparents and great-grandparents sending telegrams. It was all text back then.
All of that is now done on WhatsApp, Uberconference or Zoom. It’s done in live recordings on our iPhones or Androids. More than 84 per cent of the traffic on the Internet is in video form. If you look at the rise of podcasting, you can see that people are driven to listen to The New York Times, not just to read the text on the screen.
The NLP component has rapidly become commoditised
The natural language processing component has rapidly become commoditised. The history of natural language processing goes back probably to the 60s or 70s. And the actual fundamentals of the algorithms are probably 40 or 50 years old. What’s changed in the last 20 years is both the speed of computing and the storage capacity.
And that’s why you’ve seen this massive advance in artificial intelligence in general and natural language processing in particular. The reason you get such good results in English is simply that the corpus of learning data is now so vast. And so we don’t need to, or want to, learn from you in particular.
Talking of which, we are keen on holding the highest levels of data security certification available. We are certified under ISO 27001, the International Organization for Standardization’s standard for information security management. That was a two-and-a-half-year project.
Everybody’s data is siloed from everybody else’s. And the important thing is we don’t look at your data. We don’t need to, because, particularly in English, the technology is now at such a sophisticated, commoditised level that the reality is there are fewer than ten major natural language processing algorithms available. And the top ones are relatively close in accuracy, because this is now becoming a reliable science.
The real NLP challenge is how to use it to solve people’s problems.
The challenge is how one can use that output in a way that solves people’s problems. Because what I, a reporter turned entrepreneur, have discovered is that it’s one thing to produce something cool that makes people go “wow”. It’s another thing to turn that into an actual product that fits into people’s daily routines. And that’s where we have become very specialised, because it’s not just AI that drives that trend; it’s applied AI. It’s the user interface on top of it that is so critical. And that’s where we have become experts.
We are now available in 28 languages: all the major European languages plus the major Asian languages. What we are witnessing now is the ability to automatically train an NLP algorithm for languages where the economic model wouldn’t have made it viable five years ago, let alone ten.
An API to integrate NLP in your proprietary applications
The API is used for uploads in Trint Enterprise. If you integrate Trint into an enterprise product, you want to make it as easy as possible for people to get into your add-on. The API means, for example, that the Associated Press uses Trint within its content management platform called ENPS (Electronic News Production System). It’s the Associated Press’s own platform, sold through its commercial division, and it’s used by something like eight hundred organisations around the world. In their internal version of ENPS, the export dropdown in the video window simply says “Send to Trint”, and that’s using our API. What that means is that people don’t have to go into Chrome and upload their file; they can do it straight from there.
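To make the integration idea concrete, here is a minimal sketch of what such an “export to transcription” call might look like from inside a host application. The endpoint URL, header names and parameters below are illustrative assumptions for the sake of the example, not Trint’s documented API.

```python
# Sketch of an "export to transcription service" integration.
# Endpoint, auth scheme and parameter names are hypothetical.

def build_upload_request(api_key: str, file_name: str, language: str = "en"):
    """Assemble the pieces of a hypothetical transcription-upload call."""
    url = "https://api.example-transcription.com/v1/upload"  # hypothetical endpoint
    headers = {
        "Authorization": f"Bearer {api_key}",        # assumed bearer-token auth
        "Content-Type": "application/octet-stream",  # raw media upload
    }
    params = {"filename": file_name, "language": language}
    return url, headers, params

# A host platform's "Send to Trint"-style menu item would gather these
# pieces and POST the media file, so users never leave their CMS.
url, headers, params = build_upload_request("secret-key", "interview.mp3")
print(params["filename"])
```

The point of the sketch is the design choice Koffman describes: the host application owns the user interface (a single dropdown entry), while the API call happens behind the scenes.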
A vision for the future of natural language processing
I think the potential is just starting to be realised. When I look at where our product can be, not only five years from now but two years from now, I believe the layers of metadata and analysis we can perform on speech are only starting to surface.
In a way, the challenge is just to figure out how to apply them. But with things like sentiment analysis, voice recognition and speaker recognition, I think you’re going to see a lot more automation come into the content as it is captured or uploaded.
What we thought of until recently as a recording, just audio and video, will in the near to intermediate future most probably start to have vast layers of metadata automatically attached to it. And I think that will utterly change the way we deal with recorded data.
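As a thought experiment, the “layers of metadata” Koffman describes could be modelled as structured annotations attached to a plain recording. All field names below are assumptions made for the sake of the sketch, not any product’s actual schema.

```python
# Illustrative only: a recording that starts as bare audio/video metadata,
# then gains automatically generated layers after capture or upload.

recording = {
    "file": "interview.mp3",
    "duration_sec": 1840,
}

# Layers a processing pipeline might attach automatically.
recording["metadata"] = {
    "transcript": [  # time-coded, speaker-attributed segments
        {"start": 0.0, "end": 4.2, "speaker": "A",
         "text": "I call myself an accidental entrepreneur."},
    ],
    "speakers": {"A": "Jeff Koffman"},     # speaker recognition
    "sentiment": {"overall": "positive"},  # sentiment analysis
}

print(sorted(recording["metadata"].keys()))
```

Once a recording carries layers like these, it can be searched, filtered and analysed like any other structured data, which is exactly the shift the interview anticipates.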
It goes without saying that this interview was transcribed using Trint, with only limited editing. We are heavy users of Trint and are proud to be included in user feedback sessions whenever possible. With Trint, we are catching a small glimpse of the future. It is part of an array of innovative products and services that we, as content marketers, can use to produce better and more engaging content for the benefit of our readers and customers.