Streaming Media


Adobe Talks Text-Based Editing in Premiere Pro

The potentially game-changing update to Premiere Pro announced at NAB 2023 became available in late May: text-based editing.

With the rolling release schedule of Adobe's Creative Cloud apps, updates happen all the time, and new and improved features arrive frequently and often unexpectedly. Catching up with Adobe at key events in the trade show calendar provides an opportunity to discuss more than the usual bullet list of feature mods large and small and get an overarching sense of the direction of key products in the cloud suite like Adobe Premiere Pro. Streaming Media's Marc Franklin had the opportunity to do just that at NAB 2023, meeting with Premiere Pro Senior Product Manager Francis Crossman to get the inside scoop on what's happening with the ubiquitous pro NLE.

"We've been really hard at work over the last year at Adobe," Crossman began, "and we've been really doubling down on the reliability and the speed of the product. And we're happy to say that this release is the most reliable and the fastest version of Premiere we've ever shipped. We've achieved this by, of course, fixing bugs and GPU-accelerating stuff, but also by giving you little things that are going to help you get your work done faster and little things to help you troubleshoot."

Text-Based Editing

The potentially game-changing update to Premiere Pro that Crossman described as just over the horizon at NAB 2023 became available in late May: text-based editing, which builds on the speech-to-text engine added in 2021.

"That's exciting," replied Franklin. "I saw that in a press preview, and some of my clients that do a lot of lectures and conferences want to highlight things, and they tell me, 'We need to find where he said our product is the greatest in the world.' You can now just search 'our product is the greatest in the world,' and it'll just take you to that spot, right?"

"That's right," confirmed Crossman. "It's an entirely new way to edit. You can bring in clips and we're going to automatically transcribe them. We support 17 languages, and that transcription can happen on-device. It does not need an internet connection. It happens quickly in the background. So if you bring in a lot of clips into your project, we'll cue them up and it doesn't interrupt your editing. But once you have that transcript, you can use that text to navigate search for words like you were explaining, get that sound bite, and then edit it into your sequence."

What's more, he continued, "we use common keyboard shortcuts that you probably already know. The comma key for doing an insert edit works the same in text-based editing. And then once you have those sound bites in your sequence, shift over to the sequence, and the transcript will continue to inspect the sequence. And you can do things like cut, or copy and paste. You just cut the text, move the playhead somewhere else in the transcript, hit Paste, and it rearranges the clips on the timeline. And you've performed that edit entirely using the text."

Locating Pauses and Ums and Trimming Them with Multicam Edits

"So, a lot of times I'll be working with either a new actor or somebody who's not used to being on camera," Franklin said, "and you get a lot of ums. Can you search for ums?"

"Yes," Crossman affirmed. "We can detect pauses and filler words, and you can just click on those. They're represented by three little dots--just delete 'em and it'll do a ripple delete. Here's what it's going to do. It's going to be represented as three dots. And what three dots means is, it's either a pause where there's silence, or you said something that's not quite a word. It's a verbal disfluent. These sorts of things happen all the time in interviews and it always helps to find those, trim them out, and compress time. And then of course, you're going to need to fix up that edit. So you can do that, of course, traditionally put some B-roll on top to cover it up. But if you're shooting multicam with multiple more than one camera, text-based editing supports multi-camera editing as well. So you could load up your multicam in there and use the transcript, and then once you have those cuts switch to a different camera angle."

"That's pretty exciting," Franklin said. "Doing a lot of lectures, educational stuff, I have clients that will go gaga over this."

"I was an editor for 10 years before coming to Adobe," Crossman replied. "I did a lot of dialogue editing, interviews and behind-the-scenes documentary-type stuff. A lot of the work that were done is that we do is dialogue-driven. And I can't tell you how many hours I spent with that workflow. You would get the footage in and then you'd have to get it transcribed. Either you'd sit down and transcribe it yourself, or you'd send it off to a service. And there are some fantastic services out there, but the turnaround time takes a while and that's also an additional cost. So having this built right into the product is something that, gosh, I wish I had that when I was editing professionally because it would've saved me hours."

"Yeah, I'm just thinking how well that will work for news, or the legal field," Franklin said. "You have legal videographers looking for something in a court case that they were recorded. Now that's going to be pretty amazing--where did the guy say he put the knife under the seat? And boom, there it is."

"Right," said Crossman. "Or like, how many times does this happen to you? You're working on a piece and, and you hope for a sound bite. You wish that they said this. You're pretty sure they said this, but you're not quite sure. So then you shuttle through the footage, you don't hear it, you shuttle through it again. You don't hear it. Yes. But you think that maybe you're just missing it. By searching the transcript, you can know immediately. Yes, they said it. No, they didn't. And you can move on."

"I think I had one case where the actor said the other actor's real name instead of the character's name. And we had to look for that, chop that out," Franklin said, and insert another instance of the right name.

"That is actually a really good example of what this workflow is great for," Crossman confirmed. "You can imagine if you're chopping stuff up and doing a Franken-bite, as we call it. You might want to use a phrase, but they end with a vocal inflection that doesn't sound like the end of a thought. Like they go up at the end. And so you can find other examples of them saying that word in the transcript, and maybe one of them has a better inflection and you can swap that in."

"I have a client I know I'm going to send right to that," Franklin said. "She's also a Creative Cloud user. She does a very rough cut of things, and she sends it over to me, and this is going to make her a very happy person."

"We think that it's going to be really beneficial for seasoned professionals to accelerate their workflow," Crossman said, "but also potentially like producers and editors who are not editors by trade, but they want to go in and look at the transcript. And you could imagine sitting a producer down and saying, 'Okay, here's the transcript. Highlight the text you want, and when you find something you want, hit comma, and it'll insert it into the sequence.' And then they could actually do that traditional paper cut in Premiere, saving everyone time."
