Geekazine


Simon Says Assemble is Text-Based Video Editing

May 12, 2021

Normally, I’m trying to figure out the words to say. But I think I should let the transcription say it for me. This is Simon Says, a transcription app that uses AI not only to transcribe but also to help you figure out how to edit videos by looking at what was said.

Assemble puts the text next to timecodes. You can choose to remove a certain part of text/video, or create a quick social meme.

But I spoke too much. Let’s let the software do the work:

Jeffrey Powers: Continuing coverage, see twenty, twenty one virtual, virtual being me, you hear us? I’ve got here with us Shamir, and the company is called Simon Says. You’re called Shamir, but you’re not called Simon, right?

Shamir Allibhai: Exactly. Yeah. You know, I get that mixed up a lot, incessantly, on chat where people are like, “Hi, Simon.” No, Shamir, but Simon’s fine.

Jeffrey Powers: Now tell people your role at Simon Says and a little bit about what Simon Says says or is.

Shamir Allibhai: Yeah, absolutely. I’m Shamir Allibhai, CEO and founder of Simon Says AI, where we’re using speech recognition to transcribe and translate audio and video, typically at the beginning of post-production, where you have your dailies, rushes, or interviews that you want to find the meaningful parts of, or at the end of an edit, where you want to subtitle or caption that edit for distribution.

Shamir Allibhai: And last month we released Simon Says Assemble, which is text-based video editing, so you can find the sound bites in your transcript and then drag and drop them in order to create the foundation of your story. It’s collaborative, it’s web-based, and it’s getting a lot of kudos. So I’m really excited about where it’s headed.

Jeffrey Powers: So how Assemble works is, say we take this video we’re making right here. I would put that into Simon Says, it would transcribe it, and then I would see a storyboard-type situation. Is that how that works?

Shamir Allibhai: So, yeah. When you load up Assemble, you’ve already transcribed your video. And so what you see is the source monitor, and below it is the transcript for the original video that you imported. And then, just like editing in Microsoft Word or Google Docs, you just highlight the sentences,

Shamir Allibhai: the key words. And the beauty of the technology behind it is that speech recognition matches each word to a time reference point in the video.

Shamir Allibhai: And so when you highlight a sentence, Simon knows the in point and out point for that sentence. You highlight that sentence, and then on the timeline side you now have these sub-clips, these sentences, and you can just drag and reorder the sentences to create the foundation of your story.
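To make that idea concrete, here is a minimal sketch of how a highlighted run of words could map to a sub-clip when every word carries a time reference. This is not Simon Says code: the `Word` and `SubClip` types, the `subclip_from_selection` helper, the file name, and the timings are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the source video
    end: float


@dataclass
class SubClip:
    source: str
    in_point: float
    out_point: float
    text: str


def subclip_from_selection(source: str, words: list, first: int, last: int) -> SubClip:
    """Turn the highlighted words[first..last] (inclusive) into a sub-clip."""
    if not 0 <= first <= last < len(words):
        raise ValueError("selection is outside the transcript")
    selected = words[first:last + 1]
    return SubClip(
        source=source,
        in_point=selected[0].start,   # start time of the first highlighted word
        out_point=selected[-1].end,   # end time of the last highlighted word
        text=" ".join(w.text for w in selected),
    )


# Hypothetical word-level transcript for one short sentence.
words = [
    Word("Simon", 0.0, 0.4),
    Word("knows", 0.4, 0.8),
    Word("the", 0.8, 0.9),
    Word("in", 0.9, 1.1),
    Word("point.", 1.1, 1.6),
]

clip = subclip_from_selection("interview_a.mp4", words, 0, 4)
print(clip)
```

Highlighting text only picks word indexes. The in and out points fall out of the timestamps already attached to those words.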

Shamir Allibhai: And a video is dynamically created. Again, Simon knows the in and out points for each sentence. You gave an example of one video, but imagine you’ve done 10 interviews. And so if you want to cut from interviewee A to interviewee B, you can do that.
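Cutting between interviews can be pictured the same way: a rough cut is just an ordered list of sub-clips, each pointing at its own source file. The sketch below is an assumption about how such a list could be laid end to end on a timeline, not a description of how Assemble does it. The `SubClip` type, the `build_timeline` helper, and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubClip:
    source: str
    in_point: float   # seconds into the source video
    out_point: float


def build_timeline(clips: list) -> list:
    """Lay sub-clips end to end; return (timeline_start, timeline_end, clip) tuples."""
    timeline = []
    cursor = 0.0
    for clip in clips:
        duration = clip.out_point - clip.in_point
        if duration <= 0:
            raise ValueError("out point must come after in point")
        timeline.append((cursor, cursor + duration, clip))
        cursor += duration
    return timeline


# Two hypothetical sentences picked from two different interviews.
clip_a = SubClip("interviewee_a.mp4", 12.0, 17.5)
clip_b = SubClip("interviewee_b.mp4", 3.0, 8.0)

# Dragging B ahead of A is just a change in list order.
rough_cut = build_timeline([clip_b, clip_a])
for start, end, clip in rough_cut:
    print(f"{start:5.1f} to {end:5.1f}  {clip.source}")
```

Because the timeline is recomputed from the list order, reordering sentences never touches the source files.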

Shamir Allibhai: On the timeline side, the video is dynamically created. So now you’re watching in real time this rough cut being formed and easy, as is some word,