An SRT (SubRip Text) file is a plain text file that pairs numbered cues with timing and text. Each block has an index, a start and end timestamp written as HH:MM:SS,mmm, and the subtitle text, usually one or two lines. Almost every video player, editor, and platform reads SRT without conversion.

How do I create an SRT file from a video?

Transcribe the video's audio, then export the transcript as SRT. In Hushscript you drop the video to get a free 30-second preview, sign up to transcribe the rest, then choose SRT on export. Each spoken line becomes a cue with its start and end time already filled in, so you never type a timestamp by hand.

What is the ideal line length and reading speed for subtitles?

Keep each line to about 32 to 42 characters, with no more than two lines on screen at once. Aim for a reading speed at or below roughly 17 characters per second, which is about 160 to 180 words per minute. If a cue exceeds that, split it at a natural pause so viewers have time to read.

Does my video file get uploaded to a server?

No. Hushscript extracts the audio from the video in your browser before anything is sent, so the video itself stays on your device and only the smaller audio file reaches the server. The audio is deleted once the transcript is ready.

Can I include speaker names in the SRT?

Yes. Rename each speaker label in the transcript editor before you export, and those names appear at the start of the relevant cues. Note that some platforms strip speaker prefixes from SRT on import, so if the names must stay visible, consider burning the captions into the video instead.

What is the difference between SRT and VTT?

They carry the same content. WebVTT adds a WEBVTT header line and uses a period instead of a comma in timestamps, and it is the native format for the HTML5 video element. Use SRT for editors, desktop players, and most platforms; use VTT for web players. You can convert between them in seconds.

How large a video can I caption?

Up to 2 GB or 10 hours per file, with no daily caps beyond the minutes in your balance. Because the audio is extracted before upload, a large video sends far less data than its full file size.

How accurate are the timestamps?

Each cue is aligned to the speech engine's word timings, usually within a few hundred milliseconds of the true start and end. That is tight enough to read naturally. For fast back-and-forth dialogue where speakers overlap, you may want to nudge a couple of cues by hand.

How to Create SRT Subtitles from a Video

June 17, 2026

An SRT file is how subtitles travel separately from a video, as a small sidecar text file the player reads alongside the picture. You make one by transcribing the audio and exporting the result with timestamps. The work that used to take an afternoon, timing every line to the frame, is now done for you; what is left is a short edit pass for readability. This guide explains what the format actually contains, how to create an SRT from a video in the real order, and how to clean it up so the captions read well.

What you need before you start

You need the video file and a few minutes. Any common container works: MP4, MOV, MKV, WebM, AVI, and the rest. You do not need editing software to create the SRT, and you do not need to convert the video first. A subtitle editor is optional and only useful later, if you want a timeline view while you fine-tune timing.

For the transcription itself you sign up, which unlocks 30 free minutes to try. The 30-second preview that renders first needs no account, so you can see the quality before committing to anything.

What an SRT file is

SRT stands for SubRip Text. The format is deliberately simple: a sequence of numbered blocks, each made of three parts stacked on separate lines. First an index number, then a start --> end timestamp, then the subtitle text, usually one or two lines. A blank line separates one block from the next.

1
00:00:02,340 --> 00:00:05,810
The audio is extracted in your browser.

2
00:00:06,100 --> 00:00:09,440
Only the audio reaches the server, not the video.

A few details matter when you read or hand-edit a file. The timestamp format is HH:MM:SS,mmm, hours through milliseconds, and the separator before the milliseconds is a comma, not a period. The arrow between the two times is exactly two hyphens and a greater-than sign, with a single space on each side. The index numbers should run in order from 1 with no gaps. Get any of those wrong and some players will silently drop the affected cue, which is the single most common reason a hand-edited SRT “stops working” partway through.
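The rules in that paragraph are mechanical enough to check with a short script. The following is a minimal sketch in Python, not part of any particular tool: the function name check_srt and the wording of the messages are illustrative, and the pattern is deliberately strict so that a period or a malformed arrow is reported rather than tolerated.

```python
import re

# Strict SRT timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMING = re.compile(
    r"^(\d{2}):([0-5]\d):([0-5]\d),(\d{3}) --> (\d{2}):([0-5]\d):([0-5]\d),(\d{3})$"
)

def check_srt(text: str) -> list[str]:
    """Return a list of human-readable problems found in SRT text."""
    problems = []
    # Blocks are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    for expected, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        if len(lines) < 3:
            problems.append(f"block {expected}: needs index, timing, and text")
            continue
        # Index numbers should count up from 1 with no gaps.
        if lines[0].strip() != str(expected):
            problems.append(f"block {expected}: index is {lines[0].strip()!r}")
        # Comma before milliseconds, and the arrow spelled exactly " --> ".
        if not TIMING.match(lines[1].strip()):
            problems.append(f"block {expected}: bad timing line {lines[1].strip()!r}")
    return problems
```

An empty list means the file passed these three checks; it does not guarantee that every player will accept the file.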

Support for the format is close to universal. Most major desktop players read it, including VLC and mpv. The major non-linear editors import it, including Premiere Pro, DaVinci Resolve, and Final Cut. Almost every video platform accepts an SRT upload in its caption settings. The main exception is the HTML5 <video> element on the open web, which expects WebVTT instead; that is a quick conversion covered further down.

Create an SRT from a video in the real order

The flow has three stages, and the order is the part people get backwards. You drop the file first and preview it before any account exists.

1. Drop your video to get a free preview. Go to /video-to-text, or /mp4-to-text if your file is an MP4, and drop the video onto the upload area. The audio track is extracted in your browser using a local processing library, then the first 30 seconds come back as a speaker-labeled preview. No account is needed for this step, and the video itself never leaves your device.
2. Sign up to transcribe the rest. Once the preview looks right, create an account to run the full file. Signing up unlocks 30 free minutes to try. The quickest way to get them is to validate a card with a one-dollar hold that is authorized and then released right away, never charged; if you would rather use another payment method available in your country, the 30 minutes arrive with your first purchase instead. A card is not required either way. Transcription is paid because it runs on top-tier AI, so it is pay-as-you-go with no subscription. For a longer video that exceeds the free minutes, see the pricing page; you pay per minute of audio, not per file.
3. Get the transcript and export SRT. When the transcript is ready, open it in the editor, make any corrections, then click export and choose SRT. Each spoken utterance becomes one cue with its start and end timestamp already in place. The download is a plain .srt file with no watermark.

Speaker labels appear in the editor before you export. If you want names in the captions, useful for interviews, debates, or panels, rename “Speaker 1” and “Speaker 2” to the real names first; the export then writes those names at the start of the matching cues.

A worked example

Say you have a 12-minute recorded interview, founder-chat.mp4, with two people. You drop it on the page, the 30-second preview confirms both voices are picked up cleanly, and you sign up. The full transcript comes back in about a minute. In the editor you rename the two speakers to “Maya” and “Devin,” fix one misheard product name, then export as SRT. The opening of the file looks like this:

1
00:00:01,120 --> 00:00:04,460
Maya: Thanks for making the time today.

2
00:00:04,800 --> 00:00:08,210
Devin: Of course. I have been looking forward to this.

3
00:00:08,540 --> 00:00:12,900
Maya: Let us start at the beginning. How did the idea come about?

Two things are worth noticing. Each cue maps to one natural utterance, so the timing follows the conversation rather than a fixed clock. And the gaps between cues, here a few hundred milliseconds, reflect the real pauses between speakers. That is usually exactly what you want, with one caveat covered next.
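If you are curious how a list of timed utterances turns into a file like the one above, the mapping is short. This is a sketch for illustration only, not Hushscript's export code: the tuple layout (speaker, start in seconds, end in seconds, text) and the function names are assumptions made for the example.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(utterances) -> str:
    """Build SRT text from (speaker, start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (speaker, start, end, text) in enumerate(utterances, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        # The speaker name goes at the start of the cue text, as in the example.
        blocks.append(f"{index}\n{timing}\n{speaker}: {text}")
    # A blank line separates one block from the next.
    return "\n\n".join(blocks) + "\n"
```

Calling to_srt with the first two utterances from the interview, for example ("Maya", 1.12, 4.46, "Thanks for making the time today."), reproduces the opening cues shown above.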

Edit timing and line length

Most exports are usable as-is, but two quick checks make captions noticeably easier to read. This short pass is what separates auto-generated captions from ones that feel professional.

Line length and number of lines. Subtitles read best at roughly 32 to 42 characters per line, with a hard ceiling of two lines on screen at once. The SRT format itself allows any length, so nothing stops a single cue from running to three or four lines, but a wall of text is hard to read in the second or two it is visible. When an utterance runs long, split it into two consecutive cues at a natural pause rather than cramming it into one block.
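As a rough aid, Python's standard textwrap module can break a long utterance into lines of at most 42 characters and group them two at a time. This sketch splits on character count only, not on natural pauses, so treat its output as a starting point for a manual split; the names wrap_cue, MAX_CHARS, and MAX_LINES are illustrative, and the timing for each new cue still has to be set separately.

```python
import textwrap

MAX_CHARS = 42  # upper end of the 32 to 42 character range
MAX_LINES = 2   # ceiling of two lines on screen at once

def wrap_cue(text: str) -> list[list[str]]:
    """Wrap text to MAX_CHARS and group the lines into cues of MAX_LINES."""
    lines = textwrap.wrap(text, width=MAX_CHARS)
    return [lines[i:i + MAX_LINES] for i in range(0, len(lines), MAX_LINES)]
```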

Reading speed. Length on screen has to match how long the cue is shown. A good target is around 17 characters per second, which works out to roughly 160 to 180 words per minute. If a cue shows a lot of text for a very short time, the viewer cannot finish it. Either shorten the wording or extend the end timestamp slightly into the pause that follows.
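Reading speed is simple arithmetic: characters in the cue divided by seconds on screen. A minimal Python sketch, with illustrative function names and the assumption that line breaks do not count as characters:

```python
def to_seconds(timestamp: str) -> float:
    """Convert an SRT timestamp (HH:MM:SS,mmm) to seconds."""
    clock, millis = timestamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000

def chars_per_second(start: str, end: str, text: str) -> float:
    """Reading speed of one cue; line breaks are not counted as characters."""
    duration = to_seconds(end) - to_seconds(start)
    if duration <= 0:
        raise ValueError("cue must end after it starts")
    return len(text.replace("\n", "")) / duration
```

For the third cue in the worked example, 65 characters shown for 4.36 seconds, this comes to roughly 15 characters per second, which is under the 17 target.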

Timing gaps and overlap. A gap shorter than about 80 milliseconds between two cues can make some players hold the previous line too long or skip the next one, so leave a small breath between back-to-back cues. The opposite problem is overlap: when two people talk over each other, the engine assigns each speaker a separate cue, and those cues can collide in time. No automatic tool resolves true overlap perfectly, so for dialogue-heavy footage, scan for overlapping timestamps and nudge one cue earlier or later by hand.
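Scanning for these by eye gets tedious on a long file, so a small script can list the cues worth a second look. This is a sketch under simple assumptions: the cues are already parsed into (start, end) timestamp pairs in display order, the 80 millisecond threshold is the one from the paragraph above, and the function names are illustrative.

```python
MIN_GAP_MS = 80  # threshold taken from the guidance above

def to_ms(timestamp: str) -> int:
    """Convert an SRT timestamp (HH:MM:SS,mmm) to milliseconds."""
    clock, millis = timestamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)

def find_timing_issues(cues):
    """cues: list of (start, end) SRT timestamp pairs in display order.
    Returns (cue_number, kind, gap_ms) for each tight gap or overlap."""
    issues = []
    for number, (prev, nxt) in enumerate(zip(cues, cues[1:]), start=2):
        gap = to_ms(nxt[0]) - to_ms(prev[1])
        if gap < 0:
            issues.append((number, "overlap", gap))
        elif gap < MIN_GAP_MS:
            issues.append((number, "tight gap", gap))
    return issues
```

The cue number in each result is the later cue of the pair, which is usually the one to nudge.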

You can do all of this in any plain text editor, since the format is human-readable. If you would rather work visually, a dedicated subtitle editor such as Subtitle Edit on Windows or Aegisub on any platform adds a waveform timeline and can auto-fix common spacing and reading-speed issues in bulk.

SRT vs VTT: which one to export

The two formats are close cousins, and the right choice comes down to where the captions will play.

Reach for SRT for desktop players, video editors, and the great majority of hosting platforms, which is most of what people need. Reach for WebVTT when the captions go on a custom web page using the HTML5 <video> element, because WebVTT is the format browsers read natively through the <track> element. The content is identical between them. VTT just adds a WEBVTT line at the top of the file and uses a period instead of a comma before the milliseconds in each timestamp.

Because the difference is that small, you do not need to re-transcribe to switch. Export SRT, and if you later need VTT, run it through the free converter at /tools/srt-to-vtt. It works in your browser with no account, and nothing uploads.
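To show how small the difference is, here is the whole conversion as a Python sketch. It is not the code behind the converter mentioned above, just an illustration of the two changes: the function name srt_to_vtt is made up for the example, and the cue numbers are kept because WebVTT permits them as optional cue identifiers.

```python
import re

# Matches the comma before the milliseconds in an SRT timestamp.
TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")

def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT: add the header, swap comma for period."""
    lines = []
    for line in srt_text.strip().splitlines():
        # Only touch timing lines, so commas in the subtitle text are left alone.
        if "-->" in line:
            line = TIMESTAMP.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"
```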

Add the SRT to a player, editor, or platform

Once you have the file, attaching it depends on where the video lives.

Desktop players. VLC, mpv, and most desktop players auto-load a sidecar SRT when it shares the video’s base name and folder. Rename your file so interview.mp4 is paired with interview.srt, drop them in the same directory, and the captions appear on playback. If a player misses it, its subtitle menu has a manual “load subtitle file” option.
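If you are pairing many files, the matching name can be derived rather than typed. A tiny Python sketch using the standard pathlib module; the function name sidecar_path is illustrative:

```python
from pathlib import Path

def sidecar_path(video: str) -> Path:
    """Return the SRT path that shares the video's folder and base name."""
    return Path(video).with_suffix(".srt")
```

For clips/interview.mp4 this gives clips/interview.srt, which is the name to save or rename the subtitle file to.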

Video editors. In DaVinci Resolve, Premiere Pro, or Final Cut, import the SRT as a subtitle or caption track. The editor drops each cue onto the timeline at the right timecode, and from there you can restyle the font, change the position, and burn the captions into the final export if you want them hardcoded.

Hosting platforms and social video. Most platforms take an SRT directly in the video’s caption or subtitle settings: upload the file, pick the language, and the platform syncs the cues to playback for you. For short-form clips on TikTok, Reels, or Shorts, the platform’s own auto-captions are easy to replace with your SRT, which pays off whenever the audio has technical terms, proper nouns, or non-English speech that auto-captioning tends to mangle.

Burning the captions in. If you need the text permanently part of the picture rather than a toggleable sidecar, that is a separate encoding step. HandBrake can burn an SRT into a video for free: load the video, add the SRT under the Subtitles tab, tick “Burned In,” and encode. Reach for this only when the destination cannot read a sidecar file, since a separate SRT stays editable while a burned-in caption does not. For a fuller walkthrough of attaching and burning in captions across each platform, see how to add subtitles to a video.

Common issues and how to fix them

Captions are out of sync from the very start. A constant offset, where every line is early or late by the same amount, usually means the video you are playing starts at a different point than the file that was transcribed, for example because an intro was added or the head was trimmed afterward. Most desktop players have a subtitle-delay control to shift the whole track at once; nudge it until the first line lands, and the rest follows.
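When you want the fix saved in the file rather than applied in the player each time, a constant offset can be added to every timestamp. A minimal Python sketch, assuming the offset is known in milliseconds; the function name shift_srt is illustrative, and times that would fall below zero are clamped to zero.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_srt(srt_text: str, offset_ms: int) -> str:
    """Shift every timestamp on timing lines by offset_ms (negative = earlier)."""
    def shift(match: re.Match) -> str:
        h, m, s, ms = (int(g) for g in match.groups())
        total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    # Only timing lines are rewritten; index and text lines pass through.
    return "\n".join(
        TIMESTAMP.sub(shift, line) if "-->" in line else line
        for line in srt_text.splitlines()
    ) + "\n"
```

A positive offset makes captions appear later; a negative one makes them appear earlier.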

Sync drifts further apart over time. Drift that grows as the video plays points to a frame-rate mismatch between the source and the rendered copy, common after converting a 23.976 fps file to 25 fps or back. Re-export the SRT against the version of the video you will actually publish, rather than an earlier cut.
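A little arithmetic shows why this kind of error grows instead of staying constant. The sketch below assumes the conversion changed playback speed, with every frame kept and simply played at the new rate; that is one common way such conversions are done but not the only one, and the function name and default rates are illustrative.

```python
def drift_seconds(position_s: float, source_fps: float = 23.976,
                  target_fps: float = 25.0) -> float:
    """How far captions timed for the source lag behind the sped-up copy."""
    # A moment at position_s in the source plays at position_s * source/target.
    return position_s * (1 - source_fps / target_fps)
```

Under that assumption the error is about 2.5 seconds one minute in and about 24.6 seconds at the ten-minute mark, which is why a single delay setting cannot fix it.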

A speaker’s name is missing from the captions. If you renamed labels but a platform shows no names, that platform likely strips speaker prefixes from SRT on import. There is no SRT setting that forces them; the reliable fix is to burn the captions in, where the names are part of the picture.

A non-English video came back in the wrong language. Hushscript detects the spoken language automatically across about 99 languages, but a very short or noisy clip can occasionally be misread. Re-run a cleaner section, or check the languages page to confirm the language is covered.

Long stretches read as one giant cue. This happens when someone speaks for a long time without a clear pause. Split the cue manually at a sentence boundary so no single block holds more than two lines of readable text.
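A manual split needs new timestamps for each piece. One rough approach, sketched here in Python, is to cut at sentence-ending punctuation and divide the original duration in proportion to text length. This is an estimate rather than the speech engine's word timings, so check the result against the audio; the function name split_cue and the millisecond inputs are assumptions for the example.

```python
import re

def split_cue(start_ms: int, end_ms: int, text: str):
    """Split a cue into one cue per sentence, dividing the time by text length."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    total_chars = sum(len(s) for s in sentences)
    cues, cursor, seen = [], start_ms, 0
    for sentence in sentences:
        seen += len(sentence)
        # Cumulative share keeps the last cue ending exactly at end_ms.
        cue_end = start_ms + round((end_ms - start_ms) * seen / total_chars)
        cues.append((cursor, cue_end, sentence))
        cursor = cue_end
    return cues
```

The resulting cues sit back to back with no gap, so you may want to trim a little off each end time before saving.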

Accuracy tips that pay off in fewer edits

Most of the quality of your captions is set before you ever open the editor, by the audio you feed in. Clean source audio means fewer misheard words and tighter timestamps, which means a shorter edit pass.

Record or export the audio at a normal speaking level without heavy compression, and avoid loud background music under dialogue where you can. When several people are involved, a recording where each voice is reasonably distinct helps the speaker labels stay consistent, which matters if you plan to keep names in the captions. If the source is a video you have already published, the audio is whatever it is, but for anything you record yourself, a few minutes of attention to levels and microphone placement does more for caption quality than any amount of editing afterward.

After the SRT: the rest of the transcript

Creating the SRT also gives you the full transcript, so you are not limited to subtitles. From the same session you can export the text as TXT or DOCX for show notes, a blog version of the talk, or an accessible transcript posted next to the video. Exporting a clean transcript uses the same video-to-text workflow as captioning; the SRT is just one of the formats that come out of it.

For the next step, putting these captions onto the video and choosing between a sidecar file and a burned-in track, see how to add subtitles to a video.