What's the difference between subtitles and captions?

Subtitles transcribe speech for viewers who can hear the audio but don't understand the language. Captions transcribe speech plus relevant non-speech sounds (music cues, a door slamming) for deaf and hard-of-hearing viewers. In day-to-day use the words are interchangeable, and most platforms serve a single track for both purposes.

Do I need special software to add subtitles?

For a sidecar SRT or VTT file, usually not. Most desktop players and every major platform accept the subtitle file alongside the video. To burn subtitles permanently into the picture you need an encoder such as HandBrake, which is free, or a video editor.

Can I generate captions from a video automatically?

Yes. Upload the video to Hushscript and it produces a timed SRT from the audio, so you don't type or set a single timestamp. The audio is extracted in your browser before anything is sent, so only the audio reaches the server.

What video formats can I caption?

Common containers like MP4, MOV, MKV, and WebM, and most files with a standard audio track. Hushscript reads the audio from whatever you drop in. If a file isn't accepted, email support@hushscript.com and we'll add it.

Should I burn subtitles in or keep them as a separate file?

Keep them separate (sidecar) when the destination supports a caption track, because the SRT stays editable and you can ship several languages without re-encoding. Burn them in when the recipient can only receive one video file, such as a clip sent through a messaging app or embedded with no caption support.

Can I use the same SRT for TikTok, Reels, and YouTube?

Usually. Each platform has its own caption panel, and where that panel accepts an SRT you can upload the same file to each one and the platform syncs it to playback. Most short-form apps also auto-generate captions, which you can replace with your own SRT, or with the corrected text where file upload isn't offered, for better accuracy.

Why are the auto-captions on my short video wrong?

Platform auto-captions are tuned for clear English speech. They slip on accents, fast or overlapping talk, technical terms, proper nouns, and any non-English audio. Uploading an accurate SRT replaces both the words and the timing; pasting corrected text into the app's caption editor fixes the words while the platform keeps its own timing.

How much does it cost to caption a video?

Hushscript runs on top-tier transcription AI, so it's pay-as-you-go with no subscription. New accounts get 30 free minutes to try. See the pricing page for the per-minute packs.

How to Add Subtitles or Captions to a Video

June 13, 2026

Adding subtitles to a video is two jobs, not one: produce the subtitle file, then attach it to the video. People dread the first job because timing every line by hand is slow. Automated transcription removes that part. You generate a timed transcript from the audio, export it as SRT, and attach that file in whatever player, editor, or platform you’re using. This guide covers both jobs, plus the one real decision you have to make along the way: a separate subtitle file or a track baked into the picture.

Captions vs subtitles

The terms get used interchangeably, but there is a real distinction. Subtitles transcribe the dialogue for viewers who can hear the audio but don’t understand the language. Captions transcribe the dialogue and add non-speech audio cues such as [music playing] or [door slams] for viewers who can’t hear the audio at all.

In practice most platforms expose a single caption track that does both jobs, which is why the words blur together. Pick by your goal. If you’re captioning for accessibility, add short bracketed descriptors for sounds that carry meaning. If you’re after international reach, or want search engines to index what’s said, plain dialogue subtitles are enough. Either way the starting point is the same file, so you don’t have to decide before you generate it.

What you need before you start

You need the source video and a way to produce the subtitle file. If you already have an SRT, skip to the section on attaching it. If you don’t, you need the audio transcribed with timestamps, which is what the next section covers. To burn subtitles into the picture rather than ship them separately, you also need a free encoder like HandBrake or a video editor that can render a caption track on export.

One thing you don’t need to do is upload the original video for the transcription step. Hushscript reads the audio out of the video in your browser, so the upload is just the audio, and the video file never leaves your machine.

Generate captions from the audio

If you don’t have a subtitle file yet, make one from the video’s audio track. You won’t type or time anything by hand.

Drop the video on /video-to-text. A 30-second, speaker-labeled preview of the start renders right there, with no account. The audio is extracted in your browser first, so only the audio is sent and the video stays on your device. Use the preview to confirm the speech is being heard correctly before you go further.
Sign up to caption the whole file. Transcribing past the preview takes a free account. New accounts get 30 free minutes to try. They arrive instantly if you validate a card with a $1 hold, which is released right away and never charged, or with your first minutes purchase if you pay another way. A card isn’t required.
Upload the full video and export as SRT. When the transcript is ready, download it as SRT. Each line of speech becomes a cue with a start timestamp, an end timestamp, and the text, formatted to the SRT spec with nothing left for you to do.

The exported SRT is ready to attach straight away. For an interview or a panel where it helps to see who is speaking, rename the speaker labels in the transcript editor before you export, and the names ride along in each cue.

For a closer look at the file itself, including how to edit timing and trim long lines, see how to create SRT subtitles from a video.

A worked example

Say you have panel-talk.mp4, a 12-minute recorded discussion between a host and two guests, and you want captions to upload to YouTube. You drop it on the video-to-text page; the 30-second preview comes back with three speakers already separated, so you know the audio is clean. You sign up, the full file transcribes in a couple of minutes, and you rename “Speaker 1” to the host’s name and the other two to the guests. Exported, the SRT opens like this:

1
00:00:01,200 --> 00:00:04,600
Host: Welcome back. Today we're talking
about open-source funding.

2
00:00:04,900 --> 00:00:08,150
Guest A: Thanks for having me. It's a
problem nobody has really solved yet.

That file is what you upload to YouTube. The cue numbers, the timestamps, and the line breaks are all in place, so the platform maps the captions onto the video without any further editing.
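If you want to sanity-check an exported file before uploading it, the cue structure shown above is simple enough to read with a few lines of Python. This is a minimal sketch, not Hushscript’s exporter or any platform’s importer; the names parse_srt and Cue are made up for illustration, and it assumes a well-formed file with blank lines between cues.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,200 --> 00:00:04,600".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int      # the cue number on the first line
    start_ms: int   # start time in milliseconds
    end_ms: int     # end time in milliseconds
    text: str       # one or more lines of caption text


def to_ms(h, m, s, ms):
    """Convert hours, minutes, seconds, milliseconds to total milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt_text):
    """Hypothetical helper: split SRT text into Cue objects."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # not a complete cue: number, timing, text
        match = TIMING.fullmatch(lines[1].strip())
        if match is None:
            continue  # timing line is malformed, skip the block
        g = match.groups()
        cues.append(
            Cue(int(lines[0].strip()), to_ms(*g[:4]), to_ms(*g[4:]), "\n".join(lines[2:]))
        )
    return cues


SAMPLE = """1
00:00:01,200 --> 00:00:04,600
Host: Welcome back. Today we're talking
about open-source funding.

2
00:00:04,900 --> 00:00:08,150
Guest A: Thanks for having me. It's a
problem nobody has really solved yet.
"""

cues = parse_srt(SAMPLE)
for cue in cues:
    # A cue whose end is not after its start would be a sign of a broken file.
    print(cue.index, cue.start_ms, cue.end_ms, cue.end_ms > cue.start_ms)
```

Working in milliseconds keeps comparisons simple, which is handy if you later want to spot overlapping or zero-length cues.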

Convert SRT to VTT if needed

HTML5 web players want WebVTT (.vtt) rather than SRT. The two are close cousins: VTT adds a WEBVTT header line and uses a period instead of a comma in the timestamps. Convert one to the other for free at /tools/srt-to-vtt. The tool runs in your browser, needs no account, and uploads nothing. Drag the SRT in, download the VTT out.
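If you’d rather script the conversion, the two differences described above translate into very little code. The sketch below is an illustration only, not the implementation behind the /tools/srt-to-vtt page; the function name srt_to_vtt is made up, and it assumes a plain SRT with no styling tags.

```python
import re

# SRT timestamps look like 00:00:01,200; WebVTT writes them as 00:00:01.200.
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Hypothetical helper: return WebVTT text for a plain SRT string."""
    # Drop a leading byte-order mark and normalise Windows line endings.
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    out = ["WEBVTT", ""]  # header line, then a blank line before the first cue
    for line in body.split("\n"):
        if "-->" in line:
            # Only timing lines are rewritten, so commas in dialogue are untouched.
            line = SRT_TIME.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"


vtt = srt_to_vtt("1\n00:00:01,200 --> 00:00:04,600\nHost: Welcome back, everyone.\n")
print(vtt)
```

The cue numbers are left in place, since WebVTT permits an identifier line ahead of each timing line.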

Burn-in vs sidecar SRT

There are two ways subtitles travel with a video, and this is the decision worth getting right before you publish.

A sidecar SRT is a separate file. The player reads the video and the subtitle file together and overlays the captions at playback. This is the flexible option. You can fix a typo in the SRT without re-encoding the video, and you can ship several languages as several SRT files against one video. It’s what YouTube, Vimeo, and most editing suites use.

A burned-in (hardcoded) track is the opposite. The text is rendered permanently into the video frames, so every viewer sees it regardless of player settings and there’s no way to turn it off. You need this when the destination can’t read a caption file: a clip sent through a messaging app, a bare video embed with no caption support, or any case where the recipient can only receive a single video file.

For most uses the sidecar approach wins, because keeping the video and the subtitles independent means you can correct or re-export the captions later without touching the video. Reach for burn-in only when the destination forces your hand, or when you specifically want captions that can’t be switched off, which is common for social clips that autoplay muted.

Attach the subtitles, by destination

Once you have the SRT, how you attach it depends on where the video is going.

Desktop video players

Put the SRT in the same folder as the video and give it the same base name:

panel-talk.mp4
panel-talk.srt

VLC, mpv, and most desktop players pick the SRT up automatically on play. If one doesn’t, open its subtitle menu and load the file by hand.
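When you’re preparing a folder of videos, a quick script can confirm that each one has a sidecar with a matching base name. This is a small sketch under the naming rule above; find_sidecar is a made-up helper name, and players may apply looser matching rules than this check does.

```python
from pathlib import Path


def find_sidecar(video_path):
    """Hypothetical helper: return the matching .srt next to a video, or None."""
    video = Path(video_path)
    # Same folder, same base name, .srt extension: panel-talk.mp4 -> panel-talk.srt
    candidate = video.with_suffix(".srt")
    return candidate if candidate.is_file() else None


sidecar = find_sidecar("panel-talk.mp4")
print(sidecar or "no matching .srt next to the video")
```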

Video editors

In an editor, import the SRT as a subtitle track. Menu names shift between versions, but in recent releases DaVinci Resolve takes it through File, then Import, then Subtitle; Premiere Pro through the Text panel in the Captions and Graphics workspace, where you choose Import captions from file. Both place each cue on its correct timecode, and from there you can restyle the captions and burn them into the export if you want a hardcoded result.

YouTube and other long-form platforms

In YouTube Studio, open the video, go to Subtitles, choose Add language, click Add beside the language, then Upload file and pick the SRT. YouTube maps the timecodes to the video and shows the captions as a track viewers can toggle. Uploading your own SRT replaces the auto-generated captions for that language, which is the move when the video carries technical terms, names, or accented speech that the machine captions mangled.

TikTok, Reels, and YouTube Shorts

Short-form apps auto-generate captions from the audio on upload. The results are passable for clear English, but they drop off fast for accents, quick or overlapping speech, and anything not in English. To use your own captions instead, open each platform’s caption panel after the video is up: TikTok’s upload-and-edit flow, Reels through Edit video, then Captions, or the Subtitles panel on YouTube Shorts, which is the same one as long-form YouTube. If a platform won’t take an SRT file directly, generate the accurate transcript in Hushscript, copy the corrected text, and paste it into the app’s caption editor over the machine guesses; the app keeps handling the timing while your words replace the errors.

Burning in with HandBrake

HandBrake is a free, cross-platform encoder that can bake an SRT into the output video in a single pass.

Open HandBrake and load the video.
Go to the Subtitles tab, choose Add External SRT (newer versions label it Import Subtitle or Add External Subtitles Track), and select your SRT.
Tick the Burned In box next to that subtitle track.
Run the encode. The output video carries the subtitles permanently in the picture.
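If you have many clips to burn in, the same steps can be scripted through HandBrake’s command-line build. The sketch below only assembles the command; it assumes HandBrakeCLI is installed and that your version supports the --srt-file, --srt-codeset, and --srt-burn options, which is worth confirming with HandBrakeCLI --help. The function name and the output file name are made up for illustration.

```python
import shlex


def handbrake_burn_in_command(video, srt, output):
    """Hypothetical helper: build a HandBrakeCLI command that burns an SRT in."""
    return [
        "HandBrakeCLI",
        "-i", video,                # source video
        "-o", output,               # encoded output with captions in the picture
        "--srt-file", srt,          # import the external SRT
        "--srt-codeset", "UTF-8",   # assumption: the SRT is saved as UTF-8
        "--srt-burn",               # burn the imported SRT into the video
    ]


cmd = handbrake_burn_in_command(
    "panel-talk.mp4", "panel-talk.srt", "panel-talk-captioned.mp4"
)
# Print the command for review; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.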

Troubleshooting common problems

The captions drift out of sync. Almost always this means the SRT and the video don’t share the same start point, usually because the video was trimmed after the transcript was made. Re-export the SRT against the final cut rather than nudging every timestamp. If the captions start in sync and slip further as the video plays, the video’s frame rate was likely changed on export, which stretches the timeline; re-render at the original rate or regenerate the captions from the final file.

Auto-captions on a social post are full of errors. That’s the platform’s own engine struggling with accents, speed, or non-English audio. Don’t fix them line by line in the app. Generate an accurate transcript in Hushscript and either upload that SRT or paste the corrected text over the machine captions, and the platform keeps the timing it already worked out.

Lines run off the edge of the screen. SRT cues with very long single lines get clipped by some players and by short-form layouts where the safe area is narrow. Break long cues into two shorter lines, or split one long cue into two timed cues. The transcript editor is the place to do this before export, not the platform afterwards.
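To find cues that are likely to clip, you can re-wrap the text to a fixed width and see whether it still fits in two lines. This is a rough sketch: the 42-character limit is a commonly used guideline rather than anything a specific platform enforces, and wrap_cue_text is a made-up helper name.

```python
import textwrap


def wrap_cue_text(text, max_chars=42, max_lines=2):
    """Hypothetical helper: re-wrap cue text and report whether it fits.

    Returns (wrapped_text, fits). When fits is False the cue is a candidate
    for splitting into two timed cues in the transcript editor.
    """
    flat = " ".join(text.split())              # collapse existing line breaks
    lines = textwrap.wrap(flat, width=max_chars)
    return "\n".join(lines), len(lines) <= max_lines


wrapped, fits = wrap_cue_text(
    "Guest A: Thanks for having me. It's a problem nobody has really solved yet."
)
print(wrapped)
print("fits in two lines:", fits)
```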

Speaker names vanished after upload. Some platforms strip the leading “Name:” prefix when they import an SRT, treating only the spoken text as the caption. If keeping speakers visible matters, burn the captions into the video so the names are part of the picture rather than relying on the import to preserve them, or accept plain captions for that destination.

The subtitles don’t show up at all. For desktop players, confirm the SRT sits in the same folder with the same base name as the video; a mismatch is the usual cause. For web players, check that you handed over a .vtt file rather than an .srt, since browsers expect WebVTT. On a platform, make sure the caption track is enabled and set to the right language in the player’s settings.

Tips for accurate captions

Accuracy is decided at the audio stage, long before any platform sees the file. A clean recording gives a clean transcript, and a clean transcript gives captions you barely have to touch.

Caption from the highest-quality audio you have. If the video exists in several renders, transcribe the one with the least compression. Where the speech is faint under music or room noise, lift the dialogue before transcription, because the model can only caption what it can clearly hear. Always read the 30-second preview before committing: it surfaces low volume, heavy accents, or crosstalk early, while it’s still cheap to fix the source. For interviews and panels, lean on the free speaker labels and give each speaker a real name, since named captions are far easier for viewers to follow than a wall of unattributed text. And keep the SRT as your source of truth. Edit there and re-export, rather than correcting inside each platform’s caption tool, so every destination stays in step.

The reason Hushscript fits this particular task is the honesty of the free tier paired with the work it actually does for you. The 30 free minutes are enough to caption a few short videos end to end, the speaker separation comes free on every transcript, and the SRT export is built to the spec so it drops straight into any of the destinations above. After the free minutes it’s pay-as-you-go with no subscription; the pricing page has the per-minute packs.

The single rule that keeps a captioned video maintainable is to treat the SRT, not any platform’s caption editor, as the source of truth. Keep it next to the video so you can update or re-export subtitles later without transcribing again. For pulling a usable SRT out of a multi-speaker recording, see how to transcribe a Zoom meeting.