How to Transcribe a Long Recording (2+ Hours)
April 24, 2026
A long recording is where most transcription tools quietly give up. The free tier stops at 30 or 60 minutes. The upload stalls halfway through a 2 GB file. The job runs, returns the first 45 minutes, and never mentions the rest is missing. If you have a 3-hour interview, a half-day workshop, or a full board meeting to get into text, the limit you hit isn’t the audio — it’s the tool.
Hushscript takes audio and video files up to 10 hours or 2 GB each, with no daily cap on how many you run. A long file isn’t a special case here; it’s the same upload as a two-minute clip, just left to finish on its own. That single capability — a real ceiling instead of a teaser one — is the whole reason this guide is short on workarounds and long on the actual steps.
What you need before you start
You need three things, and you probably already have all three.
- The recording itself, in any common audio or video format — MP3, M4A, WAV, FLAC, AAC, MP4, MOV, MKV, and the rest. There’s no need to convert it first; just drop the file in.
- A modern browser. The first 30 seconds are processed locally before anything uploads, and for video files the audio is extracted in your browser, so a current version of Chrome, Edge, Firefox, or Safari matters more than a fast machine.
- Enough minutes in your balance to cover the duration. This is the one number worth checking up front. A 2.5-hour interview is 150 minutes, so a 300-minute pack covers it with room to spare. New accounts get 30 free minutes to try the full flow; see the pricing page for what each pack costs.
A quick note on the free minutes, because long recordings burn through them fast: you get 30 free minutes once. The quickest way to unlock them is to add a card — a $1 hold validates it and is released right away, never charged — and the minutes land immediately. If you’d rather use another payment method available in your country, the 30 minutes arrive with your first purchase instead. A card isn’t required; it’s just the fastest route. Thirty minutes is enough to transcribe a short segment and judge the accuracy before you commit a full pack to a three-hour file.
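The balance check is simple arithmetic, and a few lines make it concrete. This is only a sketch: the function names are made up for illustration, and rounding a partial minute up to a whole one is an assumption, since how partial minutes are billed isn’t stated here.

```python
import math


def minutes_needed(duration_seconds: float) -> int:
    """Whole minutes a recording would use, ASSUMING partial minutes round up."""
    return math.ceil(duration_seconds / 60)


def covers(balance_minutes: int, duration_seconds: float) -> bool:
    """True if the balance is enough for the whole recording."""
    return balance_minutes >= minutes_needed(duration_seconds)


interview = 2.5 * 60 * 60  # a 2.5-hour interview, in seconds
print(minutes_needed(interview))  # 150
print(covers(300, interview))     # True: a 300-minute pack covers it
print(covers(30, interview))      # False: the 30 free minutes do not
```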
Why long files trip up most tools
It helps to know what you’re avoiding, because it explains why the steps below are so plain.
Long files are expensive to process and slow to move across a network, so a lot of tools treat their free tier as a sign-up funnel rather than a working product. The failures cluster into four shapes:
- Hard time caps. Many tools cut free transcription off at 30 to 60 minutes per file or per day. A 90-minute meeting simply won’t go through.
- Upload timeouts. A 2 GB file on a home connection can take long enough that a server-side request timeout kills it before the transcription engine ever sees it.
- Silent truncation. Some tools accept the whole file, process a portion, and hand back a transcript that ends early with no warning. You find out when a quote you remember isn’t in the text.
- Daily and concurrent limits. Even paid plans sometimes cap how many files you can run in a day, so a research day with six interviews hits a wall at file four.
The defensive habit that saves the most time is simple: confirm a tool’s real per-file limit before you upload, not after a failed one. Hushscript’s is 10 hours or 2 GB, stated plainly, with no separate daily ceiling.
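If you run a lot of files, that pre-upload check is easy to script. The sketch below compares a file against the 10-hour and 2 GB figures quoted above. Two things here are assumptions: reading "2 GB" as 2 GiB, and the helper name `limit_problems`. You supply the duration yourself, from your recorder or media player.

```python
MAX_SECONDS = 10 * 60 * 60   # 10 hours, the stated per-file limit
MAX_BYTES = 2 * 1024 ** 3    # ASSUMPTION: "2 GB" read as 2 GiB


def limit_problems(size_bytes: int, duration_seconds: float) -> list[str]:
    """Return the reasons a file would exceed the per-file limit (empty if none)."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 2 GB")
    if duration_seconds > MAX_SECONDS:
        problems.append("recording is longer than 10 hours")
    return problems


# A 2.5-hour MP3 of roughly 140 MB passes both checks.
print(limit_problems(140_000_000, 2.5 * 3600))  # []
```

For a file on disk, `os.path.getsize(path)` gives the size in bytes to pass as the first argument.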
Transcribe a long recording, step by step
There’s no long-recording mode to switch on. The process is identical to a short clip; it just runs longer in the background.
- Open the converter and drop your file. Go to audio to text and drag the recording onto the upload area, or click to browse. Any common audio or video format is accepted. If it’s a video, the audio is extracted in your browser at this point, so the heavy video file never leaves your machine.
- Check the 30-second preview. Before you sign up, Hushscript transcribes the first 30 seconds locally and shows you the speaker-labeled output. This is your accuracy check: read it, confirm the speakers are split sensibly and the words are right, and you’ll know whether the audio is clean enough for the full run. No account is needed for the preview.
- Sign in to transcribe the rest. If the preview looks right, sign up and the full file uploads. This is where your minute balance matters — a 150-minute file spends 150 minutes. Make sure the pack you have covers the duration.
- Let the job finish on its own. Processing time grows with file length, but it is far shorter than the recording itself. A 2-hour file typically lands in a few minutes; an 8-hour file takes longer but runs unattended. You don’t have to keep the tab in focus; come back when it’s done.
- Relabel the speakers. The transcript arrives with generic labels like Speaker A and Speaker B. In the editor, rename Speaker A to a real name once and it updates everywhere in the document, so a three-hour interview reads with the right names throughout.
- Export in the format you need. Download as TXT, SRT, DOCX, or JSON, with no watermark on any of them. SRT is the one to reach for on long files, because its timestamps let you jump straight to any moment instead of scrolling.
The result sits in your dashboard with speaker labels, timestamps, and the full export menu, and it stays there for you to come back to.
A worked example: a 2.5-hour recorded interview
Here’s how that looks with a realistic file. Say you’ve recorded a 2-hour-30-minute interview as a single MP3: two people, one quiet, one loud, recorded over a video call.
You drop the MP3 on the upload area. The 30-second preview comes back almost instantly and shows two speakers already separated, which tells you the diarization has a clean signal to work with. You sign in; the 150-minute file uploads and the job starts. A few minutes later the transcript is ready — one continuous document, roughly 22,000 to 25,000 words for two and a half hours of conversation, with timestamps running unbroken from 00:00:00 to 02:30:00.
The raw output reads like this:
[00:00:04] Speaker A: Thanks for making the time. Can we start with how the project began?
[00:00:11] Speaker B: Of course. So it really started back in early 2023, when...
You rename Speaker A to the interviewer and Speaker B to the subject once, and every line updates. Then you export twice: a TXT to paste into a summarizer for a first-pass digest, and an SRT so that when you write up a direct quote you can click the timestamp and hear exactly how it was said. Total cost: 150 minutes off your balance, one upload, no splitting, no truncation, no second attempt.
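The editor handles renaming for you, but if you ever need to do the same thing to a saved text file, the line shape shown above is easy to work with. This is a minimal sketch that assumes every line looks like the two sample lines (`[HH:MM:SS] Speaker: text`); the names `relabel` and `to_seconds` are illustrative, not part of any Hushscript tooling.

```python
import re

# Matches lines shaped like: [00:00:04] Speaker A: Thanks for making the time.
LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] ([^:]+): (.*)$")


def to_seconds(stamp: str) -> int:
    """Convert an HH:MM:SS stamp to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def relabel(line: str, names: dict[str, str]) -> str:
    """Swap a generic speaker label for a real name; leave other lines untouched."""
    match = LINE.match(line)
    if match is None:
        return line
    stamp, speaker, text = match.groups()
    return f"[{stamp}] {names.get(speaker, speaker)}: {text}"


names = {"Speaker A": "Interviewer", "Speaker B": "Subject"}
raw = "[00:00:11] Speaker B: Of course. So it really started back in early 2023, when..."
print(relabel(raw, names))
print(to_seconds("02:30:00"))  # 9000
```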
Keep speakers labeled across a long session
Diarization on a 3-hour recording is genuinely harder than on a 10-minute clip — voices drift, pauses stretch out, and background noise comes and goes. Hushscript runs the labeling over the entire transcript at once, so the labels stay consistent end to end, but the recording itself can help or hurt that.
At recording time:
- If your tool can capture a separate track per participant, turn that on. Zoom’s local recording, for example, can produce either a mixed track or per-participant audio depending on settings. The separate tracks give the cleanest voice separation; when you need a single file to upload, the mixed track is usually the practical choice.
- Recording a physical room is the hard case. A directional mic, or simply seating speakers further apart, cuts the crosstalk that makes two voices read as one.
Audio quality:
- 44.1 kHz / 16-bit WAV, or a high-bitrate MP3 at 128 kbps or above, is plenty for the engine. There’s no accuracy gain from 96 kHz — it just makes a long file larger.
- If the recording is quiet or uneven, normalizing the volume in an audio editor before upload often lifts accuracy more than any setting change does.
Once the transcript is back, relabeling Speaker A with a real name is a one-time edit that propagates through the whole document. For a deeper look at how the labeling itself works, see speaker identification.
Split the file, or upload it whole?
The short answer for almost everyone: upload it whole. Splitting a long recording into parts is the thing to avoid, not the thing to do, and there are only two situations where it’s necessary.
Upload as one file when the recording is 10 hours or under and 2 GB or under — which covers the overwhelming majority of interviews, lectures, meetings, and panels. One file means one continuous set of timestamps and one consistent set of speaker labels. Splitting breaks both: each part restarts its clock at zero, and the same person can get a different label in part two than in part one, leaving you to stitch and re-map by hand.
Split only when the file is genuinely over 10 hours or over 2 GB — a multi-day conference recording, say. In that case, cut on a natural silence (a session break) rather than mid-sentence, and keep a note of the offset so you can renumber timestamps afterward. If you’re cutting an MP3 before upload, prefer constant bitrate (CBR) over variable bitrate (VBR), since VBR can introduce sync drift in some editors.
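Renumbering the timestamps afterward is mechanical once you have the offset. The sketch below shifts every timing line in a part’s SRT export by a fixed number of seconds. It assumes the standard SRT timing shape (`HH:MM:SS,mmm --> HH:MM:SS,mmm`); the function name `shift_srt` is illustrative, and it leaves cue numbers alone, so those still restart at 1 in each part.

```python
import re

STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text: str, offset_seconds: float) -> str:
    """Add offset_seconds to every timestamp on the SRT timing lines."""
    offset_ms = round(offset_seconds * 1000)

    def bump(match: re.Match) -> str:
        h, m, s, ms = (int(group) for group in match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    shifted = []
    for line in srt_text.splitlines(keepends=True):
        # Only timing lines are touched, so times quoted in speech stay as spoken.
        shifted.append(STAMP.sub(bump, line) if "-->" in line else line)
    return "".join(shifted)


part_two = "1\n00:00:04,000 --> 00:00:09,500\nWelcome back.\n"
# Part two began 10 hours into the original recording.
print(shift_srt(part_two, 10 * 3600))
```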
A video file over 2 GB is the common near-miss. Because Hushscript extracts the audio in your browser before uploading, the video’s size doesn’t count. Only the much smaller audio stream travels, so a large video usually fits comfortably even when the file on disk looks too big.
Best settings for accuracy on long audio
The single biggest factor in accuracy is speech clarity, not sample rate or file format. For long sessions specifically:
- Use a compressor on the recording side (a hardware or software audio compressor, not file compression) to even out the gap between a loud interviewer and a quiet subject. This matters far more across three hours than across ten minutes, because over a long session a single drift in level can lose whole exchanges.
- FLAC is lossless and accepted, but it won’t beat a good MP3 on accuracy. Its only real use is removing doubt: if you want to be certain audio quality isn’t the bottleneck, FLAC settles the question at the cost of a larger file.
- Avoid heavy noise reduction before upload. Aggressive denoising can chew up consonants and actually lower accuracy. Light normalization helps; scrubbing does not.
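To show what "light normalization" means in practice, here is a sketch of the simplest kind, peak normalization: find the loudest sample and scale everything by one fixed gain, so the speech itself is not reshaped. It uses only the Python standard library and handles 16-bit PCM WAV files only. The function name and the 0.9 target are illustrative choices, and an audio editor’s loudness normalization may behave differently. It reads the file in chunks so a multi-hour recording doesn’t have to fit in memory.

```python
import array
import sys
import wave

CHUNK_FRAMES = 65536  # frames read per step


def _blocks(reader):
    """Yield the file's samples as arrays of signed 16-bit integers."""
    while True:
        data = reader.readframes(CHUNK_FRAMES)
        if not data:
            return
        block = array.array("h")
        block.frombytes(data)
        if sys.byteorder == "big":
            block.byteswap()  # WAV sample data is little-endian
        yield block


def normalize_wav(src: str, dst: str, target_peak: float = 0.9) -> float:
    """Write dst as src scaled so its peak sits at target_peak of full scale.

    Returns the gain that was applied.
    """
    # Pass 1: find the loudest sample.
    with wave.open(src, "rb") as reader:
        if reader.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        peak = max((abs(s) for block in _blocks(reader) for s in block), default=0)
    gain = 1.0 if peak == 0 else target_peak * 32767 / peak

    # Pass 2: apply the same gain to every sample, clamped to the 16-bit range.
    with wave.open(src, "rb") as reader, wave.open(dst, "wb") as writer:
        writer.setparams(reader.getparams())
        for block in _blocks(reader):
            scaled = array.array(
                "h", (max(-32768, min(32767, round(s * gain))) for s in block)
            )
            if sys.byteorder == "big":
                scaled.byteswap()
            writer.writeframes(scaled.tobytes())
    return gain

# Example call (file names are placeholders):
# normalize_wav("interview.wav", "interview_normalized.wav")
```

Because the gain is a single number for the whole file, this lifts a quiet recording but does not even out a loud speaker against a quiet one; that is the compressor’s job at recording time.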
Troubleshooting common long-file problems
Accuracy dips in the back half of the recording. This is almost always falling audio levels or rising room noise late in a long session, not a length limit. Check the original recording around the timestamps where errors cluster. If the level dropped, normalize the audio, then export a short clip from that later stretch and run it through the 30-second preview to confirm the fix before spending the minutes on a full re-run.
Two people keep getting merged into one speaker. Overlapping speech and crosstalk are the usual cause. There’s no perfect fix after the fact, but a recording with more separation between voices — a per-channel track, or mics further apart — diarizes far more cleanly next time. In the transcript you have now, you can correct the occasional mislabel manually in the editor.
A large file is slow to upload. Upload speed is your connection, not the tool. For a multi-gigabyte video, remember the audio is extracted locally first, so the actual upload is much smaller than the file on disk. If you’re on a weak connection, a wired link or simply leaving it to run will get there.
Background noise throughout the recording. Steady noise — air conditioning, traffic, room hum — lowers accuracy across the whole file. Light volume normalization usually helps more than denoising, which tends to damage speech. If a segment is badly affected, the timestamps still let you find it and listen back.
The transcript looks like it ended early. On Hushscript a transcript covers the full file; there’s no silent truncation. If the text seems short, check the final timestamp against the recording’s real length. Long stretches of silence or music produce few words, so the text can look short even though nothing is missing.
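That final-timestamp check can be done by eye, or with a few lines against the SRT export. The sketch below assumes standard SRT timing lines and takes the recording length from you; `last_cue_end` and `uncovered_tail` are illustrative names. A tail of a few seconds is normal, since a recording rarely ends on a spoken word.

```python
import re

STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def last_cue_end(srt_text: str) -> float:
    """Latest end time, in seconds, found on any SRT timing line."""
    ends = []
    for line in srt_text.splitlines():
        if "-->" not in line:
            continue
        stamps = STAMP.findall(line)
        if len(stamps) == 2:
            h, m, s, ms = (int(part) for part in stamps[1])
            ends.append(h * 3600 + m * 60 + s + ms / 1000)
    return max(ends, default=0.0)


def uncovered_tail(srt_text: str, recording_seconds: float) -> float:
    """Seconds between the last transcribed cue and the end of the recording."""
    return max(0.0, recording_seconds - last_cue_end(srt_text))


srt = (
    "1\n00:00:04,000 --> 00:00:10,000\nThanks for making the time.\n\n"
    "2\n02:29:35,000 --> 02:29:41,250\nThat's everything from me.\n"
)
print(uncovered_tail(srt, 2.5 * 3600))  # 18.75
```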
Export a long transcript: which format
For a multi-hour session, the format you pick changes how usable the transcript is.
- TXT is the most portable and the right input for feeding into an LLM to summarize or analyze a long conversation.
- SRT carries utterance-level timestamps, so you can jump to any moment without scrolling. If your reason for transcribing is to find specific moments, this is the format.
- DOCX keeps speaker labels and timestamps in a Word-compatible file — best for sharing a transcript with people who need to comment on it.
- JSON gives you the raw structure (speaker, start, end, text) for piping into another tool.
Every export is included on every plan, with no watermark and no separate fee to unlock subtitles.
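As an example of piping the JSON into another tool, here is a sketch that totals how long each speaker talked. The exact layout of the export isn’t documented here, so the shape below is an assumption: a list of objects with `speaker`, `start`, `end`, and `text` keys, with times in seconds. Check a real export and adjust the key names before relying on it.

```python
import json
from collections import defaultdict


def talk_time(segments: list[dict]) -> dict[str, float]:
    """Total seconds spoken per speaker, summed over all segments."""
    totals: dict[str, float] = defaultdict(float)
    for segment in segments:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


# ASSUMED export shape, for illustration only.
sample = json.loads("""[
  {"speaker": "Speaker A", "start": 4.0, "end": 10.5, "text": "Thanks for making the time."},
  {"speaker": "Speaker B", "start": 11.0, "end": 19.0, "text": "Of course."},
  {"speaker": "Speaker A", "start": 19.5, "end": 22.0, "text": "Go on."}
]""")
print(talk_time(sample))  # {'Speaker A': 9.0, 'Speaker B': 8.0}
```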
For long recordings billed by the minute, pay-as-you-go transcription tends to beat a subscription with a monthly cap you’d blow through anyway — you pay for exactly the hours you transcribe and nothing more, with no daily limit between you and a busy week. The two most common long-recording jobs have their own walkthroughs: how to transcribe a podcast covers multi-guest episodes and show notes, and how to transcribe a lecture covers classroom audio quirks like reverb and audience questions from across a room.