
How to Transcribe a Long Recording (2+ Hours)

April 24, 2026

A long recording is where most transcription tools quietly give up. The free tier stops at 30 or 60 minutes. The upload stalls halfway through a 2 GB file. The job runs, returns the first 45 minutes, and never mentions the rest is missing. If you have a 3-hour interview, a half-day workshop, or a full board meeting to get into text, the limit you hit isn’t the audio — it’s the tool.

Hushscript takes audio and video files up to 10 hours or 2 GB each, with no daily cap on how many you run. A long file isn’t a special case here; it’s the same upload as a two-minute clip, just left to finish on its own. That single capability — a real ceiling instead of a teaser one — is the whole reason this guide is short on workarounds and long on the actual steps.

What you need before you start

You need three things, and you probably already have all three.

A quick note on the free minutes, because long recordings burn through them fast: you get 30 free minutes once. The quickest way to unlock them is to add a card — a $1 hold validates it and is released right away, never charged — and the minutes land immediately. If you’d rather use another payment method available in your country, the 30 minutes arrive with your first purchase instead. A card isn’t required; it’s just the fastest route. Thirty minutes is enough to transcribe a short segment and judge the accuracy before you commit a full pack to a three-hour file.

Why long files trip up most tools

It helps to know what you’re avoiding, because it explains why the steps below are so plain.

Long files are expensive to process and slow to move across a network, so a lot of tools treat their free tier as a sign-up funnel rather than a working product. The failures cluster into four shapes:

The defensive habit that saves the most time is simple: confirm a tool’s real per-file limit before you upload, not after a failed one. Hushscript’s is 10 hours or 2 GB, stated plainly, with no separate daily ceiling.

Transcribe a long recording, step by step

There’s no long-recording mode to switch on. The process is identical to a short clip; it just runs longer in the background.

  1. Open the converter and drop your file. Go to audio to text and drag the recording onto the upload area, or click to browse. Any common audio or video format is accepted. If it’s a video, the audio is extracted in your browser at this point, so the heavy video file never leaves your machine.
  2. Check the 30-second preview. Before you sign up, Hushscript transcribes the first 30 seconds locally and shows you the speaker-labeled output. This is your accuracy check: read it, confirm the speakers are split sensibly and the words are right, and you’ll know whether the audio is clean enough for the full run. No account is needed for the preview.
  3. Sign in to transcribe the rest. If the preview looks right, sign up and the full file uploads. This is where your minute balance matters — a 150-minute file spends 150 minutes. Make sure the pack you have covers the duration.
  4. Let the job finish on its own. Processing time grows with file length but runs far faster than real time. A 2-hour file typically lands in a few minutes; an 8-hour file takes longer but runs unattended. You don’t have to keep the tab in focus; come back when it’s done.
  5. Relabel the speakers. The transcript arrives with generic labels like Speaker A and Speaker B. In the editor, rename Speaker A to a real name once and it updates everywhere in the document, so a three-hour interview reads with the right names throughout.
  6. Export in the format you need. Download as TXT, SRT, DOCX, or JSON — no watermark on any of them. SRT is the one to reach for on long files, because its timestamps let you jump straight to any moment instead of scrolling.

The result sits in your dashboard with speaker labels, timestamps, and the full export menu, and it stays there for you to come back to.
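Step 3’s rule is one-to-one: a 150-minute file spends 150 minutes. If you want to check a file against your balance before signing in, here is a minimal Python sketch. It assumes a partial minute rounds up to the next whole minute, which this guide doesn’t state, and the function names are illustrative, not part of any Hushscript API.

```python
import math


def minutes_to_charge(duration_seconds: float) -> int:
    """Minutes a file would spend from the balance.

    Assumption: a partial minute rounds up to a whole minute.
    """
    return math.ceil(duration_seconds / 60)


def balance_covers(balance_minutes: int, duration_seconds: float) -> bool:
    """True if the balance is enough to transcribe the whole file."""
    return balance_minutes >= minutes_to_charge(duration_seconds)


# A 2 h 30 min interview is 9,000 seconds.
print(minutes_to_charge(9_000))      # 150
print(balance_covers(120, 9_000))    # False: top up before uploading
print(balance_covers(180, 9_000))    # True
```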

A worked example: a 2.5-hour recorded interview

Here’s how that looks with a real shape of file. Say you’ve recorded a 2-hour-30-minute interview as a single MP3 — two people, one quiet, one loud, recorded over a video call.

You drop the MP3 on the upload area. The 30-second preview comes back almost instantly and shows two speakers already separated, which tells you the diarization has a clean signal to work with. You sign in; the 150-minute file uploads and the job starts. A few minutes later the transcript is ready — one continuous document, roughly 22,000 to 25,000 words for two and a half hours of conversation, with timestamps running unbroken from 00:00:00 to 02:30:00.

The raw output reads like this:

[00:00:04] Speaker A: Thanks for making the time. Can we start with how the project began?
[00:00:11] Speaker B: Of course. So it really started back in early 2023, when...

You rename Speaker A to the interviewer and Speaker B to the subject once, and every line updates. Then you export twice: a TXT to paste into a summarizer for a first-pass digest, and an SRT so that when you write up a direct quote you can click the timestamp and hear exactly how it was said. Total cost: 150 minutes off your balance, one upload, no splitting, no truncation, no second attempt.
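The editor does this renaming for you, but the same one-time mapping is easy to apply to a TXT you’ve already downloaded. A minimal Python sketch, assuming every transcript line follows the [HH:MM:SS] Speaker: text shape of the sample above; the relabel helper and that line pattern are assumptions about the export, not a documented format.

```python
import re

# Assumed TXT line shape, matching the sample above: "[HH:MM:SS] Speaker A: text"
LINE_RE = re.compile(r"^(\[\d{2}:\d{2}:\d{2}\]) ([^:]+): (.*)$")


def relabel(transcript: str, names: dict) -> str:
    """Replace generic speaker labels with real names on every matching line."""
    out = []
    for line in transcript.splitlines():
        match = LINE_RE.match(line)
        if match:
            stamp, speaker, text = match.groups()
            # Labels missing from the mapping are left unchanged.
            line = f"{stamp} {names.get(speaker, speaker)}: {text}"
        out.append(line)
    return "\n".join(out)


raw = (
    "[00:00:04] Speaker A: Thanks for making the time. Can we start with how the project began?\n"
    "[00:00:11] Speaker B: Of course. So it really started back in early 2023, when..."
)
print(relabel(raw, {"Speaker A": "Interviewer", "Speaker B": "Subject"}))
```

Lines that don’t match the pattern, such as blank lines or headers, pass through untouched, so a mapping applied once changes only speaker labels.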

Keep speakers labeled across a long session

Diarization on a 3-hour recording is genuinely harder than on a 10-minute clip — voices drift, pauses stretch out, and background noise comes and goes. Hushscript runs the labeling over the entire transcript at once, so the labels stay consistent end to end, but the recording itself can help or hurt that.

At recording time:

Audio quality:

Once the transcript is back, relabeling Speaker A with a real name is a one-time edit that propagates through the whole document. For a deeper look at how the labeling itself works, see speaker identification.

Split the file, or upload it whole?

The short answer for almost everyone: upload it whole. Splitting a long recording into parts is the thing to avoid, not the thing to do, and there are only two situations where it’s necessary.

Upload as one file when the recording is 10 hours or under and 2 GB or under — which covers the overwhelming majority of interviews, lectures, meetings, and panels. One file means one continuous set of timestamps and one consistent set of speaker labels. Splitting breaks both: each part restarts its clock at zero, and the same person can get a different label in part two than in part one, leaving you to stitch and re-map by hand.
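That rule reduces to two comparisons. A minimal Python sketch of the pre-flight check, assuming “2 GB” means 2,000,000,000 bytes (the stricter decimal reading; the guide doesn’t say which) and that for a video you pass the size of the extracted audio rather than the file on disk. The min_parts helper is a lower bound only, since cutting on natural silences makes parts uneven.

```python
import math

MAX_SECONDS = 10 * 60 * 60        # 10-hour per-file ceiling
MAX_BYTES = 2_000_000_000         # assumption: "2 GB" read as decimal gigabytes


def can_upload_whole(duration_seconds: float, size_bytes: int) -> bool:
    """True if one upload stays within both per-file limits."""
    return duration_seconds <= MAX_SECONDS and size_bytes <= MAX_BYTES


def min_parts(duration_seconds: float, size_bytes: int) -> int:
    """Smallest number of parts that could satisfy both limits (a lower bound)."""
    return max(
        1,
        math.ceil(duration_seconds / MAX_SECONDS),
        math.ceil(size_bytes / MAX_BYTES),
    )


# A 3-hour interview as a 128 kbps MP3 (about 173 MB): one upload.
print(can_upload_whole(3 * 3600, 172_800_000))   # True
# A 24-hour multi-day conference recording: at least 3 parts.
print(min_parts(24 * 3600, 1_500_000_000))       # 3
```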

Split only when the file is genuinely over 10 hours or over 2 GB — a multi-day conference recording, say. In that case, cut on a natural silence (a session break) rather than mid-sentence, and keep a note of the offset so you can renumber timestamps afterward. If you’re cutting an MP3 before upload, prefer constant bitrate (CBR) over variable bitrate (VBR), since VBR can introduce sync drift in some editors.
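Renumbering is a fixed shift: add each part’s start offset to every cue time in that part’s transcript. A minimal Python sketch for SRT files, which carry times on lines of the form HH:MM:SS,mmm --> HH:MM:SS,mmm; shift_srt is an illustrative helper, and it assumes you noted each part’s offset in seconds when you cut it.

```python
import re

# SRT timestamps look like 00:12:34,567
TS_RE = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def _shifted(match: re.Match, offset_ms: int) -> str:
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    h, rest = divmod(total, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(srt_text: str, offset_seconds: float) -> str:
    """Add offset_seconds to every cue time, touching only the '-->' timing lines."""
    offset_ms = round(offset_seconds * 1000)
    out = []
    for line in srt_text.splitlines(keepends=True):  # keep original line endings
        if "-->" in line:
            line = TS_RE.sub(lambda m: _shifted(m, offset_ms), line)
        out.append(line)
    return "".join(out)


# Part two was cut at 9:58:30 in the original recording (35,910 s).
part_two = "1\n00:00:04,000 --> 00:00:11,500\nWelcome back.\n"
print(shift_srt(part_two, 35_910), end="")
```

Only timing lines are rewritten, so a timestamp that happens to appear in the spoken text stays as it was.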

A 2 GB video file is the common near-miss. Because Hushscript extracts the audio in your browser before uploading, the video’s size doesn’t count — only the much smaller audio stream travels, so a large video usually fits comfortably even when the file on disk looks too big.
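The arithmetic behind this: an audio stream’s size is roughly its bitrate times its duration, regardless of how heavy the video track is. The guide doesn’t say what format or bitrate the browser extracts to, so the bitrates below are assumptions chosen only to show the scale.

```python
def stream_bytes(duration_seconds: float, bitrate_kbps: float) -> float:
    """Approximate size of a constant-bitrate stream: bits per second x seconds / 8."""
    return duration_seconds * bitrate_kbps * 1000 / 8


three_hours = 3 * 3600
# Assumed bitrates, for scale only; the actual extraction format isn't documented here.
for label, kbps in [("64 kbps compressed", 64),
                    ("128 kbps compressed", 128),
                    ("16 kHz 16-bit mono PCM", 256)]:
    mb = stream_bytes(three_hours, kbps) / 1_000_000
    print(f"{label}: {mb:.1f} MB")
```

Even the uncompressed case, 16 kHz 16-bit mono speech for three hours, works out to about 346 MB, far below a 2 GB video of the same session.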

Best settings for accuracy on long audio

The single biggest factor in accuracy is speech clarity, not sample rate or file format. For long sessions specifically:

Troubleshooting common long-file problems

Accuracy dips in the back half of the recording. This is almost always falling audio levels or rising room noise late in a long session, not a length limit. Check the original recording around the timestamps where errors cluster; if the level dropped, normalize and re-run, and use the 30-second preview on a later segment to confirm before spending the minutes again.
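To find where the level drops, measure loudness window by window across the original file. A minimal Python sketch using only the standard library; it assumes a 16-bit mono PCM WAV (convert first if needed, for example with ffmpeg -i input.mp3 -ac 1 -c:a pcm_s16le input.wav), and the one-minute window and 10 dB drop threshold are arbitrary choices, not Hushscript settings.

```python
import array
import math
import statistics
import sys
import wave


def wav_window_levels(path, window_seconds=60.0):
    """Return (start_seconds, rms_dbfs) for each window of a 16-bit mono PCM WAV."""
    levels = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected a 16-bit mono PCM WAV")
        rate = wav.getframerate()
        frames_per_window = max(1, int(rate * window_seconds))
        position = 0
        while True:
            raw = wav.readframes(frames_per_window)  # streams; never loads the whole file
            if not raw:
                break
            samples = array.array("h", raw)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV stores samples little-endian
            mean_square = sum(s * s for s in samples) / len(samples)
            rms = math.sqrt(mean_square) / 32768  # 1.0 == full scale
            dbfs = 20 * math.log10(rms) if rms > 0 else float("-inf")
            levels.append((position / rate, dbfs))
            position += len(samples)
    return levels


def quiet_windows(levels, drop_db=10.0):
    """Start times of windows more than drop_db below the median window level."""
    finite = [db for _, db in levels if db != float("-inf")]
    if not finite:
        return []
    median = statistics.median(finite)
    return [start for start, db in levels if db < median - drop_db]
```

Pass the list from wav_window_levels to quiet_windows, and the start times it returns are the stretches to listen to before normalizing and re-running.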

Two people keep getting merged into one speaker. Overlapping speech and crosstalk are the usual cause. There’s no perfect fix after the fact, but a recording with more separation between voices — a per-channel track, or mics further apart — diarizes far more cleanly next time. In the transcript you have now, you can correct the occasional mislabel manually in the editor.

A large file is slow to upload. Upload speed depends on your connection, not the tool. For a multi-gigabyte video, remember the audio is extracted locally first, so the actual upload is much smaller than the file on disk. On a weak connection, switch to a wired link or simply leave the upload running; it will get there.

Background noise throughout the recording. Steady noise — air conditioning, traffic, room hum — lowers accuracy across the whole file. Light volume normalization usually helps more than denoising, which tends to damage speech. If a segment is badly affected, the timestamps still let you find it and listen back.
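Light normalization here means one gain applied to the whole file, not per-word processing. A minimal Python sketch of peak normalization on float samples between -1.0 and 1.0; the -1 dBFS target is a common convention, assumed here rather than taken from this guide. It lifts quiet audio without clipping, but it can’t separate speech from steady noise, which rises by the same amount.

```python
import math


def peak_normalize(samples, target_dbfs=-1.0):
    """Scale samples so the loudest peak sits at target_dbfs; return (samples, gain_db)."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0:
        return list(samples), 0.0  # silence: nothing to scale
    gain = 10 ** (target_dbfs / 20) / peak
    return [x * gain for x in samples], 20 * math.log10(gain)


quiet = [0.0, 0.1, -0.2, 0.05]
louder, gain_db = peak_normalize(quiet)
print(f"gain applied: {gain_db:+.1f} dB")  # gain applied: +13.0 dB
```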

The transcript looks like it ended early. On Hushscript a transcript covers the full file; there’s no silent truncation. If the text seems short, check the final timestamp against the recording’s real length. Long stretches of silence or music produce few words, which can look like missing text when nothing is actually missing.
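That check takes one comparison. A minimal Python sketch, assuming the TXT export’s lines start with an [HH:MM:SS] stamp as in the sample earlier in this guide; the transcript below is made up for illustration. Because the last stamp marks where the final utterance begins, a gap of a few seconds is normal, while a long gap points to trailing silence or music rather than a cut-off.

```python
import re

# Assumed TXT line shape: "[HH:MM:SS] Speaker: text"
STAMP_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]", re.MULTILINE)


def last_stamp_seconds(transcript: str):
    """Seconds at the start of the final timestamped line, or None if there are none."""
    stamps = STAMP_RE.findall(transcript)
    if not stamps:
        return None
    h, m, s = (int(part) for part in stamps[-1])
    return h * 3600 + m * 60 + s


def tail_gap_seconds(transcript: str, duration_seconds: float):
    """How much of the recording lies after the last timestamp."""
    last = last_stamp_seconds(transcript)
    return None if last is None else duration_seconds - last


# Illustrative two-line transcript of a 2.5-hour recording.
text = "[00:00:04] Speaker A: Thanks for making the time.\n[02:29:50] Speaker B: Thank you."
print(tail_gap_seconds(text, 2.5 * 3600))  # 10.0
```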

Export a long transcript: which format

For a multi-hour session, the format you pick changes how usable the transcript is.

Every export is included on every plan, with no watermark and no separate fee to unlock subtitles.


For long recordings billed by the minute, pay-as-you-go transcription tends to beat a subscription with a monthly cap you’d blow through anyway — you pay for exactly the hours you transcribe and nothing more, with no daily limit between you and a busy week. The two most common long-recording jobs have their own walkthroughs: how to transcribe a podcast covers multi-guest episodes and show notes, and how to transcribe a lecture covers classroom audio quirks like reverb and audience questions from across a room.

Frequently asked questions

Is there a file size limit for long recordings?

Each upload can be up to 10 hours or 2 GB, whichever comes first. There are no daily or monthly caps — every file is charged against the minutes in your balance, and nothing else throttles you.

Do I have to split a long recording into parts?

No. A 3-hour or 8-hour file goes up as one upload and comes back as one transcript with continuous timestamps. You only need to split if a file is over 10 hours or larger than 2 GB.

Will speaker labels stay consistent across a 3-hour recording?

Yes. Speaker diarization runs over the whole transcript at once, not chunk by chunk, so each person keeps the same label from the first minute to the last.

Can I upload a long video file instead of audio?

Yes. The audio is extracted from the video in your browser before anything is sent, so only the audio reaches the server — the original video never uploads, which also keeps a 2 GB lecture recording well under the size limit.

How long does a long file take to transcribe?

Processing time grows with length but runs far faster than real time. A 2-hour recording usually finishes in a few minutes; a full-day, near-10-hour file takes longer but still runs unattended while you do other things.

What export formats work best for a long transcript?

TXT for feeding into other tools, SRT for jumping to a timestamp, DOCX for sharing and annotation, and JSON for the raw speaker-and-timing data. All four are included with no watermark.

Do my minutes expire if I only transcribe occasionally?

Minutes expire only after 6 months with no activity. Transcribe anything inside that window and the clock resets, so an occasional long recording won't cost you your balance.