Convert M4A / Apple Voice Memo to Text, Privately

May 5, 2026

M4A is the default format for Apple Voice Memos, and it is what you get when you export from GarageBand, record audio with QuickTime, or save a voice note in many apps on the App Store. Turning one into text is quick. The part that deserves more attention is privacy, because the things people record on a phone tend to be more personal than the things they type.

This guide covers where M4A files come from, how to convert one to text with speaker labels, how to keep a sensitive recording private, and how to handle the awkward cases: two-sided calls, quiet memos, and accents.

Where M4A files come from (iPhone and Mac)

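M4A is MPEG-4 audio, almost always AAC-encoded. It is Apple’s house format, so you meet it constantly across the ecosystem and in a few places outside it: Voice Memos, GarageBand exports, QuickTime audio recordings, voice notes from many App Store apps, and the occasional Android voice recorder or forwarded WhatsApp note.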

So the file in front of you might be a quick note to self, an interview, a recorded call, or a forwarded voice message. The conversion is identical for all of them. What changes is how careful you want to be with the contents, which is the thread running through the rest of this guide.
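If you want to confirm that a file really is an MPEG-4 container before uploading it, a minimal sketch like the one below can help. It relies on the general MPEG-4 layout, where the file normally opens with an `ftyp` box whose payload starts with a four-character brand. The function name `read_ftyp_brand` is made up for this example, and the brand you see (often `M4A ` for audio-only files) varies by the app that wrote the file.

```python
import struct


def read_ftyp_brand(path):
    """Return the major brand from a leading 'ftyp' box, or None.

    MPEG-4 files are made of boxes: a 4-byte big-endian size, a 4-byte
    type, then the payload. The first box is normally 'ftyp', and its
    payload begins with a 4-byte major brand.
    """
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) < 12:
        return None  # too short to hold a box header plus a brand
    _size, box_type, brand = struct.unpack(">I4s4s", header)
    if box_type != b"ftyp":
        return None  # not an MPEG-4 style container (or an unusual layout)
    return brand.decode("ascii", errors="replace")
```

A return value of `None` only means the file does not start with an `ftyp` box; it says nothing about whether the audio inside would transcribe well.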

Getting a Voice Memo off your iPhone

Open the Voice Memos app, tap the recording you want, tap the three-dot menu, and choose Share. From there you can AirDrop it to your Mac, or send it to yourself by Mail, Messages, or Notes. However you send it, it lands as a .m4a file.

On a Mac, Voice Memos keeps its recordings in ~/Library/Application Support/com.apple.voicememos/Recordings/. If you would rather skip the share sheet, you can copy the .m4a files straight out of that folder.
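Copying the files out of that folder can also be scripted. The sketch below is one way to do it, not an official tool: `copy_voice_memos` is a hypothetical helper, the default path is simply the one quoted in this guide, and the folder may sit elsewhere or need extra permissions depending on your macOS version, so the source is left as a parameter.

```python
import shutil
from pathlib import Path

# Path as quoted in this guide; treat it as a default that may differ on your Mac.
DEFAULT_RECORDINGS = (
    Path.home() / "Library/Application Support/com.apple.voicememos/Recordings"
)


def copy_voice_memos(dest, src=DEFAULT_RECORDINGS):
    """Copy every .m4a in src into dest and return the new paths, sorted by name."""
    src, dest = Path(src), Path(dest)
    if not src.is_dir():
        return []  # nothing to copy if the folder is missing
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for memo in sorted(src.glob("*.m4a")):
        target = dest / memo.name
        shutil.copy2(memo, target)  # copy2 also preserves modification times
        copied.append(target)
    return copied
```

Calling `copy_voice_memos(Path.home() / "Desktop" / "memos")` would leave the originals untouched and place copies in a folder you can upload from.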

What you need

There is very little to set up: you need the .m4a file and a browser. There is no app to install.

A note on cost, kept short: Hushscript is not free, because it runs on top-tier transcription AI that has a real per-minute cost. It is pay-as-you-go with no subscription, and you get 30 free minutes to try it. Those minutes arrive instantly when you validate a card with a $1 hold, which is authorized and then released right away, never charged. If you would rather not use a card, the 30 minutes are granted once, after your first purchase of minutes. Exact pack prices are on the pricing page.

Convert an M4A in three steps

The flow is built so you can hear the quality before you commit to anything.

  1. Drop the file and watch the preview. Go to the voice recording to text page and drag your .m4a onto the upload area, or click to browse. Hushscript transcribes the first 30 seconds and shows it back to you with speaker labels, with no account, no payment, and nothing to fill in. This preview is where you check that the audio is clear enough and that the speakers are being separated the way you expect.
  2. Sign up to transcribe the rest. If the preview looks right, enter your email to create an account. This is the step that enables full transcription, and it is also where your 30 free minutes come from. Files up to 10 hours or 2 GB are accepted, with no daily caps.
  3. Get, relabel, and export the transcript. Upload the full file and let it process. When it is done, the transcript appears in your dashboard with speakers marked. Rename “Speaker A” to a real name in a click, then export as TXT, SRT, DOCX, or JSON.

The moment your transcript is ready, the audio is deleted from the server. There is no setting to retain it and it is never used to train anything. You keep the transcript; we do not keep the recording.

How long it takes

Most Voice Memos are short — a thought captured on the way somewhere, a couple of minutes of notes — and those transcribe in well under 30 seconds. Longer recordings scale roughly with their length. A 30-minute memo of meeting notes finishes in a minute or two; a full hour takes a few minutes. You do not have to sit on the page while it runs. The transcript lands in your dashboard and you can come back to it.

Which languages work

Hushscript transcribes around 99 languages, detected automatically. A memo you recorded in Spanish, French, or Japanese transcribes without you setting anything — there is no language dropdown to get wrong. Accuracy is strongest in the most common languages and still solid in the long tail.

A worked example: a recorded interview

Say you recorded a 25-minute interview on your iPhone, with you and one other person both audible, saved by Voice Memos as interview-2026-06.m4a. You AirDrop it to your Mac, drop it onto the preview, and the first 30 seconds come back labeled. You transcribe the full file, and a couple of minutes later export a TXT that, with the speakers left on their default labels, reads like this:

Speaker A   00:00:04   Thanks for making the time. Can we start with how
                       the project actually began?
Speaker B   00:00:11   Sure. It started as a side experiment, honestly. We
                       didn't expect it to turn into the main thing.
Speaker A   00:00:19   And when did that shift happen?
Speaker B   00:00:23   Around the second prototype. That's when people
                       outside the team started using it daily.

Each turn carries a speaker tag and a timestamp, so quoting someone accurately is a matter of finding the line, not scrubbing back and forth through audio. If you export SRT instead, you get the same content cut into timed caption blocks, which is what you want if you ever pair the transcript with a video or audio player.
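To make the relationship between the two exports concrete, here is a rough sketch that parses the TXT layout shown above and cuts it into SRT-style caption blocks. It is an illustration, not Hushscript's exporter. The helper names are invented, the columns are assumed to be separated by two or more spaces, and because the TXT carries only start times, each block is assumed to end where the next turn begins.

```python
import re

# Speaker label, HH:MM:SS start time, then text; columns split by 2+ spaces (assumed).
TURN = re.compile(r"^(\S.*?)\s{2,}(\d{2}):(\d{2}):(\d{2})\s{2,}(.*)$")


def parse_turns(txt):
    """Parse labeled lines into turns; indented lines continue the previous turn."""
    turns = []
    for line in txt.splitlines():
        m = TURN.match(line)
        if m:
            speaker, h, mnt, s, text = m.groups()
            start = int(h) * 3600 + int(mnt) * 60 + int(s)
            turns.append({"speaker": speaker, "start": start, "text": text.strip()})
        elif line.strip() and turns:
            turns[-1]["text"] += " " + line.strip()
    return turns


def srt_time(seconds):
    """Format whole seconds as the HH:MM:SS,mmm timestamp SRT uses."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d},000"


def to_srt(turns, last_duration=5):
    """Build SRT text; the final block lasts last_duration seconds (assumed)."""
    blocks = []
    for i, turn in enumerate(turns, start=1):
        # With 1-based i, turns[i] is the turn that follows the current one.
        end = turns[i]["start"] if i < len(turns) else turn["start"] + last_duration
        blocks.append(
            f"{i}\n{srt_time(turn['start'])} --> {srt_time(end)}\n"
            f"{turn['speaker']}: {turn['text']}\n"
        )
    return "\n".join(blocks)
```

In practice you would simply pick SRT in the export menu; the sketch is only meant to show that both formats carry the same speaker, time, and text information.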

Keep sensitive memos private

Voice Memos tend to hold things you would never put in an email: interview notes, a doctor’s observations, minutes from a closed meeting, a private conversation you wanted to remember exactly. The contents are often more sensitive than a document, which is why how a tool treats the file matters here more than it would for, say, a podcast episode you are about to publish anyway.

Two things protect the recording with Hushscript. First, the audio is deleted the moment the transcript is ready, so there is no lingering copy sitting in storage. Second, the transcript that remains is encrypted at rest. If our storage were ever leaked, the contents would surface as unreadable ciphertext rather than your words. That is leak protection, stated plainly — it is not a claim that no one on our side can ever read a transcript, because the key is held on the server, not by you alone. The honest version is the useful one: a breach would expose ciphertext, and you can delete any transcript yourself in one click.

If you are converting a legal consultation, a patient note, or a confidential meeting, that combination — delete the audio, encrypt what is left, let you erase it — is the reason to choose an upload-based tool that discards rather than one that quietly keeps your files. The fuller explanation of how the whole pipeline handles your audio is at audio to text.

Two-sided call recordings

A lot of M4A files are recorded calls, and calls are where speaker separation earns its keep. If both parties are audible in the file, Hushscript separates them automatically — Speaker A and Speaker B map to the two voices — so you can read who said what instead of untangling one merged block.

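A few things shape how well that works: some call recorder apps mix both parties into one track, some record only your side, and speaker labels work best when both voices are clearly audible in the file.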

For the full walk-through of recording, consent, and cleanup specific to calls, see how to transcribe a phone call. For more on how the speaker tags themselves are produced, speaker identification goes deeper.

Troubleshooting common issues

Most M4A files transcribe without any fuss. When the result is not what you hoped, the cause is almost always one of these, and most are fixable.

The recording is too quiet

A memo recorded with the phone in a pocket or across a table comes out faint, and faint audio means more guessed words. The fix that helps most is at the source: hold the phone closer to whoever is speaking next time, and keep it off soft surfaces that muffle it. For a recording you already have, the transcript will still be largely right — read it against the audio for the quiet passages rather than trusting it blind.

Strong accents or fast speech

Accents are handled well, but heavy accents combined with fast, overlapping speech are the hardest case for any transcription engine. Expect the occasional swapped word or run-on. Names and technical terms are the usual casualties, so those are the first thing to proofread.

Two people talking at once

When speakers overlap, the boundary between them blurs and a few words can land under the wrong speaker tag. There is no perfect fix for crosstalk in a recording that already exists. If you control the recording, asking people not to talk over each other does more for the final transcript than any setting.

The file is large or long

Long memos are fine — up to 10 hours or 2 GB per file, with no daily limit — but a big file takes longer to process and you do not need to wait on the page. Start it, leave, and collect the transcript from your dashboard later. If a file refuses to upload, it is usually a flaky connection rather than the file itself; retrying on a stable network normally clears it.
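If you are queuing up many recordings, a quick local check against the stated limits can save a failed upload. The sketch below is illustrative only: `check_limits` is a hypothetical helper, "2 GB" is read conservatively as 2 × 10^9 bytes, both limits are assumed to apply at once, and the duration has to come from elsewhere (for example, the length Voice Memos shows) because the Python standard library does not decode M4A.

```python
MAX_BYTES = 2 * 10**9        # "2 GB", read conservatively as decimal gigabytes (assumption)
MAX_SECONDS = 10 * 60 * 60   # 10 hours


def check_limits(size_bytes, duration_seconds=None):
    """Return a list of problems; an empty list means the file fits the stated limits."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_BYTES}-byte limit")
    if duration_seconds is not None and duration_seconds > MAX_SECONDS:
        problems.append(
            f"recording is {duration_seconds} s, over the {MAX_SECONDS} s limit"
        )
    return problems
```

For a file on disk, `check_limits(os.path.getsize("memo.m4a"))` after `import os` covers the size check on its own.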

An unusual Apple format

If you have something other than plain M4A — a CAF file, an AIFF, the audio inside an MP4 — you do not need to convert it first. Drop it as-is. If a format genuinely is not accepted, email support@hushscript.com and we will add it.

Preview first, or just transcribe?

The 30-second preview is free and needs no account, so the simple rule is: when the audio quality is in any doubt — a recorded call, something captured at a distance, a noisy room — preview first. You will see in 30 seconds whether the speakers separate and whether the words are landing, before you spend any of your minutes. For a clean memo you recorded close to the mic, you can reasonably skip straight to signing up and transcribing the whole thing.

Accuracy tips

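A few habits raise the quality of the transcript more than any single setting: record with the phone close to whoever is speaking, keep it off soft surfaces that muffle it, ask people not to talk over each other, and proofread names and technical terms first.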

Wrapping up

M4A is M4A whatever created it, so the steps here work the same for an Android voice recorder or a forwarded WhatsApp note as they do for an Apple Voice Memo. Drop the file, check the 30-second preview, sign up to transcribe the rest, and export a speaker-labeled transcript — while the audio gets deleted and the transcript stays encrypted at rest.

If you are working through a pile of recordings in different formats, how to transcribe any audio file lays out the full format list and the best approach for each.

Frequently asked questions

How do I get an M4A file off my iPhone?

Open the Voice Memos app, tap the recording, tap the three-dot menu, and choose Share. AirDrop it to your Mac or send it to yourself by Mail, Messages, or Notes. It arrives as a .m4a file you can then upload.

Is the M4A deleted after transcription?

Yes. The audio is deleted from the server the moment your transcript is ready. There is no retain option and no training use. Your transcript is what remains, not a copy of the recording.

Are my transcripts kept private after that?

Your transcripts are encrypted at rest. If our storage were ever leaked, the contents would be unreadable ciphertext rather than your words. You can also delete any transcript in one click from your dashboard.

Can I transcribe a two-sided call recording from an iPhone?

Yes, as long as both sides are captured in the file. Some call recorder apps mix both parties into one track and some record only your side. Speaker labels work best when both voices are audible.

What other Apple formats work?

M4A, CAF, AAC, and the audio inside an MP4 all work, as does AIFF, which behaves like WAV. You do not need to convert first. Drop the file and Hushscript handles the format.

How long does it take to transcribe a Voice Memo?

Most Voice Memos run a few minutes and transcribe in under 30 seconds. Longer files scale roughly with length. A one-hour memo takes a few minutes, and you can leave the page while it runs.

Do I need to install anything on my iPhone or Mac?

No. Transcription runs in the browser and in the cloud, so there is no app to install. You only need the .m4a file and a browser. The 30-second preview needs no account at all.

Will it transcribe a memo recorded in another language?

Yes. Hushscript detects the language automatically and handles around 99 of them, so a memo in Spanish, French, or Japanese transcribes without you choosing anything.