Echo Lab

Transcribe & speak — 100% local, nothing leaves your browser

These are small on-device models, so results may be imperfect. WebGPU (Chrome/Edge) is required for Ultra and the Smart models, and gives much faster results for the other features.
Echo Lab Guide

Everything runs locally in your browser — no servers, no uploads, completely private. Models are downloaded once and cached.

System Requirements
Feature                  Download         RAM needed  Requires
Transcribe: Flash        ~120 MB          2 GB+       Any browser
Transcribe: Standard     ~75 MB           2 GB+       Any browser
Transcribe: Pro          ~145 MB          2 GB+       Any browser
Transcribe: Ultra        ~630 MB          4 GB+       WebGPU (Chrome/Edge)
Speak: Natural (Kokoro)  ~90 MB           2 GB+       Any browser
Speak: Studio (Pocket)   ~125 MB          4 GB+       Any browser
Speak: Basic             0 MB             Any         Any browser
Smart: Lite              ~388 MB          4 GB+       WebGPU (Chrome/Edge)
Smart: Balanced          ~786 MB          6 GB+       WebGPU (Chrome/Edge)
Translate (OPUS-MT)      ~30 MB per pair  2 GB+       Any browser

Models download once, then load from the browser cache on later visits. WebGPU is available in Chrome, Edge, and most other Chromium-based browsers.
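The table can be read as a simple capability filter. The sketch below encodes its rows as data and lists which features a device should manage, given its RAM and whether WebGPU is available. The `FeatureSpec` type and the function names are illustrative, not Echo Lab's code; sizes are the approximate figures from the table, with "Any" RAM modelled as 0.

```typescript
// One row of the System Requirements table (illustrative type)
interface FeatureSpec {
  name: string;
  downloadMB: number; // approximate one-time download
  minRamGB: number; // minimum RAM from the table; "Any" is 0
  needsWebGPU: boolean;
}

const FEATURES: FeatureSpec[] = [
  { name: "Transcribe: Flash", downloadMB: 120, minRamGB: 2, needsWebGPU: false },
  { name: "Transcribe: Standard", downloadMB: 75, minRamGB: 2, needsWebGPU: false },
  { name: "Transcribe: Pro", downloadMB: 145, minRamGB: 2, needsWebGPU: false },
  { name: "Transcribe: Ultra", downloadMB: 630, minRamGB: 4, needsWebGPU: true },
  { name: "Speak: Natural (Kokoro)", downloadMB: 90, minRamGB: 2, needsWebGPU: false },
  { name: "Speak: Studio (Pocket)", downloadMB: 125, minRamGB: 4, needsWebGPU: false },
  { name: "Speak: Basic", downloadMB: 0, minRamGB: 0, needsWebGPU: false },
  { name: "Smart: Lite", downloadMB: 388, minRamGB: 4, needsWebGPU: true },
  { name: "Smart: Balanced", downloadMB: 786, minRamGB: 6, needsWebGPU: true },
  { name: "Translate (OPUS-MT, one pair)", downloadMB: 30, minRamGB: 2, needsWebGPU: false },
];

// Features whose RAM and WebGPU requirements the device meets
function usableFeatures(ramGB: number, hasWebGPU: boolean): FeatureSpec[] {
  return FEATURES.filter(
    (f) => ramGB >= f.minRamGB && (hasWebGPU || !f.needsWebGPU),
  );
}

// Sum of the approximate downloads for a set of features, in MB
function totalDownloadMB(features: FeatureSpec[]): number {
  return features.reduce((sum, f) => sum + f.downloadMB, 0);
}
```

For example, `usableFeatures(4, false)` keeps everything except Ultra and the two Smart models. In Chromium browsers, `navigator.deviceMemory` gives a coarse RAM estimate in GB (rounded, with an upper bound) that could feed the `ramGB` argument.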

Transcribe (Speech to Text)

How to use

  • Pick a model — Flash is fastest, Ultra is most accurate
  • Record from your mic, upload an audio file, or paste a URL
  • Hit Transcribe and wait for the result
  • Enable timestamps to see when each word was spoken
  • Download the transcript as a TXT file or SRT subtitles, or copy it to the clipboard
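SRT is a plain-text subtitle format: numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by its text, separated by blank lines. The sketch below turns timestamped segments into that format. The `Segment` shape (start and end in seconds) is an assumption about what a transcriber with timestamps enabled returns, not Echo Lab's actual data model.

```typescript
// Hypothetical timestamped segment; times are in seconds
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Formats seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Builds the SRT text: 1-based cue numbers, time range, text, blank line
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text.trim()}\n`,
    )
    .join("\n");
}
```

SRT uses a comma before the milliseconds (WebVTT uses a dot), which is the most common detail to get wrong when converting between the two.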

Models

  • Flash (Moonshine Tiny) — Fastest, English only, great for quick notes
  • Standard (Whisper Tiny) — Good balance, 99 languages
  • Pro (Whisper Base) — Better accuracy, 99 languages, can translate to English
  • Ultra (Parakeet) — Highest accuracy, 13 languages, requires WebGPU
Speak (Text to Speech)

How to use

  • Pick an engine — Natural has 40+ voices, Basic uses your browser's built-in voices
  • Select a voice from the grid
  • Type or paste text (or upload a .txt file)
  • Click Speak to generate audio
  • Use the waveform player to listen, then download as WAV
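A WAV download is usually just raw samples behind a 44-byte RIFF header. Assuming the speech engine returns mono `Float32Array` samples in [-1, 1] plus a sample rate (an assumption; Echo Lab's export code isn't shown), a 16-bit PCM encoder can be sketched as follows. `encodeWav` is an illustrative name.

```typescript
// Encodes mono float samples in [-1, 1] as a 16-bit PCM WAV file
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  // RIFF header
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  // "fmt " chunk: uncompressed PCM, 1 channel, 16 bits per sample
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk body
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  // "data" chunk header
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample, then scale to the signed 16-bit range (little-endian)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}
```

In a browser, wrapping the result in `new Blob([wav], { type: "audio/wav" })` and passing it to `URL.createObjectURL()` produces a downloadable file without uploading anything.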

Engines

  • Natural (Kokoro) — 40+ voices in 8 languages (EN, JP, CN, ES, FR, HI, IT, PT). Type in English and it auto-translates before speaking in the selected language
  • Studio (Pocket TTS) — English only, but supports voice cloning. Record or upload 5–10 seconds of audio to clone a voice
  • Basic (Web Speech) — Zero download, uses your device's built-in speech engine. Quality varies by browser
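The Basic engine relies on the browser's Web Speech API, where `speechSynthesis.getVoices()` lists the available voices and each voice carries a BCP 47 `lang` tag and a `localService` flag. One reasonable way to choose a voice for a language is an exact tag match first, then any voice with the same base language, preferring on-device voices: a voice with `localService` set to false is provided by a remote synthesiser, which matters for privacy. A sketch, with `VoiceLike` and `pickVoice` as illustrative names:

```typescript
// The subset of SpeechSynthesisVoice fields this sketch needs
interface VoiceLike {
  name: string;
  lang: string; // BCP 47 tag, e.g. "en-GB"
  localService: boolean;
}

// Lower-cases a tag and turns "en_US" into "en-us", in case a platform uses underscores
const normalizeTag = (tag: string): string => tag.replace(/_/g, "-").toLowerCase();

// Exact tag match first, then same base language; on-device voices preferred
function pickVoice<V extends VoiceLike>(voices: V[], lang: string): V | undefined {
  const wanted = normalizeTag(lang);
  const base = wanted.split("-")[0];
  const preferLocal = (candidates: V[]): V | undefined =>
    candidates.find((v) => v.localService) ?? candidates[0];

  const exact = voices.filter((v) => normalizeTag(v.lang) === wanted);
  if (exact.length > 0) return preferLocal(exact);
  const sameBase = voices.filter((v) => normalizeTag(v.lang).split("-")[0] === base);
  return preferLocal(sameBase); // undefined when nothing matches
}
```

In a page this could be called as `pickVoice(speechSynthesis.getVoices(), "en-GB")`, with the result assigned to a `SpeechSynthesisUtterance`'s `voice` before calling `speechSynthesis.speak()`. Chrome can return an empty list until the `voiceschanged` event fires, so the lookup may need to wait for it.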

Multilingual speaking

  • When you select a non-English voice, a "Text is in" dropdown appears
  • If your text is in English but the voice is Japanese, it auto-translates before speaking
  • Supported: English, Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese
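The behaviour in this list comes down to one decision: translate only when the language chosen under "Text is in" differs from the voice's language. A sketch, assuming a translator function is passed in; the `Translator` type and `textForVoice` are hypothetical names standing in for the OPUS-MT step described under Translate, and the codes `ja` and `zh` correspond to JP and CN in the Engines list.

```typescript
// Languages covered by the Natural voices, as ISO 639-1 codes
type Lang = "en" | "ja" | "zh" | "es" | "fr" | "hi" | "it" | "pt";

// Hypothetical translator signature (not a real library API)
type Translator = (text: string, from: Lang, to: Lang) => Promise<string>;

// Returns the text to send to the TTS engine
async function textForVoice(
  text: string,
  textLang: Lang, // from the "Text is in" dropdown
  voiceLang: Lang, // language of the selected voice
  translate: Translator,
): Promise<string> {
  // Same language: speak the text as typed, no translation model needed
  if (textLang === voiceLang) return text;
  // Different language: translate first, then speak the translation
  return translate(text, textLang, voiceLang);
}
```

Skipping translation when the languages already match also avoids downloading a translation model the user does not need.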
Smart (AI Tools)

How to use

  • Pick a model — Lite is fastest, Balanced gives the best results
  • Type or paste text, then pick a tool: Summarise, Rewrite, Key Points, Chat, or Translate
  • Results appear below — copy them or send directly to the Speak tab
  • Chat mode lets you have a conversation with the AI
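A chat mode has to feed the running conversation back to the model on every turn, and small models have limited context windows, so a common approach is to drop the oldest turns once the history grows too long. The sketch below does that with a character budget as a crude stand-in for a token count. The `ChatMessage` shape follows the common role/content chat convention; nothing here is taken from Echo Lab's implementation.

```typescript
// Common role/content chat message shape (illustrative)
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Keeps every system message plus the newest turns that fit within maxChars.
// Characters only approximate tokens; a real app would count tokens with the
// model's tokenizer.
function trimHistory(history: ChatMessage[], maxChars: number): ChatMessage[] {
  const system = history.filter((m) => m.role === "system");
  const turns = history.filter((m) => m.role !== "system");
  let used = system.reduce((n, m) => n + m.content.length, 0);
  const kept: ChatMessage[] = [];
  // Walk backwards from the newest turn; stop at the first one that overflows
  for (let i = turns.length - 1; i >= 0; i--) {
    const len = turns[i].content.length;
    if (used + len > maxChars) break;
    used += len;
    kept.unshift(turns[i]);
  }
  return [...system, ...kept];
}
```

Stopping at the first overflow, rather than skipping it and keeping older, shorter turns, keeps the retained turns contiguous, so no gap appears in the middle of the conversation.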

Models

  • Lite (SmolLM2) — 388 MB, fastest but basic quality. Good for simple tasks
  • Balanced (Qwen 2.5) — 786 MB, best quality. The first download can take up to 4 minutes

All Smart models require WebGPU — use Chrome or Edge for best results.

Translate

  • Uses dedicated OPUS-MT translation models (~30 MB each) — not the LLM
  • Supports: English, Spanish, French, German, Portuguese, Italian, Chinese, Japanese, Hindi
  • Click Translate once to choose the source and target languages, then click it again to translate
  • Fast and accurate — purpose-built for translation
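OPUS-MT models are trained per translation direction (for example, the Hugging Face model `Helsinki-NLP/opus-mt-en-es` covers English to Spanish only), which is why downloads are counted per pair. When no direct model exists between two non-English languages, one workable approach is to pivot through English. The sketch below plans such a route; the `available` set and `planTranslation` are hypothetical and do not reflect which pairs Echo Lab actually uses.

```typescript
// Plans a translation as zero, one or two hops, pivoting through English
// when no direct model exists. Keys in `available` look like "es-en".
function planTranslation(
  from: string,
  to: string,
  available: Set<string>,
): [string, string][] | null {
  if (from === to) return []; // nothing to do
  if (available.has(`${from}-${to}`)) return [[from, to]]; // direct model
  const canPivot =
    from !== "en" && to !== "en" && available.has(`${from}-en`) && available.has(`en-${to}`);
  if (canPivot) return [[from, "en"], ["en", to]]; // two models, via English
  return null; // no route with the models available
}
```

A pivoted translation downloads and runs two models, so it costs roughly twice the size and time of a direct pair, and errors from the first hop carry into the second.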
Enabling WebGPU

What is WebGPU?

WebGPU lets your browser use your device's GPU for AI inference. It's needed for Ultra transcription and all Smart models. Without it, these features won't load.

Desktop

  • Chrome / Edge (v113+) — WebGPU is on by default on Windows, macOS, and ChromeOS. Just make sure your browser is up to date
  • Firefox — Support is limited (Firefox 141+ on Windows only) and may not work with every model. Use Chrome or Edge instead
  • Safari — Partial support from macOS Sonoma. May not work for all models

Android

  • Chrome 121+ — WebGPU is on by default on Android 12 and later, on devices with Qualcomm or ARM GPUs
  • If not working: open chrome://flags → search WebGPU → set to Enabled → relaunch Chrome
  • Needs a device with a modern GPU (most phones from 2020 onwards)

iPhone / iPad

  • Safari on iOS 18+ has experimental WebGPU support
  • Go to Settings → Safari → Advanced → Feature Flags → enable WebGPU
  • May still struggle with larger models — stick with Flash + Natural + Lite on iOS

Quick check

Not sure if it's working? Features that need WebGPU will show an error if it's not available. Everything except Ultra transcription and the Smart models (Lite and Balanced) works without WebGPU: Flash, Standard, and Pro (transcribe), all three Speak engines, and Translate.
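To check support yourself, the standard test is whether `navigator.gpu` exists and `navigator.gpu.requestAdapter()` resolves to an adapter rather than `null`. The sketch below wraps that check in a function that takes a navigator-like object, so it compiles without the `@webgpu/types` typings and can be exercised outside a browser; `hasWebGPU` and the two interfaces are illustrative names.

```typescript
// Minimal shapes for feature detection (no DOM or WebGPU typings needed)
interface GpuLike {
  requestAdapter(): Promise<unknown>; // resolves to an adapter object or null
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// True only if the WebGPU API exists AND the browser hands out an adapter
async function hasWebGPU(nav: NavigatorLike | undefined): Promise<boolean> {
  if (!nav || !nav.gpu) return false; // API not exposed at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false; // treat any failure as "not available"
  }
}
```

In a page, call `hasWebGPU(globalThis.navigator as unknown as NavigatorLike)`. A browser can expose `navigator.gpu` and still resolve `requestAdapter()` to `null` (for example, when no suitable GPU or driver is available), which is why checking the property alone is not enough.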

Tips
  • First load is slow — models download once, then load from the browser cache on later visits
  • Phone users — stick with Flash (STT), Natural (TTS), and Lite (Smart) for best performance
  • Desktop users — Balanced + Ultra give the best quality if you have 8 GB+ RAM and Chrome/Edge
  • Privacy — everything runs on your device. Audio, text, and models never leave your browser
  • Offline — once models are cached, most features work without internet
  • Upload text — use the paperclip button to load .txt files into Speak or Smart
  • Audio files — try to keep files under 30 minutes for best results. Longer files may be slow or run out of memory
  • Switching models — if you change to a different model and it gets stuck, refresh the page. AI models use a lot of memory and a fresh page clears it
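The 30-minute guideline makes sense once you estimate how much memory decoded audio takes. Assuming the whole file is decoded to 32-bit float samples (an assumption; Echo Lab's audio pipeline isn't documented here), the size is minutes × 60 × sample rate × channels × 4 bytes. `decodedBytes` is an illustrative helper.

```typescript
// Bytes needed to hold fully decoded audio as 32-bit float PCM
function decodedBytes(minutes: number, sampleRate: number, channels: number): number {
  const BYTES_PER_FLOAT32 = 4;
  return Math.round(minutes * 60 * sampleRate * channels * BYTES_PER_FLOAT32);
}

const MiB = 1024 * 1024;
// 16 kHz mono: the input rate speech models such as Whisper expect
const modelInput = decodedBytes(30, 16_000, 1); // 115,200,000 bytes
// 48 kHz stereo: a typical decode before resampling
const rawDecode = decodedBytes(30, 48_000, 2); // 691,200,000 bytes
console.log((modelInput / MiB).toFixed(1), "MiB vs", (rawDecode / MiB).toFixed(1), "MiB");
```

At 16 kHz mono, 30 minutes is about 110 MiB, but a 48 kHz stereo decode before resampling is about 659 MiB, on top of the model itself, which is why much longer files can exhaust a tab's memory.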
Contact

Questions, feedback, or business enquiries: contact@echolab.site