System Requirements

Feature	Download	RAM Needed	Requires
Transcribe — Flash	~120 MB	2 GB+	Any browser
Transcribe — Standard	~75 MB	2 GB+	Any browser
Transcribe — Pro	~145 MB	2 GB+	Any browser
Transcribe — Ultra	~630 MB	4 GB+	WebGPU (Chrome/Edge)
Speak — Natural (Kokoro)	~90 MB	2 GB+	Any browser
Speak — Studio (Pocket)	~125 MB	4 GB+	Any browser
Speak — Basic	0 MB	Any	Any browser
Smart — Lite	~388 MB	4 GB+	WebGPU (Chrome/Edge)
Smart — Balanced	~786 MB	6 GB+	WebGPU (Chrome/Edge)
Translate (OPUS-MT)	~30 MB per pair	2 GB+	Any browser

Models download once, then load from cache instantly. WebGPU is available in Chrome, Edge, and most Chromium browsers.
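
The table above can be read as a simple lookup: a feature is usable when the device has enough RAM and, where required, WebGPU. The sketch below is illustrative only. The names `FeatureRequirement`, `DeviceProfile`, `availableFeatures`, and `totalDownloadMB` are hypothetical and not taken from Echo Lab's code; the numbers are copied from the table, with "Any" RAM encoded as 0.

```typescript
// Requirements transcribed from the table above. Field names are illustrative,
// not taken from the app's source.
interface FeatureRequirement {
  name: string;
  downloadMB: number; // approximate; for Translate this is per language pair
  minRamGB: number; // 0 stands for "Any"
  needsWebGPU: boolean;
}

const FEATURES: FeatureRequirement[] = [
  { name: "Transcribe Flash", downloadMB: 120, minRamGB: 2, needsWebGPU: false },
  { name: "Transcribe Standard", downloadMB: 75, minRamGB: 2, needsWebGPU: false },
  { name: "Transcribe Pro", downloadMB: 145, minRamGB: 2, needsWebGPU: false },
  { name: "Transcribe Ultra", downloadMB: 630, minRamGB: 4, needsWebGPU: true },
  { name: "Speak Natural", downloadMB: 90, minRamGB: 2, needsWebGPU: false },
  { name: "Speak Studio", downloadMB: 125, minRamGB: 4, needsWebGPU: false },
  { name: "Speak Basic", downloadMB: 0, minRamGB: 0, needsWebGPU: false },
  { name: "Smart Lite", downloadMB: 388, minRamGB: 4, needsWebGPU: true },
  { name: "Smart Balanced", downloadMB: 786, minRamGB: 6, needsWebGPU: true },
  { name: "Translate", downloadMB: 30, minRamGB: 2, needsWebGPU: false },
];

// Hypothetical description of the user's device.
interface DeviceProfile {
  ramGB: number;
  hasWebGPU: boolean;
}

// Keep only the features whose RAM and WebGPU requirements the device meets.
function availableFeatures(device: DeviceProfile): FeatureRequirement[] {
  return FEATURES.filter(
    (f) => device.ramGB >= f.minRamGB && (!f.needsWebGPU || device.hasWebGPU)
  );
}

// Sum the approximate download sizes of a set of features.
function totalDownloadMB(features: FeatureRequirement[]): number {
  return features.reduce((sum, f) => sum + f.downloadMB, 0);
}
```

For example, `availableFeatures({ ramGB: 2, hasWebGPU: false })` keeps the three non-WebGPU Transcribe models, Speak Natural, Speak Basic, and Translate.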

Transcribe (Speech to Text)

How to use

Pick a model — Flash is fastest, Ultra is most accurate
Record from your mic, upload an audio file, or paste a URL
Hit Transcribe and wait for the result
Enable timestamps to see when each word was spoken
Download as TXT, SRT subtitles, or copy to clipboard
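
To show how timestamps relate to the SRT download, here is a minimal sketch that turns timed segments into SubRip text. The `Segment` shape (start and end in seconds) and the function names are assumptions made for illustration; the format Echo Lab uses internally is not documented on this page.

```typescript
// Hypothetical shape of one transcribed segment; times are in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm (comma before milliseconds).
function toSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "," + pad(ms, 3);
}

// Build SRT cues: a 1-based index, a time range, the text, then a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        String(i + 1) + "\n" +
        toSrtTime(seg.start) + " --> " + toSrtTime(seg.end) + "\n" +
        seg.text.trim() + "\n"
    )
    .join("\n");
}
```

A segment running from 0 to 2.25 seconds becomes a cue with the time line `00:00:00,000 --> 00:00:02,250`.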

Models

Flash (Moonshine Tiny) — Fastest, English only, great for quick notes
Standard (Whisper Tiny) — Good balance, 99 languages
Pro (Whisper Base) — Better accuracy, 99 languages, can translate to English
Ultra (Parakeet) — Highest accuracy, 13 languages, requires WebGPU

Speak (Text to Speech)

How to use

Pick an engine — Natural has 40+ voices, Basic uses your browser's built-in voices
Select a voice from the grid
Type or paste text (or upload a .txt file)
Click Speak to generate audio
Use the waveform player to listen, then download as WAV

Engines

Natural (Kokoro) — 40+ voices in 8 languages (EN, JP, CN, ES, FR, HI, IT, PT). Type in English and the text is auto-translated before it is spoken in the selected language
Studio (Pocket TTS) — English only, but supports voice cloning. Record 5–10 seconds of audio to clone a voice
Basic (Web Speech) — Zero download, uses your device's built-in speech engine. Quality varies by browser

Multilingual speaking

When you select a non-English voice, a "Text is in" dropdown appears
If your text is in English but the voice is Japanese, the text is translated to Japanese automatically before it is spoken
Supported: English, Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese
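
The rule described in this list can be sketched as a small decision function: translate only when the "Text is in" language differs from the voice language. The language codes, `SpeakPlan`, and `planSpeech` are hypothetical names for illustration, and it is an assumption that any supported source language can be translated, not only English.

```typescript
// The eight languages listed above, as short codes (codes are an assumption).
const SUPPORTED = ["en", "ja", "zh", "es", "fr", "hi", "it", "pt"] as const;
type Lang = typeof SUPPORTED[number];

// Type guard: is this code one of the supported speaking languages?
function isSupported(code: string): code is Lang {
  return (SUPPORTED as readonly string[]).indexOf(code) !== -1;
}

// Hypothetical result: which languages are involved and whether to translate.
interface SpeakPlan {
  from: Lang;
  to: Lang;
  translate: boolean;
}

// Translate first only when the text language and the voice language differ.
function planSpeech(textLang: string, voiceLang: string): SpeakPlan {
  if (!isSupported(textLang) || !isSupported(voiceLang)) {
    throw new Error("Unsupported language: " + textLang + " or " + voiceLang);
  }
  return { from: textLang, to: voiceLang, translate: textLang !== voiceLang };
}
```

With English text and a Japanese voice, `planSpeech("en", "ja")` reports that translation is needed; with Japanese text and a Japanese voice it does not.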

Smart (AI Tools)

How to use

Pick a model — Lite is fastest, Balanced gives best results
Type or paste text, then pick a tool: Summarise, Rewrite, Key Points, Chat, or Translate
Results appear below — copy them or send directly to the Speak tab
Chat mode lets you have a conversation with the AI

Models

Lite (SmolLM2) — 388 MB, fastest but basic quality. Good for simple tasks
Balanced (Qwen 2.5) — 786 MB, best quality. The first download can take up to 4 minutes

All Smart models require WebGPU — use Chrome or Edge for best results.

Translate

Uses dedicated OPUS-MT translation models (~30 MB each) — not the LLM
Supports: English, Spanish, French, German, Portuguese, Italian, Chinese, Japanese, Hindi
Click Translate once to pick languages, click again to translate
Fast and accurate — purpose-built for translation
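
Because each translation model is a separate download of roughly 30 MB, the total grows with the number of language pairs you use. The sketch below estimates that total. It assumes each direction (for example English to Spanish versus Spanish to English) is its own model, which this page does not state; `estimateTranslateDownloadMB` is a hypothetical helper.

```typescript
const MB_PER_PAIR = 30; // approximate figure quoted on this page

// Estimate the download for a list of [from, to] pairs.
// Assumption: each direction is a separate model; duplicates are counted once.
function estimateTranslateDownloadMB(pairs: Array<[string, string]>): number {
  const seen: Record<string, boolean> = {};
  let count = 0;
  for (const [from, to] of pairs) {
    const key = from + ">" + to;
    // Skip same-language "pairs" and pairs that were already counted.
    if (from !== to && !seen[key]) {
      seen[key] = true;
      count += 1;
    }
  }
  return count * MB_PER_PAIR;
}
```

Under these assumptions, using English to Spanish and Spanish to English would come to about 60 MB.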

Enabling WebGPU

What is WebGPU?

WebGPU lets your browser use your device's GPU for AI inference. It's needed for Ultra transcription and all Smart models. Without it, these features won't load.

Desktop

Chrome / Edge (v113+) — WebGPU is on by default. Just make sure your browser is up to date
Firefox — Not yet supported. Use Chrome or Edge instead
Safari — Partial support from macOS Sonoma. May not work for all models

Android

Chrome 121+ — WebGPU is on by default on supported devices
If it is not working: open chrome://flags → search for WebGPU → set it to Enabled → relaunch Chrome
Needs a device with a modern GPU (most phones from 2020 onwards)

iPhone / iPad

Safari on iOS 18+ has experimental WebGPU support
Go to Settings → Safari → Advanced → Feature Flags → enable WebGPU
May still struggle with larger models — stick with Flash + Natural + Lite on iOS

Quick check

Not sure if it's working? Features that need WebGPU will show an error if it's not available. Flash (transcribe), Natural (speak), and Translate all work fine without WebGPU.
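
For readers who want to check WebGPU themselves, a minimal sketch of the usual detection pattern follows. Browsers expose WebGPU as `navigator.gpu`, and `requestAdapter()` resolves to null when no usable GPU adapter exists. The `NavigatorLike` and `GpuLike` interfaces and both function names are illustrative; this is not necessarily how Echo Lab performs its own check.

```typescript
// Minimal structural types so the sketch does not depend on WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Quick check: does the browser expose the WebGPU entry point at all?
function exposesWebGPU(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined && nav.gpu !== null;
}

// Fuller check: the API exists and the browser can provide a GPU adapter.
async function canUseWebGPU(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) {
    return false;
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    // Treat any failure while requesting an adapter as "not available".
    return false;
  }
}

// In a browser you would pass the global object, for example:
//   canUseWebGPU(navigator as NavigatorLike).then((ok) => { /* show or hide features */ });
```

The second function matters because a browser can expose `navigator.gpu` and still return no adapter on unsupported hardware.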

Tips

First load is slow — models download once then load from browser cache instantly next time
Phone users — stick with Flash (STT), Natural (TTS), and Lite (Smart) for best performance
Desktop users — Balanced + Ultra give the best quality if you have 8 GB+ RAM and Chrome/Edge
Privacy — everything runs on your device. Audio, text, and models never leave your browser
Offline — once models are cached, most features work without internet
Upload text — use the paperclip button to load .txt files into Speak or Smart
Audio files — try to keep files under 30 minutes for best results. Longer files may be slow or run out of memory
Switching models — if you change to a different model and it gets stuck, refresh the page. AI models use a lot of memory and a fresh page clears it
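
The phone and desktop tips above amount to a simple preset rule, sketched below. `Device`, `Preset`, and `recommendPreset` are hypothetical names. Choosing Natural for speech on desktop is an assumption, since the tips only name Ultra and the larger Smart model (Balanced) for desktop. The 4 GB threshold for Lite comes from the requirements table.

```typescript
// Hypothetical description of the user's device.
interface Device {
  isPhone: boolean;
  ramGB: number;
  hasWebGPU: boolean;
}

// Suggested model per tab; smart is null when no Smart model can run.
interface Preset {
  transcribe: string;
  speak: string;
  smart: string | null;
}

function recommendPreset(d: Device): Preset {
  // Desktop with 8 GB+ RAM and WebGPU: the highest-quality models.
  if (!d.isPhone && d.ramGB >= 8 && d.hasWebGPU) {
    // speak: "Natural" is an assumption; the tips do not name a desktop engine.
    return { transcribe: "Ultra", speak: "Natural", smart: "Balanced" };
  }
  // Everything else: the lightweight set. Lite still needs WebGPU and 4 GB+ RAM.
  const canRunLite = d.hasWebGPU && d.ramGB >= 4;
  return { transcribe: "Flash", speak: "Natural", smart: canRunLite ? "Lite" : null };
}
```

A phone with 4 GB of RAM and WebGPU gets Flash, Natural, and Lite, matching the tip for phone users.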

Contact

Questions, feedback, or business enquiries: contact@echolab.site

Echo Lab is a product of Sebiu Labs — tools for media professionals.
