Open Source · Free Forever · MIT License

Your voice.
Your machine.

Push-to-talk dictation powered by NVIDIA's Parakeet TDT model — running entirely on your own hardware. No cloud, no subscription, no surveillance.

Get the Server · Get the App

97.84% accuracy · 30× faster than real-time · 25 languages · OpenAI-compatible API · Runs on CPU, no GPU needed

Architecture

Two projects. One seamless stack.

Spin up the inference server once, then use the desktop client to dictate anywhere — or plug the server into any OpenAI-compatible workflow you already have.

🖥️

Parakeet Server

Python · FastAPI · ONNX Runtime
Runs on localhost:5092

audio / text

🎙️

Parakeet Flow

TypeScript · React · TanStack
Push-to-talk client app

also compatible with

🔌

Any OpenAI Client

Open WebUI, custom scripts,
or your own app

Open Source Projects

Everything you need to run it.

Both projects are MIT licensed, free to use, and built to work together out of the box.

🖥️

Python · FastAPI · ONNX · Docker

Parakeet Server

A high-performance transcription server built around NVIDIA's Parakeet TDT 0.6B v3 model via ONNX Runtime — screaming fast on CPU, with full GPU support when you want it.

Up to 30× faster than real-time on consumer CPUs
Beats GPU-accelerated Faster Whisper on CPU alone
OpenAI-compatible API — works with any existing tooling
Silero-VAD auto-chunking for long audio files
CPU & GPU Docker images included
Built-in web UI for drag-and-drop testing
INT8 / FP16 / FP32 model variants selectable per request
Open WebUI integration out of the box
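
Because the server speaks an OpenAI-compatible API, a transcription call can be sketched with nothing but the Python standard library. This is a minimal sketch, not the project's documented client: the endpoint path `/v1/audio/transcriptions`, the `file` and `model` form fields, and the `text` field in the JSON response are assumptions borrowed from OpenAI's audio API, and `build_transcription_request` and `transcribe` are hypothetical helper names. Only the default model name and port come from this page; check the server's Swagger docs at /docs for the real contract, including how to select INT8 / FP16 / FP32 variants.

```python
import json
import os
import urllib.request
import uuid

# Assumed values: the path and form fields mirror OpenAI's audio API and are
# not confirmed by this page. The port and model name are taken from it.
BASE_URL = "http://localhost:5092"
ENDPOINT = "/v1/audio/transcriptions"
DEFAULT_MODEL = "parakeet-tdt-0.6b-v3"


def build_transcription_request(audio_bytes, filename, model=DEFAULT_MODEL,
                                base_url=BASE_URL):
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Part 1: the model name as a plain form field.
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        # Part 2: the audio file itself.
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{os.path.basename(filename)}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path, **kwargs):
    """Send an audio file to the server and return the transcript text."""
    with open(path, "rb") as audio_file:
        request = build_transcription_request(audio_file.read(), path, **kwargs)
    with urllib.request.urlopen(request) as response:
        # Assumes an OpenAI-style JSON response: {"text": "..."}.
        return json.load(response)["text"]

# Usage, with the server running and a local recording at hand:
#   print(transcribe("sample.wav"))
```

Building the request separately from sending it keeps the network call in one place, so the payload can be inspected or logged before anything leaves your script.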

View on GitHub

🎙️

TypeScript · React · TanStack

Parakeet Flow

A sleek local-first dictation app that turns your voice into clipboard text in an instant. Hold Space to record — release to get your transcript.

Push-to-talk — hold Space anywhere on the page to record
Transcript lands on clipboard automatically
Full history with one-click re-copy
All audio stays on your machine, never sent to a cloud
Point it at any Parakeet TDT endpoint you control
Model selector and optional API key support
Built with Radix UI + Tailwind CSS 4
Runs with Bun, pnpm, or npm

View on GitHub

Performance

Built to be embarrassingly fast.

Benchmarked on LibriSpeech test-clean against professionally verified ground truth transcriptions.

30×

faster than real-time
on a modern CPU

97.84%

transcription accuracy
on LibriSpeech test-clean

25

languages with automatic
language detection
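
To put the speed figure in concrete terms, here is a small arithmetic sketch. The function name `processing_seconds` is hypothetical, and 30× is the page's best-case number ("up to"), so treat the result as a lower bound on processing time rather than a guarantee.

```python
def processing_seconds(audio_seconds, speed_factor=30.0):
    """Estimate wall-clock transcription time at a multiple of real-time."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


one_hour = 60 * 60
print(processing_seconds(one_hour))  # 120.0 seconds, about two minutes
```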

Zero accuracy loss at INT8

All three quantization variants — INT8, FP16, FP32 — reach identical 97.84% accuracy. Choosing INT8 gets you maximum speed with nothing left on the table.
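
The page does not say how the accuracy figure is computed. A common convention for LibriSpeech is accuracy = 1 − word error rate (WER); under that assumption, 97.84% accuracy corresponds to a WER of 2.16%. The sketch below shows the calculation on a single sentence pair. The function names are hypothetical, and the simple lowercase-and-split normalization is an assumption, not the project's benchmark script.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein distance over words instead of characters.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def accuracy_percent(reference, hypothesis):
    """Accuracy as 100 * (1 - WER), the convention assumed above."""
    return 100.0 * (1.0 - word_error_rate(reference, hypothesis))
```

Running the same comparison against transcripts from the INT8, FP16, and FP32 variants is one way to check the "zero accuracy loss" claim on your own audio.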

English · Spanish · French · German · Russian · Italian · Polish · +18 more

Quick Start

Up and running in minutes.

Start the server first, then launch the client and point it at localhost:5092.

Docker (recommended)

# Clone and start (CPU)
git clone https://github.com/dustinwloring1988/parakeet-server
cd parakeet-server
docker compose up -d parakeet-cpu

# Server ready at http://localhost:5092
# Swagger docs at /docs

Conda

conda create -n parakeet python=3.10
conda activate parakeet
git clone https://github.com/dustinwloring1988/parakeet-server
cd parakeet-server
pip install -r requirements.txt
python server.py

Bun

git clone https://github.com/dustinwloring1988/parakeet
cd parakeet
bun install
bun run dev

# Open http://localhost:3000
# Set server URL in ⚙️ Settings

npm / pnpm

git clone https://github.com/dustinwloring1988/parakeet
cd parakeet
pnpm install  # or npm install
pnpm dev      # or npm run dev

# Open http://localhost:3000

💡 Configure the client: Click the gear icon in the top-right corner of Parakeet Flow, enter http://localhost:5092 as the server URL, choose your model (parakeet-tdt-0.6b-v3 is the default), and save. Then hold Space to speak.
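
If the client cannot connect, it helps to confirm the server is actually up before changing any settings. The sketch below polls the Swagger page at /docs, which the Quick Start says the server exposes. The helper names `is_server_up` and `wait_for_server` are hypothetical, and treating an HTTP 200 on /docs as "ready" is an assumption: model loading may take longer than the web server's startup.

```python
import time
import urllib.error
import urllib.request


def is_server_up(base_url="http://localhost:5092", timeout=2.0):
    """Return True if GET <base_url>/docs answers with HTTP 200."""
    url = base_url.rstrip("/") + "/docs"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_server(check=is_server_up, attempts=30, delay=1.0,
                    sleep=time.sleep):
    """Retry `check` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False

# Usage, after starting the server with Docker or Conda:
#   print("ready" if wait_for_server() else "server did not respond")
```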
