Push-to-talk dictation powered by NVIDIA's Parakeet TDT model — running entirely on your own hardware. No cloud, no subscription, no surveillance.
Spin up the inference server once, then use the desktop client to dictate anywhere — or plug the server into any OpenAI-compatible workflow you already have.
Python · FastAPI · ONNX Runtime
Runs on localhost:5092
TypeScript · React · TanStack
Push-to-talk client app
Open WebUI, custom scripts, or your own app
Both projects are MIT licensed, free to use, and built to work together out of the box.
A high-performance transcription server built around NVIDIA's Parakeet TDT 0.6B v3 model via ONNX Runtime — screaming fast on CPU, with full GPU support when you want it.
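As a rough illustration of the OpenAI-compatible workflow, the sketch below builds a transcription request using only the Python standard library. The route `/v1/audio/transcriptions`, the form fields `model` and `file`, and the `"text"` key in the reply are assumptions borrowed from OpenAI's API convention, not details confirmed on this page; check the server's Swagger docs at `/docs` for the actual contract. The helper names are hypothetical.

```python
import json
import urllib.request
import uuid

# Assumption: the server mirrors OpenAI's transcription route and form fields.
BASE_URL = "http://localhost:5092"
ENDPOINT = "/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename="sample.wav",
                                model="parakeet-tdt-0.6b-v3",
                                base_url=BASE_URL):
    """Build a multipart/form-data POST request without sending it."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        base_url + ENDPOINT,
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(audio_bytes, **kwargs):
    """Send the request; assumes the JSON reply carries a "text" field."""
    request = build_transcription_request(audio_bytes, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["text"]
```

With the server running, something like `transcribe(open("clip.wav", "rb").read())` would return the transcript if the assumptions above hold. Keeping request construction separate from sending makes the payload easy to inspect before any audio leaves your script.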
A sleek local-first dictation app that turns your voice into clipboard text in an instant. Hold Space to record — release to get your transcript.
Benchmarked on LibriSpeech test-clean against professionally verified ground truth transcriptions.
All three precision variants (INT8, FP16, FP32) reach an identical 97.84% accuracy, so choosing INT8 gets you maximum speed with no loss in accuracy.
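For readers who want to score their own transcripts, here is a minimal sketch of word-level accuracy, computed as one minus the word error rate (WER). Whether the 97.84% figure above was derived exactly this way, and with what text normalization, is an assumption; the function names are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from hypothesis
                current[j - 1] + 1,      # extra word in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER; can go negative if the hypothesis is very long."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

Real benchmarks usually normalize punctuation and numerals before comparing, which this sketch skips apart from lowercasing.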
Start the server first, then launch the client and point it at localhost:5092.
# Clone and start (CPU)
git clone https://github.com/dustinwloring1988/parakeet-server
cd parakeet-server
docker compose up parakeet-cpu -d
# Server ready at http://localhost:5092
# Swagger docs at /docs
conda create -n parakeet python=3.10
conda activate parakeet
git clone https://github.com/dustinwloring1988/parakeet-server
cd parakeet-server
pip install -r requirements.txt
python server.py
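Since the client expects the server to be up first, a launch script can wait until port 5092 accepts connections before starting the app. The sketch below checks only that the TCP port is open, not that the model has finished loading; `wait_for_server` is a hypothetical helper, not part of either project.

```python
import socket
import time


def wait_for_server(host="localhost", port=5092, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); pause and retry.
            time.sleep(interval)
    return False
```

A typical use would be `if wait_for_server(): ...` at the top of a startup script, falling back to an error message when it returns `False`.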
git clone https://github.com/dustinwloring1988/parakeet
cd parakeet
bun install
bun run dev
# Open http://localhost:3000
# Set server URL in ⚙️ Settings
git clone https://github.com/dustinwloring1988/parakeet
cd parakeet
pnpm install   # or npm install
pnpm dev       # or npm run dev
# Open http://localhost:3000
Open ⚙️ Settings, set the server URL to http://localhost:5092, choose your model (parakeet-tdt-0.6b-v3 is the default), and save. Then hold Space to speak.