This tool runs a large language model entirely inside your browser tab using WebAssembly. No text you type is ever sent to a server — inference happens locally on your CPU.
Transformers.js is the official JavaScript port of Hugging Face Transformers. It uses ONNX Runtime Web to run pre-converted ONNX model files inside the browser's WebAssembly engine — no plugins or native installs required.
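As a minimal sketch of how the library is used, the snippet below loads a text-generation pipeline and runs it. It assumes the current official package name `@huggingface/transformers` (older releases were published as `@xenova/transformers`) and the ONNX-converted checkpoint `Xenova/gpt2` on the Hugging Face Hub; swap in whatever checkpoint the app actually ships.

```javascript
// Minimal sketch: load a text-generation pipeline in the browser.
// Assumes the @huggingface/transformers package and the ONNX-converted
// 'Xenova/gpt2' checkpoint; the first call downloads and caches the weights.
import { pipeline } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'Xenova/gpt2');
const output = await generator('The browser can now', { max_new_tokens: 20 });
console.log(output[0].generated_text);
```

Because the model is fetched and executed entirely client-side, this runs in any modern browser tab with no server component.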
| Model | Size (quantized) | Notes |
|---|---|---|
| TinyLlama 1.1B Chat | ~640 MB | Instruction-tuned chat model; best results for Q&A and conversation. |
| GPT-2 | ~120 MB | Lightweight text-completion model. Fast to load; not instruction-tuned. |
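The practical difference between the two rows is the input format: GPT-2 completes raw text, while the instruction-tuned TinyLlama expects a chat-style message list. The sketch below assumes the `Xenova/TinyLlama-1.1B-Chat-v1.0` ONNX checkpoint and the chat-template support in recent Transformers.js releases; both are assumptions, not a statement of this app's exact configuration.

```javascript
// Sketch of chat-style use with an instruction-tuned model.
// 'Xenova/TinyLlama-1.1B-Chat-v1.0' is an assumed checkpoint id; recent
// Transformers.js versions apply the model's chat template automatically
// when the input is a message array rather than a plain string.
import { pipeline } from '@huggingface/transformers';

const chat = await pipeline('text-generation', 'Xenova/TinyLlama-1.1B-Chat-v1.0');
const messages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is WebAssembly?' },
];
const out = await chat(messages, { max_new_tokens: 64 });
console.log(out[0].generated_text.at(-1).content);
```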
WASM inference is CPU-bound and noticeably slower than GPU-accelerated backends such as WebGPU. Typical generation speed on a modern laptop is 1–10 tokens/second, depending on the model and CPU. The first generation in a session may be slower while the WebAssembly module is compiled and the ONNX Runtime session is initialized.
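Those throughput numbers translate directly into wait times. The helper below is a back-of-envelope estimate using the 1–10 tokens/second range quoted above; `estimateSeconds` is an illustrative function, not part of Transformers.js.

```javascript
// Back-of-envelope wait-time estimate from a tokens/second throughput.
// estimateSeconds is an illustrative helper, not a library API.
function estimateSeconds(numTokens, tokensPerSecond) {
  return numTokens / tokensPerSecond;
}

// A 100-token reply at a mid-range 5 tokens/second takes about 20 seconds.
console.log(estimateSeconds(100, 5)); // 20
```

At the slow end of the range (1 token/second) the same reply takes well over a minute, which is why the smaller GPT-2 model can feel much more responsive despite weaker output.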
All inference runs locally in your browser tab. The model weights are downloaded from the Hugging Face Hub on first load and cached by the browser; after that, your messages and any generated text never leave your device.