No server. No API calls. No cloud search subscription. Just a JSON file, a JavaScript library, and full-text search that works completely offline. Here’s the whole story.

The Problem with Search on a Static Site

Static sites like this one — generated by Jekyll and hosted on GitHub Pages — are fast, cheap, and simple to maintain. But they don’t run server-side code. There’s no database to query, no backend to handle a search request.

The typical solutions aren’t great:

  • Algolia / Elasticsearch: external service, API keys, pricing tiers, vendor lock-in
  • Google Custom Search: embeds Google’s search widget (and ads) into your page
  • Client-side scan: downloads every post at search time — slow and expensive
  • No search at all: fine until you have more than a dozen posts

What I actually wanted: fast, private, offline-capable search with no external dependencies. So I built it.

The Stack: Orama + a Pre-built Index

The search on this site is powered by Orama, an open-source full-text search engine written entirely in JavaScript. It runs in the browser, with no server required.

The key insight is that the index is pre-built at deploy time, not at search time. When you open the search page, your browser downloads one JSON file (search-index.json) containing the entire pre-processed search index. After that, every query runs instantly — no network requests, no latency, no server.

Build time (Node.js):
  Blog posts (.md)
      │
      ▼
  Parse frontmatter + strip Markdown
      │
      ▼
  Insert into Orama database
      │
      ▼
  Serialize → search-index.json  (~700 KB)

Runtime (browser):
  search-index.json
      │
      ▼
  Load into Orama (in-memory)
      │
      ▼
  User types a query
      │
      ▼
  Orama searches locally → results

Building the Index

A Node.js script (scripts/build-search-index.mjs) runs before each deployment. It:

  1. Reads every .md file in blog/_posts/
  2. Parses the frontmatter using gray-matter to extract the title, tags, and date
  3. Strips Markdown syntax from the post body — removes code blocks, HTML tags, link syntax, heading markers, emphasis markers — leaving clean plain text for indexing
  4. Truncates content to 2,000 characters (enough for relevance, small enough to keep the index manageable)
  5. Derives the URL from the filename (YYYY-MM-DD-slug.md → /blog/YYYY/MM/DD/slug/)
  6. Inserts each post into an Orama database with this schema:
{
  title: 'string',
  url: 'string',
  content: 'string',
  tags: 'string[]',
  date: 'string',
}
  7. Serialises the database with Orama’s save() function and writes it to search-index.json

The whole build takes under a second for ~40 posts.
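Steps 3 and 5 can be sketched in plain JavaScript. The regexes below are illustrative, not the exact rules the build script uses:

```javascript
// Step 3 (sketch): strip Markdown syntax, leaving plain text for indexing.
function stripMarkdown(md) {
  return md
    .replace(/```[\s\S]*?```/g, ' ')           // fenced code blocks
    .replace(/<[^>]+>/g, ' ')                  // HTML tags
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, '$1') // links/images → keep link text
    .replace(/^#{1,6}\s+/gm, '')               // heading markers
    .replace(/[*_`~]/g, '')                    // emphasis/code markers
    .replace(/\s+/g, ' ')
    .trim();
}

// Step 5 (sketch): derive the post URL from a Jekyll-style filename.
function urlFromFilename(filename) {
  const m = filename.match(/^(\d{4})-(\d{2})-(\d{2})-(.+)\.md$/);
  if (!m) throw new Error(`Unexpected post filename: ${filename}`);
  const [, year, month, day, slug] = m;
  return `/blog/${year}/${month}/${day}/${slug}/`;
}
```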

Loading and Querying in the Browser

The search page uses Orama as an ES module (bundled and minified with esbuild):

import { create, load, search } from '/scripts/orama.js';

On page load:

  1. Fetch search-index.json
  2. Create a new Orama database with the same schema
  3. Call load(db, rawIndex) to restore the pre-built index in memory
  4. Enable the search input

From that point on, queries run entirely locally:

const results = await search(db, {
  term,
  properties: ['title', 'content', 'tags'],
  tolerance: 1,  // fuzzy match: allows 1 typo
  limit: 20,
});

Input is debounced by 150 ms, so a search doesn’t fire on every individual keystroke.
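The debounce is a small helper along these lines (a sketch; the handler name runSearch is a placeholder, not the site's actual code):

```javascript
// Returns a wrapped function that waits until calls stop for `delayMs`
// before invoking `fn` — repeated calls reset the timer.
function debounce(fn, delayMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Usage: fire a search 150 ms after the user stops typing.
// input.addEventListener('input', debounce((e) => runSearch(e.target.value), 150));
```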

🔍 Try it: Type serach (a deliberate typo) into the search box. The tolerance: 1 setting means Orama treats one character difference as a match — so typos don't break your results.

Security: No XSS, No Open Redirects

Two things to get right when rendering user-controlled content in a search result:

HTML escaping. Every piece of text from the index (title, tags) is passed through escapeHtml() before being inserted into the DOM as a string. This prevents any content in a post title from being interpreted as HTML.

URL validation. The URL from the index is validated against /^\/[^"<>]*$/ before being used as an href. Only safe relative paths (starting with /, containing no quotes or angle brackets) are accepted. Anything else falls back to #.

Both checks are there because the data comes from a JSON file that could theoretically be tampered with in a supply chain scenario.
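Both checks are small enough to sketch; these are illustrative implementations, using the regex quoted above:

```javascript
// Escape the five HTML-significant characters before inserting text into the DOM.
// Ampersand must be replaced first to avoid double-escaping the other entities.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Accept only relative paths: must start with '/', no quotes or angle brackets.
// Anything else falls back to '#'.
function safeUrl(url) {
  return /^\/[^"<>]*$/.test(url) ? url : '#';
}
```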

The Trade-offs

This approach works beautifully for a blog of this size, but it has real constraints worth knowing:

  • Latency after index load: ~0 ms, vs 50–300 ms of network round-trip for server-side search
  • Initial load: ~700 KB of JSON up front, vs nothing for server-side search
  • Works offline: ✅ yes, vs ❌ no
  • Privacy: ✅ queries never leave your browser, vs ❌ every query hitting a server
  • Scales to 10,000+ posts: ❌ the index would be too large, vs ✅ yes
  • Hosting cost: free, vs whatever the service charges

For a personal blog with dozens to a few hundred posts, the pre-built JSON approach is nearly perfect. The 700 KB index downloads once and is cached by the browser — subsequent visits to the search page are instant.

Why Orama?

There are a handful of client-side search libraries: Lunr.js, FlexSearch, MiniSearch, Fuse.js. I went with Orama because:

  • It’s actively maintained with a clean modern API
  • It handles string[] fields natively (useful for tags)
  • Built-in fuzzy matching with configurable tolerance
  • The serialise/restore cycle (save + load) is first-class — not an afterthought
  • ES module output, which means esbuild can tree-shake and bundle it cleanly

The bundled, minified Orama build comes to about 76 KB. That’s the only JavaScript file the search page adds beyond what the site already loads.

Keeping the Index Up to Date — Automatically

You could rebuild the index by hand every time you publish a post, but that’s the kind of friction that turns into a bug (forgotten rebuild → stale search results). So the whole thing is automated with GitHub Actions.

A workflow file (.github/workflows/build-search-index.yml) watches for pushes to main that touch any of the relevant paths:

on:
  push:
    branches: [main, master]
    paths:
      - 'blog/_posts/**'
      - 'scripts/build-search-index.mjs'
      - 'package.json'
      - 'package-lock.json'

The path filter means the workflow only runs when something that would actually change the index has been modified — adding a post, editing the build script, or updating dependencies. Updating a CSS file or changing a layout doesn’t trigger it.

When it does run, the steps are straightforward:

  1. Checkout the repository
  2. Set up Node.js 20 with npm caching
  3. npm ci — install exact versions from package-lock.json
  4. npm run build — this rebuilds both the Orama browser bundle (scripts/orama.js) and the search index (search-index.json)
  5. Commit back the changed files — if either file changed, the bot commits them with the message chore: update Orama bundle and search index [skip ci]

The [skip ci] tag in the commit message tells GitHub Actions not to re-trigger workflows on that commit — preventing an infinite loop where the bot’s own commit kicks off another build.
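The job steps sketch out roughly like this (step names and the commit-back shell are illustrative; the actual workflow may differ in details):

```yaml
jobs:
  build-index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run build
      - name: Commit updated bundle and index
        run: |
          # Only commit when the build actually changed either file
          if ! git diff --quiet scripts/orama.js search-index.json; then
            git config user.name "github-actions[bot]"
            git config user.email "github-actions[bot]@users.noreply.github.com"
            git add scripts/orama.js search-index.json
            git commit -m "chore: update Orama bundle and search index [skip ci]"
            git push
          fi
```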

New post merged to main
    │
    ▼
GitHub Actions: build-search-index workflow triggers
    │
    ▼
npm run build  →  search-index.json regenerated
    │
    ▼
Bot commits updated index back to main
    │
    ▼
GitHub Pages rebuilds the static site
    │
    ▼
Search results include the new post ✅

The whole pipeline takes about 30 seconds. By the time GitHub Pages has rebuilt the site and the new post is live, the search index is already up to date — no manual step required.

💡 Want the same setup? The full source is on GitHub. The key files are scripts/build-search-index.mjs, .github/workflows/build-search-index.yml, search/index.html, and package.json. Clone the repo, swap out the posts, and the automation handles the rest.

Static doesn’t have to mean dumb. A pre-built index, a small JavaScript library, a GitHub Actions workflow, and a bit of careful wiring is all it takes to add fast, private, fully offline search to any file-based site — with zero ongoing maintenance.