Text Autocomplete with a Trie

Overview

Built as the signature project for CS455 (Algorithms & Structured Programming) at SFBU, this is a text autocomplete engine backed by a Trie. The web UI accepts a prefix, queries the Trie, and returns the top suggestions ranked by frequency — typically in under a millisecond, even with a 370,000-word dictionary loaded.

The interesting part isn't the Trie itself — that's a well-known data structure. What's interesting is comparing it side-by-side with a Binary Search Tree on the same operations and seeing exactly why a Trie wins for this problem.

Why a Trie

Autocomplete is a prefix-matching problem. Given the string "app", I want to find every word in the dictionary that starts with those three characters, then rank them.

A naive approach scans the entire dictionary on every keystroke. A BST is faster but still has to traverse comparing whole words. A Trie stores one character per node, so finding all completions of "app" means walking three nodes from the root and then collecting everything in the subtree below.

The key property: lookup time depends on the length of the query, not the size of the dictionary. Searching for "app" in a 100-word dictionary takes the same number of operations as searching in a 100,000-word one. That's why every serious autocomplete system uses a Trie or one of its variants.

What's in the codebase

The repo has a few distinct pieces:

trie.py — the data structure itself. Insert, search, delete, prefix-collect, frequency boost.
dictionary_loader.py — reads the 4MB dictionary file, normalizes words, populates the Trie.
app.py — the Flask web UI you're using above. Stateful in-memory Trie shared across requests.
main.py — a CLI version of the same thing for terminal-only use.
benchmark.py — measures Trie operations against a list-based baseline.
visualizer-trie-vs-bst.html — the standalone HTML visualizer embedded below.

Everything is covered by a pytest suite — 56 tests passing across the Trie operations, dictionary loading, and edge cases like empty strings, mixed case, and unicode.

What I learned building it

The version of this project I'd write today would be different from the one I shipped. A few lessons:

Frequency ranking is a separate concern from the data structure. I baked frequency into Trie nodes, which works but couples two responsibilities. A cleaner design would keep frequencies in a sidecar dict keyed by word, letting the Trie stay a pure prefix index.
Tries are memory-hungry at the leaves. With 370k words averaging ~9 characters, the Trie has on the order of millions of nodes. A more sophisticated implementation would compress single-child chains (a radix tree) or use a DAWG.
The web UI was harder than the algorithm. I spent more time on the input debouncing, the bar chart rendering, and the live latency display than on the Trie itself. That ratio was a good reality check on what "shipping" actually means.

Text Autocomplete with a Trie

Overview

Why a Trie

What's in the codebase

What I learned building it

The autocomplete itself

Stack