Gazetteer of the World — extracted places

Reading the pages. We start from public-domain photographic scans of the book and turn the printed words back into text with OCR (“optical character recognition”), using the layout-aware Surya model.

Understanding each entry. A large language model (“LLM”, the kind of AI behind chatbots) reads each entry and pulls out structured facts: the place name, the type of place, its country, coordinates, population, and so on. We use the open-weight Llama 3.3 for this step; gpt-oss then double-checks its output, and Qwen3 repairs the entries that gpt-oss flags.
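A minimal sketch of the extract, check and repair flow, assuming each model is wrapped as an ordinary function. The `PlaceRecord` fields, the function names and the rule-based toy stand-ins are all hypothetical; the actual prompts, schema and model calls are not described here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PlaceRecord:
    """Hypothetical shape of the facts pulled from one gazetteer entry."""
    name: str
    place_type: Optional[str] = None
    country: Optional[str] = None
    lat: Optional[float] = None
    lon: Optional[float] = None
    population: Optional[int] = None


# Hypothetical signatures standing in for the three model calls.
Extractor = Callable[[str], PlaceRecord]                         # Llama 3.3's role
Verifier = Callable[[str, PlaceRecord], List[str]]               # gpt-oss's role: returns problems found
Repairer = Callable[[str, PlaceRecord, List[str]], PlaceRecord]  # Qwen3's role


def process_entry(text: str, extract: Extractor, verify: Verifier,
                  repair: Repairer) -> PlaceRecord:
    """Extract a record, check it, and repair it only if the checker objects."""
    record = extract(text)
    problems = verify(text, record)
    if problems:
        record = repair(text, record, problems)
    return record


# Toy rule-based stand-ins so the flow runs without any model.
def toy_extract(text: str) -> PlaceRecord:
    # Deliberately incomplete: reads only the name before the first comma.
    return PlaceRecord(name=text.split(",")[0].strip())


def toy_verify(text: str, rec: PlaceRecord) -> List[str]:
    problems = []
    if "pop." in text and rec.population is None:
        problems.append("population mentioned in text but missing from record")
    return problems


def toy_repair(text: str, rec: PlaceRecord, problems: List[str]) -> PlaceRecord:
    match = re.search(r"pop\.\s*([\d,]+)", text)
    if match:
        rec.population = int(match.group(1).replace(",", ""))
    return rec


if __name__ == "__main__":
    entry = "Exampletown, a market town of England; pop. 1,234"  # invented entry
    print(process_entry(entry, toy_extract, toy_verify, toy_repair))
```

Keeping the stages separate means only the entries the checker flags need a second model call.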

Reading the tables and pictures. The statistical tables and the engraved plates (town views and maps) are read by a vision model, Qwen2.5-VL.

Linking to a global index. We then try to match each place to the World Historical Gazetteer (WHG) — a free academic project that gathers historical places from many sources into one searchable index. A match gives the 1856 place a modern location, and sometimes a boundary outline. (This matching step is called reconciliation.)
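Reconciliation can be pictured as a search-then-filter step. The sketch below assumes candidate places have already been fetched from the index (it does not call the WHG); `Candidate`, `reconcile`, the 0.85 name-similarity threshold and the 50 km radius are illustrative choices, not the project's actual matching method. It compares accent-stripped names with Python's `difflib` and, when the 1856 entry has coordinates, rejects candidates that lie too far away.

```python
import math
import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional


@dataclass
class Candidate:
    """One hypothetical search hit from a place index such as the WHG."""
    place_id: str
    name: str
    lat: float
    lon: float


def normalize(name: str) -> str:
    """Lower-case and strip accents, so 'Zürich' and 'Zurich' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold().strip()


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth (radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def reconcile(name: str, lat: Optional[float], lon: Optional[float],
              candidates: Iterable[Candidate],
              min_score: float = 0.85, max_km: float = 50.0) -> Optional[Candidate]:
    """Return the best-matching candidate, or None if nothing is close enough."""
    target = normalize(name)
    best, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, target, normalize(cand.name)).ratio()
        if score < min_score:
            continue  # names too different
        if lat is not None and lon is not None:
            if haversine_km(lat, lon, cand.lat, cand.lon) > max_km:
                continue  # right name, wrong place
        if score > best_score:  # strict ">" keeps the first of tied candidates
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    hits = [
        Candidate("hypothetical-1", "Springfield", 39.80, -89.64),  # Illinois
        Candidate("hypothetical-2", "Springfield", 42.10, -72.59),  # Massachusetts
    ]
    match = reconcile("Springfield", 42.1, -72.6, hits)
    print(match.place_id if match else "no match")
```

The distance check matters for common names: without coordinates the two Springfields tie on name alone, and this sketch simply keeps the first one it saw.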

Gazetteer of the World (1856)
