The anatomy of .osm.pbf exports
The PBF format of OpenStreetMap was developed as a modern alternative to the XML format. It can be read programmatically at least 5x faster than the older format, and it takes far less disk space as well. In this blog series, I describe not only the format of this file but also the anatomy of the underlying OSM data and the quirks you’ll encounter when dealing with it.
Reading .osm.pbf files
The low-level encoding of .osm.pbf is Google Protocol Buffers (protobuf). This means you can read it with any library that supports protobuf, but you can most likely find a special-purpose library for your favourite language.
In my case I’m using https://github.com/thomersch/gosmparse, as it is written in Go, which has close ties with protobuf, and it has streaming support.
Data format
OSM PBF files consist of one or more Blocks that can contain any number of three kinds of entities:
- Nodes - in short, things that have specific coordinates, for example centres of cities or parks, landmarks, points along streets.
Nodes have: an ID (int64), Lat and Lng coordinates (both float64), and a number of key-value tags describing them (string => string mapping).
- Ways - ordered lists of Nodes, for example streets or parts of borders.
Ways have: an ID (int64), a list of IDs of the Nodes they consist of (int64), and a list of tags that describe them, just like in Nodes.
- Relations - ordered lists of Nodes, Ways, or other Relations. For example, they can collect Ways into whole borders, streets into bus routes, etc.
Relations have: an ID (int64), a list of Members, and a list of Tags. Each Member consists of an ID (int64), a type saying whether it is a node, way, or relation (int), and a role (string) that describes its purpose (e.g. “outer” for an outer boundary).
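The three kinds of entities above could be modelled in Go roughly like this. It is only a minimal sketch: the type, field, and function names (Node, Way, Relation, Member, MemberType, exampleBoundary) are my own illustration and not necessarily the exact types exposed by gosmparse, and the IDs and tags in the example are made up.

```go
package main

// MemberType says which kind of entity a relation member points to.
type MemberType int

const (
	NodeType MemberType = iota
	WayType
	RelationType
)

// Node is a single point with coordinates and tags.
type Node struct {
	ID   int64
	Lat  float64
	Lng  float64
	Tags map[string]string
}

// Way is an ordered list of node IDs plus tags.
// It stores only the IDs, not the nodes themselves.
type Way struct {
	ID      int64
	NodeIDs []int64
	Tags    map[string]string
}

// Member is one entry of a relation: the ID of the referenced entity,
// its kind, and the role it plays in the relation.
type Member struct {
	ID   int64
	Type MemberType
	Role string
}

// Relation is an ordered list of members plus tags.
type Relation struct {
	ID      int64
	Members []Member
	Tags    map[string]string
}

// exampleBoundary builds a made-up relation that collects two ways
// into an outer boundary. All IDs here are invented for illustration.
func exampleBoundary() Relation {
	return Relation{
		ID: 1,
		Members: []Member{
			{ID: 10, Type: WayType, Role: "outer"},
			{ID: 11, Type: WayType, Role: "outer"},
		},
		Tags: map[string]string{"type": "boundary"},
	}
}
```

Note that Ways and Relations hold only IDs of other entities, so to get the actual coordinates of a street you have to look up each of its Nodes separately.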
Quirks
You cannot count on nodes coming before ways, or ways coming before relations, when reading a .osm.pbf file (although this is usually the case).
One consequence is that a Way you are reading may reference Nodes that you will only read later in the same file.
You cannot count on Ways referencing Node IDs that exist in the same file (or on Relations referencing entities that exist in the same file).
This is because exports often contain only a small part of the whole planet, and sometimes only some of the members of a Relation or Way are included in the file.
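One way to cope with both quirks is to buffer everything while reading and resolve references only once the whole file has been consumed, treating missing IDs as a normal case rather than an error. The sketch below assumes the data fits in memory (reasonable for a small extract, probably not for the whole planet). The names Collector, LatLng, AddNode, AddWay, and Resolve are hypothetical helpers of mine, not part of gosmparse.

```go
package main

// LatLng holds the coordinates of a node.
type LatLng struct {
	Lat, Lng float64
}

// Collector buffers nodes and ways in whatever order they arrive,
// so that ways can be resolved after the whole file has been read.
type Collector struct {
	nodes map[int64]LatLng  // node ID -> coordinates
	ways  map[int64][]int64 // way ID -> ordered node IDs
}

// NewCollector returns an empty Collector.
func NewCollector() *Collector {
	return &Collector{
		nodes: make(map[int64]LatLng),
		ways:  make(map[int64][]int64),
	}
}

// AddNode remembers the coordinates of a node.
func (c *Collector) AddNode(id int64, pos LatLng) {
	c.nodes[id] = pos
}

// AddWay remembers the node IDs of a way. The referenced nodes
// do not need to be known yet.
func (c *Collector) AddWay(id int64, nodeIDs []int64) {
	c.ways[id] = nodeIDs
}

// Resolve returns, in way order, the coordinates of the nodes that were
// present in the file, plus the IDs of nodes that were referenced but
// never seen. Call it only after the whole file has been read.
func (c *Collector) Resolve(wayID int64) (found []LatLng, missing []int64) {
	for _, id := range c.ways[wayID] {
		if pos, ok := c.nodes[id]; ok {
			found = append(found, pos)
		} else {
			missing = append(missing, id)
		}
	}
	return found, missing
}
```

With a parser that calls back for every entity, you would call AddNode and AddWay from those callbacks and call Resolve only after parsing has finished. What to do with a way that has missing nodes (drop it, or keep the partial geometry) depends on your use case.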
What’s next
Hopefully this explains the basic concepts of .osm.pbf exports. What quirks have you encountered when reading .osm.pbf data?
In the next blog post I’ll describe how cities are stored in .osm.pbf files. Please subscribe for updates.