osm-tag-csv-history

Use CSV tools to see who's mapping what in OpenStreetMap.

Given an OSM history file, it produces a CSV file in which each row is one change (addition, removal or modification) to a tag on an OSM object.

Getting data

Planet.OpenStreetMap.org provides a “full history” file, updated every week. You can download the latest one there, but it is very large (⚠ 80+ GB! ⚠).

Geofabrik provides a download service which includes full history files for lots of regions & countries. You must log into it with your OpenStreetMap account. You can also use this tool on regular, non-history, OSM data files.

Installation

If you have Rust installed, you can install it with:

cargo install osm-tag-csv-history

You can also download prebuilt binaries from the GitHub releases page (e.g. the v0.1.0 release).

Usage

osm-tag-csv-history -i mydata.osm.pbf -o mydata.csv

Example

Many programmes can use CSV files. It's also possible to use hacky unix command line programmes to calculate who's adding fuel stations (amenity=fuel in OSM) in Ireland:

osm-tag-csv-history -i ./ireland-and-northern-ireland-internal.osh.pbf -o - --no-header | grep '^amenity,fuel,' | cut -d, -f8 | sort | uniq -c | sort -n | tail -n 20

Here we can find every time someone has upgraded a building from building=yes to something else:

osm-tag-csv-history -i data.osh.pbf -o - --no-header | grep -P '^building,[^,]+,yes,' | cat -n

With a few more command line tools, we can list who's doing the most to make OSM more descriptive by upgrading building=yes:

osm-tag-csv-history -i data.osh.pbf -o - --no-header | grep -P '^building,[^,]+,yes,' | xsv select 8 | sort | uniq -c | sort -n | tail -n 20
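`grep -P` is a GNU extension and is not available in every grep. As a rough alternative, the same filter can be sketched with awk. The sample rows and the function name `building_upgraders` below are made up for illustration, and the sketch assumes that no field contains a comma or quoting.

```shell
# Hypothetical rows in the tool's column order (no header line).
building_rows='building,house,yes,n10,2,1,2020-01-01T00:00:00Z,Alice,12,100
building,yes,,n11,1,,2020-01-02T00:00:00Z,Bob,2,101
building,church,yes,n12,3,2,2020-01-03T00:00:00Z,Alice,12,102
building,,yes,n13,2,1,2020-01-04T00:00:00Z,Bob,2,103
building,garage,yes,n14,2,1,2020-01-05T00:00:00Z,Carol,7,104'

# Keep rows where key is "building", old_value (column 3) is "yes" and
# new_value (column 2) is not empty, then count rows per username (column 8).
# Assumption: fields never contain commas, so splitting on "," is safe.
building_upgraders() {
    awk -F, '$1 == "building" && $2 != "" && $3 == "yes" { print $8 }' \
        | sort | uniq -c | sort -n
}

printf '%s\n' "$building_rows" | building_upgraders
```

On real data, replace the `printf` with the `osm-tag-csv-history ... -o - --no-header` command shown above.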

Using with osmium getid

The id column (column 4) can be used by osmium-tool to filter an OSM file by object id. This is how you get a file of all the pet shops in OSM:

osm-tag-csv-history -i country-latest.osm.pbf -o - --no-header | grep '^shop,pet,' | xsv select 4 | osmium getid -i - country-latest.osm.pbf -o pets.osm.pbf -r

(For this simple case, osmium's tag filtering is probably better.)
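When the input is a history file, the same object can appear in several rows, so it may help to deduplicate the ids before handing them to osmium. The following is a minimal sketch on made-up rows. The function name `ids_for_tag` is hypothetical, and it assumes fields contain no commas.

```shell
# Hypothetical history rows: n1 becomes a pet shop twice (versions 1 and 3).
shop_rows='shop,pet,,n1,1,,2020-01-01T00:00:00Z,Alice,12,100
shop,florist,pet,n1,2,1,2020-02-01T00:00:00Z,Bob,2,110
shop,pet,florist,n1,3,2,2020-03-01T00:00:00Z,Alice,12,120
shop,pet,,n7,1,,2020-01-05T00:00:00Z,Carol,7,130
shop,bakery,,n9,1,,2020-01-06T00:00:00Z,Bob,2,140'

# $1 = key, $2 = value. Prints each matching object id (column 4) once.
ids_for_tag() {
    awk -F, -v key="$1" -v value="$2" \
        '$1 == key && $2 == value { print $4 }' | sort -u
}

printf '%s\n' "$shop_rows" | ids_for_tag shop pet
```

The resulting list, one id per line, is the kind of input that `osmium getid -i -` reads in the command above. Note that this sketch matches every object that ever had the tag, including ones that later changed it.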

Non-history files

This programme can run on non-history files just fine. The old_value and old_version columns will be empty. This can be a way to convert OSM data into CSV format for further processing.

Using on privacy-preserving files

The Geofabrik Public Download Service provides non-history files which do not include some metadata, such as usernames, uids or changeset ids. This tool can run on them: it outputs an empty value for username, and 0 for uid and changeset_id.

If you have an OSM account, you can get full metadata from the internal service.

Output file format

Records are separated by a newline (\n). A header line is included by default, but it can be turned off with --no-header (or forcibly included with --header).

If any string (e.g. tag value, username) contains a newline or a similar character, it will be escaped with a backslash (i.e. a newline is written as 2 characters, \ then n).
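Because newlines inside values are escaped, one record always stays on one physical line, which is what makes line-based tools like grep usable here. The sketch below illustrates this on a made-up row and shows one way to turn the escaped form back into a real newline. The function names are hypothetical, and only the `\n` escape described above is handled.

```shell
# Hypothetical row: a "note" tag whose value contains one newline,
# stored as the two characters \ and n.
note_row='note,first line\nsecond line,,n5,1,,2020-01-01T00:00:00Z,Alice,12,30'

# Count records by counting physical lines.
count_records() {
    awk 'END { print NR }'
}

# Print new_value (column 2) with \n turned back into a real newline.
# Assumption: no other escape sequences and no commas inside the value.
decode_new_value() {
    awk -F, '{ v = $2; gsub(/\\n/, "\n", v); print v }'
}

printf '%s\n' "$note_row" | count_records      # one record
printf '%s\n' "$note_row" | decode_new_value   # two lines of text
```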

Columns

(in order): key, new_value, old_value, id, new_version, old_version, datetime, username, uid, changeset_id

Example

Imagine this simple file:

<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmium/1.7.1">
  <node id="1" version="1" timestamp="2019-01-01T00:00:00Z" lat="0.0" lon="0.0" user="Alice" uid="12" changeset="2">
    <tag k="place" v="city"/>
    <tag k="name" v="Nice City"/>
  </node>
  <node id="1" version="2" timestamp="2019-03-01T12:30:00Z" lat="0.0" lon="0.0" user="Bob" uid="2" changeset="10">
    <tag k="place" v="city"/>
    <tag k="name" v="Nice City"/>
    <tag k="population" v="1000000"/>
  </node>
  <node id="2" version="1" timestamp="2019-04-01T00:00:00Z" lat="0.0" lon="0.0" user="Alice" uid="12" changeset="20">
    <tag k="amenity" v="restaurant"/>
    <tag k="name" v="TastyEats"/>
  </node>
  <node id="2" version="2" timestamp="2019-04-01T02:00:00Z" lat="0.0" lon="0.0" user="Alice" uid="12" changeset="21">
    <tag k="amenity" v="restaurant"/>
    <tag k="name" v="TastyEats"/>
    <tag k="cuisine" v="regional"/>
  </node>
  <node id="2" version="3" timestamp="2019-04-01T03:00:00Z" lat="0.0" lon="0.0" user="Alice" uid="12" changeset="22">
    <tag k="amenity" v="restaurant"/>
    <tag k="name" v="TastyEats"/>
    <tag k="cuisine" v="burger"/>
  </node>
  <node id="2" version="4" timestamp="2019-04-01T03:00:00Z" lat="1.0" lon="0.0" user="Alice" uid="12" changeset="22">
    <tag k="amenity" v="restaurant"/>
    <tag k="name" v="TastyEats"/>
    <tag k="cuisine" v="burger"/>
  </node>
  <node id="3" version="1" timestamp="2019-04-01T00:00:00Z" lat="0.0" lon="0.0" user="Alice" uid="12" changeset="50">
    <tag k="amenity" v="bench"/>
  </node>
  <node id="3" version="2" timestamp="2019-06-01T00:00:00Z" lat="0.0" lon="0.0" user="Alice" uid="12" changeset="100" visible="false">
  </node>
</osm>

NB: This programme cannot read XML files, only PBF. This file was converted to PBF with osmium cat example.osm.xml -o example.osm.pbf.

Running osm-tag-csv-history on it produces this CSV file (formatted here as a table with csvtomd):

key         |  new_value   |  old_value  |  id  |  new_version  |  old_version  |  datetime              |  username  |  uid  |  changeset_id
------------|--------------|-------------|------|---------------|---------------|------------------------|------------|-------|--------------
name        |  Nice City   |             |  n1  |  1            |               |  2019-01-01T00:00:00Z  |  Alice     |  12   |  2
place       |  city        |             |  n1  |  1            |               |  2019-01-01T00:00:00Z  |  Alice     |  12   |  2
population  |  1000000     |             |  n1  |  2            |  1            |  2019-03-01T12:30:00Z  |  Bob       |  2    |  10
amenity     |  restaurant  |             |  n2  |  1            |               |  2019-04-01T00:00:00Z  |  Alice     |  12   |  20
name        |  TastyEats   |             |  n2  |  1            |               |  2019-04-01T00:00:00Z  |  Alice     |  12   |  20
cuisine     |  regional    |             |  n2  |  2            |  1            |  2019-04-01T02:00:00Z  |  Alice     |  12   |  21
cuisine     |  burger      |  regional   |  n2  |  3            |  2            |  2019-04-01T03:00:00Z  |  Alice     |  12   |  22
amenity     |  bench       |             |  n3  |  1            |               |  2019-04-01T00:00:00Z  |  Alice     |  12   |  50
amenity     |              |  bench      |  n3  |  2            |  1            |  2019-06-01T00:00:00Z  |  Alice     |  12   |  100
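The rows of this example can be sorted into additions, removals and modifications by looking at which of the two value columns is empty. Here is a minimal sketch of that idea. It assumes the raw CSV is plain comma-separated text without quoted fields (the rows are transcribed from the table above), and the function name `classify_changes` is made up for illustration.

```shell
# The nine example rows as raw CSV, without the header line.
example_csv='name,Nice City,,n1,1,,2019-01-01T00:00:00Z,Alice,12,2
place,city,,n1,1,,2019-01-01T00:00:00Z,Alice,12,2
population,1000000,,n1,2,1,2019-03-01T12:30:00Z,Bob,2,10
amenity,restaurant,,n2,1,,2019-04-01T00:00:00Z,Alice,12,20
name,TastyEats,,n2,1,,2019-04-01T00:00:00Z,Alice,12,20
cuisine,regional,,n2,2,1,2019-04-01T02:00:00Z,Alice,12,21
cuisine,burger,regional,n2,3,2,2019-04-01T03:00:00Z,Alice,12,22
amenity,bench,,n3,1,,2019-04-01T00:00:00Z,Alice,12,50
amenity,,bench,n3,2,1,2019-06-01T00:00:00Z,Alice,12,100'

# Label each row from new_value (column 2) and old_value (column 3):
# empty old_value means the tag was added, empty new_value means it was
# removed, anything else is a modification.
# Assumption: a tag never has an empty string as its actual value.
classify_changes() {
    awk -F, '{
        if ($3 == "")      kind = "added"
        else if ($2 == "") kind = "removed"
        else               kind = "modified"
        print kind, $1, $4
    }'
}

printf '%s\n' "$example_csv" | classify_changes
```

For this example it reports seven additions, one modification (cuisine on n2) and one removal (amenity on n3).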

Some things to note:

Possible useful tools

The following other tools might be useful:

Misc

Copyright 2020, GNU Affero General Public Licence (AGPL) v3 or later. See LICENCE.txt. Source code is on Sourcehut and GitHub.

The output file should be viewed as a Derived Database of the OpenStreetMap database, and is hence under the ODbL 1.0 licence, the same licence as OpenStreetMap itself.