A tool for transpiling JSON Schema into schemas for Avro and BigQuery.
JSON Schema is primarily used to validate incoming data, but it contains enough information to describe the structure of the data. The transpiler encodes the schema for use with data serialization and processing frameworks. The main use case is to enable ingestion of JSON documents into BigQuery through an Avro intermediary.
This tool can handle many of the composite types seen in modern data processing tools that support a SQL interface, such as lists, structures, key-value maps, and type variants.
This tool is designed for generating new schemas from mozilla-pipeline-schemas, the canonical source of truth for JSON schemas in the Firefox Data Platform.
```bash
cargo install --git https://github.com/acmiyaguchi/jsonschema-transpiler
```

```
jsonschema-transpiler 0.2.0
A tool to transpile JSON Schema into schemas for data processing

USAGE:
    jsonschema-transpiler [OPTIONS] [FILE]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -t, --type

ARGS:
```

JSON Schemas can be read from stdin or from a file.

```bash
$ schema='{"type": "object", "properties": {"foo": {"type": "boolean"}}}'
$ echo $schema | jq
{
  "type": "object",
  "properties": {
    "foo": {
      "type": "boolean"
    }
  }
}

$ echo $schema | jsonschema-transpiler --type avro
{
  "fields": [
    {
      "name": "foo",
      "type": [
        {
          "type": "null"
        },
        {
          "type": "boolean"
        }
      ]
    }
  ],
  "name": "root",
  "type": "record"
}

$ echo $schema | jsonschema-transpiler --type bigquery
{
  "fields": [
    {
      "mode": "NULLABLE",
      "name": "foo",
      "type": "BOOL"
    }
  ],
  "mode": "REQUIRED",
  "type": "RECORD"
}
```
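To make the shape of the BigQuery output easier to follow, here is a minimal Python sketch of the kind of mapping the example suggests. It is not the transpiler's implementation: the function name `to_bigquery`, the `ATOM_TO_BIGQUERY` table, and the rule that properties missing from `required` become `NULLABLE` are all assumptions made for illustration, and only atomic types and objects are handled.

```python
# Hypothetical sketch: not the transpiler's actual implementation.
# Assumed mapping from JSON Schema atomic types to BigQuery column types.
ATOM_TO_BIGQUERY = {
    "boolean": "BOOL",
    "integer": "INT64",
    "number": "FLOAT64",
    "string": "STRING",
}


def to_bigquery(schema, mode="REQUIRED"):
    """Convert a small subset of JSON Schema into a BigQuery-style schema."""
    kind = schema["type"]
    if kind == "object":
        required = set(schema.get("required", []))
        fields = []
        # Sort property names so the output is deterministic.
        for name in sorted(schema.get("properties", {})):
            # Assumption: properties not listed in "required" are NULLABLE.
            child_mode = "REQUIRED" if name in required else "NULLABLE"
            field = to_bigquery(schema["properties"][name], child_mode)
            fields.append({"name": name, **field})
        return {"fields": fields, "mode": mode, "type": "RECORD"}
    # Atomic types map directly to a column type; unknown types raise KeyError.
    return {"mode": mode, "type": ATOM_TO_BIGQUERY[kind]}


schema = {"type": "object", "properties": {"foo": {"type": "boolean"}}}
result = to_bigquery(schema)
```

For the schema used in the shell example, `result` has the same content as the `--type bigquery` output: a `REQUIRED` root `RECORD` with one `NULLABLE` `BOOL` field named `foo`.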
Contributions are welcome. The API may change significantly, but the
transformation between various source formats should remain consistent. To aid
in the development of the transpiler, test cases are generated from a
language-agnostic format under `tests/resources`.
```json
{
    "name": "test-suite",
    "tests": [
        {
            "name": "test-case",
            "description": [
                "A short description of the test case."
            ],
            "tests": {
                "avro": {...},
                "bigquery": {...},
                "json": {...}
            }
        },
        ...
    ]
}
```
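Because the format is language-agnostic, any language can consume it. The Python sketch below shows one way to walk a suite with this layout. The helper `iter_cases`, the inline `SUITE` data, and the reading of `json` as the input schema with the other keys as expected outputs are assumptions for illustration; the project's own test generator may work differently.

```python
import json

# Hypothetical suite following the layout above; the schemas are illustrative only.
SUITE = json.loads("""
{
  "name": "test-suite",
  "tests": [
    {
      "name": "test-case",
      "description": ["A short description of the test case."],
      "tests": {
        "avro": {"type": "boolean"},
        "bigquery": {"type": "BOOL", "mode": "REQUIRED"},
        "json": {"type": "boolean"}
      }
    }
  ]
}
""")


def iter_cases(suite):
    """Yield (case id, source schema, expected outputs keyed by target format)."""
    for case in suite["tests"]:
        formats = dict(case["tests"])
        # Assumption: "json" holds the JSON Schema input; other keys are expected outputs.
        source = formats.pop("json")
        yield f'{suite["name"]}::{case["name"]}', source, formats


cases = list(iter_cases(SUITE))
for case_id, source, expected in cases:
    print(case_id, "targets:", sorted(expected))
```

Each yielded tuple pairs one input schema with its expected outputs, so a test runner in any language can compare a transpiled result against `expected["avro"]` or `expected["bigquery"]`.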
Schemas provide a type system for data structures. Most schema languages support a similar set of primitives. There are atomic data types like booleans, integers, and floats. These atomic data types can form compound units of structure, such as objects, arrays, and maps. The absence of a value is usually denoted by a null type. There are type modifiers, like the union of two types.
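These building blocks can be illustrated with a small Python sketch that turns a subset of JSON Schema into Avro-style JSON. It is a sketch under stated assumptions, not the transpiler's algorithm: `to_avro` and `ATOMS` are hypothetical names, optional properties are assumed to become a union with null, and an object with only a schema-valued `additionalProperties` is assumed to become a map.

```python
# Hypothetical sketch of the concepts above; not the transpiler's implementation.
# Assumed mapping from JSON Schema atomic types to Avro primitive types.
ATOMS = {"boolean": "boolean", "integer": "long", "number": "double", "string": "string"}


def to_avro(schema, name="root"):
    """Convert a small subset of JSON Schema into an Avro-style schema."""
    kind = schema["type"]
    if kind == "null":
        return {"type": "null"}
    if kind in ATOMS:
        return {"type": ATOMS[kind]}
    if kind == "array":
        return {"type": "array", "items": to_avro(schema["items"], name)}
    if kind == "object":
        extra = schema.get("additionalProperties")
        # Assumption: no fixed properties plus typed extras means a key-value map.
        if "properties" not in schema and isinstance(extra, dict):
            return {"type": "map", "values": to_avro(extra, name)}
        required = set(schema.get("required", []))
        fields = []
        for key in sorted(schema.get("properties", {})):
            child = to_avro(schema["properties"][key], name=key)
            if key not in required:
                # Type modifier: an optional value is the union of null and the type.
                child = [{"type": "null"}, child]
            fields.append({"name": key, "type": child})
        return {"type": "record", "name": name, "fields": fields}
    raise ValueError(f"unsupported type: {kind}")


record = to_avro({"type": "object", "properties": {"foo": {"type": "boolean"}}})
mapping = to_avro({"type": "object", "additionalProperties": {"type": "integer"}})
```

Here `record` is a record named `root` whose `foo` field is the union of null and boolean, and `mapping` is a map with `long` values.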
The following schemas are currently supported: JSON Schema as the input, with Avro and BigQuery as the output formats.
In the future, it may be possible to support schemas from similar systems like Parquet and Spark, or to generate interface description languages (IDLs) like Avro IDL.