- How to Validate Your JSON Using JSON Schema
- What is JSON Schema?
- How to validate our JSON?
- Why should I use JSON Schema?
- Simple JSON Schema
- Simple Array Schema
- More complex functionality
- Required properties
- Dependent required
- One of / Any of
- Summary
- How to Use JSON Schema to Validate JSON Documents in Python
- What is JSON schema?
- object
- Properties
- Pattern Properties
How to Validate Your JSON Using JSON Schema
Imagine the following scenario: you and your teammate are working on a new feature. Your part is to create a JSON with some results and send it to your teammate. Her part is to take this JSON, parse it and save it in the database. You verbally agreed on what the keys and types should be and each one of you implemented their part. Sounds legit, and it will indeed work if the JSON structure is simple. But one day you had a bug and sent the wrong key. You learned your lesson and decided to create an API and document it in your team’s favorite documentation platform. Now you can both take a look at this API to make sure you implemented it correctly.
But is that enough? Suppose you implemented your part correctly, but another teammate later changes the code so that a field holds an array of numbers instead of a single number. That teammate is not aware of your documented API, and everything breaks.
What if you could validate your JSON directly in the code, before sending it and before parsing it? That is what JSON Schema is for!
In this post, I will introduce JSON Schema, explain why it is so powerful, and show how we can use it in different scenarios.
What is JSON Schema?
JSON Schema is a JSON-based format for defining the structure of JSON data. It provides a contract for what JSON data is required for a given application and how to interact with it. It can be used for validation, documentation, hyperlink navigation, and interaction control of JSON data.
The schema can be stored in a JSON file and loaded into your code, or it can be created directly in the code.
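Both options can be sketched with the standard library alone. The file name and the tiny schema below are made up for illustration, and a temporary directory stands in for wherever a real schema file would live.

```python
import json
import tempfile
from pathlib import Path

# Option 1: build the schema directly in code as a plain dict.
schema_in_code = {
    "type": "object",
    "properties": {"price": {"type": "number"}},
}

# Option 2: keep the schema in a JSON file and load it with json.load().
with tempfile.TemporaryDirectory() as tmp_dir:
    schema_path = Path(tmp_dir) / "product.schema.json"  # hypothetical file name
    schema_path.write_text(json.dumps(schema_in_code), encoding="utf-8")

    with schema_path.open(encoding="utf-8") as schema_file:
        schema_from_file = json.load(schema_file)

# Both routes end with the same dict, which is what a validator consumes.
print(schema_from_file == schema_in_code)
```

Keeping the schema in a file makes it easier to share between services, while a dict in code is handy for tests.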
How to validate our JSON?
validate(instance=your_json, schema=schema)
>>> from jsonschema import validate
>>> # A sample schema, like what we'd get from json.load()
>>> schema = {
...     "type" : "object",
...     "properties" : {
...         "price" : {"type" : "number"},
...         "name" : {"type" : "string"},
...     },
... }
>>> # If no exception is raised by validate(), the instance is valid.
>>> validate(instance={"name" : "Eggs", "price" : 34.99}, schema=schema)
>>> validate(
...     instance={"name" : "Eggs", "price" : "Invalid"}, schema=schema,
... )
Traceback (most recent call last):
    ...
ValidationError: 'Invalid' is not of type 'number'
Why should I use JSON Schema?
Each JSON object has a basic key-value structure. The key is a string, and the value can be of any type: number, string, array, nested object, and so on.
In some cases, the value can be of only a specific type, and in other cases, the value is more flexible. Some keys in our JSON are required, and some of them are optional. There are more complicated scenarios. For example, if we got a certain key, then a second key must appear. The value of one key can be dependent on a second key value.
All those scenarios and many more can be tested and validated locally using JSON Schema. By using it, you can validate your own JSON, and make sure it meets the API requirements before integrating with other services.
Simple JSON Schema
In this example, our JSON contains information about a dog.
{
    "breed": "golden retriever",
    "age": 5,
    "weight": 13.5,
    "name": "Luke"
}
Let’s take a closer look at this JSON’s properties and the requirements we want to enforce on each one:
- Breed: we want to allow only three breeds (golden retriever, Belgian Malinois, and Border Collie) and validate that the value is one of them.
- Age: we want the age rounded to whole years, so the value is an integer. In this example, we also limit the maximum age to 15.
- Weight: can be any positive number, integer or float.
- Name: always a string. Can be any string.
{
    "type": "object",
    "properties": {
        "breed": {
            "type": "string",
            "enum": [
                "golden retriever",
                "Belgian Malinois",
                "Border Collie"
            ]
        },
        "age": {"type": "integer", "minimum": 0, "maximum": 15},
        "weight": {"type": "number", "exclusiveMinimum": 0},
        "name": {"type": "string"}
    }
}
This way, the schema accepts only ages between 0 and 15, rejects zero or negative weights, and allows only the three listed breeds.
Simple Array Schema
We can also validate array values.
For example, suppose we want an array with the following properties: between 2 and 5 items, unique values, strings only.
["a", "b", "c"]
{
    "type": "array",
    "items": {"type": "string"},
    "minItems": 2,
    "maxItems": 5,
    "uniqueItems": true
}
More complex functionality
Required properties
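Some properties are must-haves, and we would like to raise an error if they are missing.
To enforce this, add the required keyword, which lists the names of the properties that must be present.
{
    "type": "object",
    "properties": {
        "breed": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["breed"]
}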
In this case, an error will be raised when the “breed” property is missing. Other properties like “age” remain optional.
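A short sketch of that behaviour with the jsonschema package follows. The property types are assumed to be a string for breed and an integer for age.

```python
from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "properties": {
        "breed": {"type": "string"},  # assumed type
        "age": {"type": "integer"},   # assumed type
    },
    "required": ["breed"],
}

# "age" is optional, so a document with only "breed" passes silently.
validate(instance={"breed": "Border Collie"}, schema=schema)

# Leaving out "breed" raises a ValidationError that names the missing property.
try:
    validate(instance={"age": 5}, schema=schema)
    error_message = None
except ValidationError as error:
    error_message = error.message

print(error_message)
```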
Dependent required
The dependentRequired keyword conditionally requires certain properties to be present if a given property is present in an object.
{
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "credit_card": { "type": "number" },
        "billing_address": { "type": "string" }
    },
    "required": ["name"],
    "dependentRequired": {
        "credit_card": ["billing_address"]
    }
}
In this case, if the “credit_card” property appears, then “billing_address” is required.
One of / Any of
So far, each property has had only one type. What if a property can be of several different types?
Example 1: anyOf. To validate against anyOf, the given data must be valid against any (one or more) of the given subschemas.
In this case, our data can be either a string or a number greater than or equal to 0.
Example 2: oneOf. To validate against oneOf, the given data must be valid against exactly one of the given subschemas.
In this case, the data must be a number that is a multiple of 5 or a multiple of 3, but not both!
Summary
JSON Schema is a powerful tool. It enables you to validate your JSON structure and make sure it meets the required API. You can create a schema as complex and nested as you need; all you need are the requirements. You can use it in your code as an additional test or at runtime.
In this post, I introduced the basic structure and mentioned some more complex options. There is a lot more to explore beyond what is covered here.
I think anyone who works with JSON as part of their job should be familiar with this package and its options. It can save you a lot of time and ease your integration process, just by validating your JSON structure. I know it has saved me a lot of time since I started using it.
How to Use JSON Schema to Validate JSON Documents in Python
A JSON document can contain any number of key/value pairs. The key must be a string, but the value can be any supported type, such as string, number, boolean, etc. The value can even be a complex type like an array or a nested object. This makes a JSON document very flexible but also very unstructured, which makes data processing more difficult, because data teams often receive data through APIs whose responses are normally in JSON format. Having a consistent data format can make the data pipelines more robust. With a uniform data input, you don’t need to worry about unexpected data types or spend too much time on data cleansing. You can thus focus more on data analysis and work more efficiently.
In this post, we will introduce how to use JSON schema to validate JSON documents. The essential concepts, as well as basic and advanced use cases, will be introduced with simple code snippets that are easy to follow.
What is JSON schema?
A JSON Schema is a JSON document defining the schema of some JSON data. Well, honestly, this explanation is pretty strange and elusive but will get much clearer once we see the code later. For now, we need to understand two points:
- A JSON schema itself is a valid JSON document with key/value pairs. Each key has a special meaning and is used to define the schema of some JSON data.
- A schema is similar to the table definition in a SQL database and defines the data types of the fields in a JSON. It also defines which fields are required and which are optional.
Let’s get started with a simple JSON schema:
This JSON schema specifies that the target JSON is an object with two properties (which are also commonly referred to as keys/fields and will be used accordingly when appropriate), and the name property is required. Let’s dive a bit deeper into each validation…
object
Objects are the mapping type in JSON. They map “keys” to “values”. In JSON, the “keys” must always be strings. Each of these pairs is conventionally referred to as a “property”.
In Python, “objects” are analogous to the dict type. An important difference, however, is that while Python dictionaries may use anything hashable as a key, in JSON all the keys must be strings.
Try not to be confused by the two uses of the word “object” here: Python uses the word object to mean the generic base class for everything, whereas in JSON it is used only to mean a mapping from string keys to values.
In Ruby, “objects” are analogous to the Hash type. An important difference, however, is that all keys in JSON must be strings, and therefore any non-string keys are converted over to their string representation.
Try not to be confused by the two uses of the word “object” here: Ruby uses the word Object to mean the generic base class for everything, whereas in JSON it is used only to mean a mapping from string keys to values.
{ "key": "value", "another_key": "another_value" }
{
    "Sun": 1.9891e30,
    "Jupiter": 1.8986e27,
    "Saturn": 5.6846e26,
    "Neptune": 10.243e25,
    "Uranus": 8.6810e25,
    "Earth": 5.9736e24,
    "Venus": 4.8685e24,
    "Mars": 6.4185e23,
    "Mercury": 3.3022e23,
    "Moon": 7.349e22,
    "Pluto": 1.25e22
}
Using non-strings as keys is invalid JSON:
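Python's standard json module shows both sides of this rule. In the sketch below, the unit-conversion keys are made up for illustration: a JSON text with bare numeric keys fails to parse, while a Python dict with integer keys is serialized with those keys turned into strings.

```python
import json

# A JSON text whose keys are bare numbers is rejected by the parser.
invalid_text = '{ 0.01: "cm", 1: "m", 1000: "km" }'
try:
    json.loads(invalid_text)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print("parsed:", parsed_ok)

# Going the other way, json.dumps() converts integer keys to strings.
encoded = json.dumps({1: "m", 1000: "km"})
print(encoded)

# After a round trip the keys are strings, not integers.
round_tripped = json.loads(encoded)
print(list(round_tripped))
```

This silent conversion is worth remembering when a dict with numeric keys is written to JSON and read back.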
Properties
The properties (key-value pairs) on an object are defined using the properties keyword. The value of properties is an object, where each key is the name of a property and each value is a schema used to validate that property. Any property that doesn’t match any of the property names in the properties keyword is ignored by this keyword.
See Additional Properties and Unevaluated Properties for how to disallow properties that don’t match any of the property names in properties.
For example, let’s say we want to define a simple schema for an address made up of a number, street name and street type:
{
    "type": "object",
    "properties": {
        "number": { "type": "number" },
        "street_name": { "type": "string" },
        "street_type": { "enum": ["Street", "Avenue", "Boulevard"] }
    }
}
{ "number": 1600, "street_name": "Pennsylvania", "street_type": "Avenue" }
If we provide the number in the wrong type, it is invalid:
{ "number": "1600", "street_name": "Pennsylvania", "street_type": "Avenue" }
By default, leaving out properties is valid. See Required Properties.
{ "number": 1600, "street_name": "Pennsylvania" }
By extension, even an empty object is valid:
{ }
By default, providing additional properties is valid:
{ "number": 1600, "street_name": "Pennsylvania", "street_type": "Avenue", "direction": "NW" }
Pattern Properties
Sometimes you want to say that, given a particular kind of property name, the value should match a particular schema. That’s where patternProperties comes in: it maps regular expressions to schemas. If a property name matches the given regular expression, the property value must validate against the corresponding schema.
Regular expressions are not anchored. This means that when defining the regular expressions for patternProperties, it’s important to note that the expression may match anywhere within the property name. For example, the regular expression “p” will match any property name with a p in it, such as “apple”, not just a property whose name is simply “p”. It’s therefore usually less confusing to surround the regular expression in ^...$, for example, “^p$”.
In the following schema, any properties whose names start with the prefix S_ must be strings, and any with the prefix I_ must be integers. Any properties that do not match either regular expression are ignored.
{
    "type": "object",
    "patternProperties": {
        "^S_": { "type": "string" },
        "^I_": { "type": "integer" }
    }
}