How to convert pandas DataFrame into JSON in Python?
Data analysis is an extremely important tool in today's world, and a key aspect of it is the organized representation of data. Computer science offers numerous data structures for this task. In this article, we discuss two of them: pandas DataFrames and JSON. We then see how to convert DataFrames to JSON format.
A pandas DataFrame is a tabular representation of data in which each column represents a data point of a single entry and each row is a unique entry. JSON (JavaScript Object Notation), on the other hand, is a text format for representing structured data.
Note: For more information, refer to Python | Pandas DataFrame
Convert pandas DataFrame into JSON
To convert a pandas DataFrame to JSON format we use the function DataFrame.to_json() from the pandas library in Python. The to_json function offers multiple customizations to achieve the desired JSON format. Let's look at the parameters accepted by the function and then explore the customizations.
Parameter | Value | Use |
---|---|---|
path_or_buf | string or filename, optional | File path or object. If not specified, the result is returned as a string. |
orient | ‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’; default=’columns’ for a DataFrame (’index’ for a Series) | Indication of expected JSON string format. |
date_format | None, ‘epoch’, ‘iso’, default=’epoch’ | Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient=’table’, the default is ‘iso’. For all other orients, the default is ‘epoch’. |
double_precision | integer value, default=10 | The number of decimal places to use when encoding floating point values. |
force_ascii | boolean value, default=True | Force encoded string to be ASCII. |
date_unit | ‘s’, ‘ms’, ‘us’, ‘ns’, default=’ms’ | The time unit to encode to, governs timestamp and ISO8601 precision. The values represent second, millisecond, microsecond, and nanosecond respectively. |
default_handler | callable function | Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serializable object. |
lines | boolean value, default=False | If ‘orient’ is ‘records’ write out line delimited json format. Will throw ValueError if incorrect ‘orient’ since others are not list like. |
compression | ‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None, default=’infer’ | A string representing the compression to use in the output file, only used when the first argument is a filename. By default, the compression is inferred from the filename. |
index | boolean value, default=True | Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is ‘split’ or ‘table’. |
indent | integer value, optional | Length of whitespace used to indent each record. |
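As a quick illustration of a few of these parameters (the DataFrame below is illustrative, not taken from the article):

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob"], "score": [91.5, 78.25]},
    index=["r1", "r2"],
)

# orient='records' emits a list of row objects.
records = df.to_json(orient="records")
print(records)
# [{"name":"Alice","score":91.5},{"name":"Bob","score":78.25}]

# lines=True (valid only with orient='records') emits newline-delimited
# JSON, one object per line.
print(df.to_json(orient="records", lines=True))

# orient='index' keys each row object by its index label, and
# double_precision limits the decimal places of floats.
print(df.to_json(orient="index", double_precision=1))
```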
We now look at a few examples to understand the usage of the function DataFrame.to_json.
Example 1: Basic usage
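The body of this example is missing from this extract; a minimal reconstruction of basic usage (with made-up data) might look like:

```python
import pandas as pd

# Build a small DataFrame (illustrative values).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Madhuri", "Riti", "Vinod"],
})

# With no arguments, to_json() returns the JSON as a string,
# using orient='columns' for a DataFrame.
json_str = df.to_json()
print(json_str)
# {"id":{"0":1,"1":2,"2":3},"name":{"0":"Madhuri","1":"Riti","2":"Vinod"}}

# Passing a path as the first argument (path_or_buf) writes the JSON
# to a file instead of returning a string:
# df.to_json("data.json")
```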
Automatically convert Records to json #551
I’ve already checked the following issues, but none of them work for my case
#17
#263
Basically, I’m fetching several records for an API that should return a JSON response. The code I’m using is the following:
```python
async with db_pool.acquire() as conn:
    async with conn.transaction():
        res = await conn.fetch(query, param)
        return res
```
Even after adding `res = [dict(r.items()) for r in res]`, the response is still not valid JSON for the JavaScript client that fetches it.
How can I correctly transform the resulting list of records into a valid JSON? I haven’t found anything in the docs.
No, I’m not using sqlalchemy or similars. They add too much overhead.
With a plain dump I get `TypeError: Object of type datetime is not JSON serializable`.
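The datetime error above comes from the standard-library `json` encoder. A common workaround (a sketch, not taken from this thread; the `rows` data is made up) is to supply a `default=` handler:

```python
import json
from datetime import datetime

# Simulated rows already converted to dicts; the datetime value is
# what trips up json.dumps by default.
rows = [
    {"id": 1, "created": datetime(2020, 5, 1, 12, 0)},
]

# json.dumps(rows) alone would raise:
#   TypeError: Object of type datetime is not JSON serializable

# default= tells the encoder what to do with objects it cannot
# serialize natively; here we render datetimes as ISO 8601 strings.
payload = json.dumps(
    rows,
    default=lambda o: o.isoformat() if isinstance(o, datetime) else str(o),
)
print(payload)
# [{"id": 1, "created": "2020-05-01T12:00:00"}]
```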
How well does this approach perform for huge results?
Ok, I managed to overcome the issue, though it’s not optimal. It would be an interesting feature to directly have JSON result instead of a list of records which have to be serialized into JSON
I would recommend server-side JSON serialization using json_agg and the various json_build_* functions, and then fetching that with fetchval(). The reference is here: https://www.postgresql.org/docs/current/functions-json.html
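A sketch of that suggestion (the function name `fetch_users_json` and the `users` table are hypothetical; `conn` is assumed to be an asyncpg connection):

```python
# Server-side JSON aggregation: Postgres builds the JSON array,
# so Python never has to serialize individual Record objects.
# `conn` is assumed to be an asyncpg.Connection; the `users`
# table and its columns are hypothetical.
async def fetch_users_json(conn) -> str:
    # json_agg() collapses the whole result set into one JSON array,
    # so fetchval() returns a single ready-made JSON string.
    query = """
        SELECT coalesce(json_agg(u), '[]'::json)::text
        FROM users u
    """
    return await conn.fetchval(query)
```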
But I don’t have JSON values in my database. I use JSON as a means to transfer data from my API to the frontend. That’s why I would like to directly return the result of the query, limiting the overhead of converting the list of records into a list of dictionaries and then returning it.
The JSON support is not limited to JSON values, you can serialize arbitrary data to JSON in Postgres.
But then is the asyncpg result also JSON? Or will it still be encapsulated in Record instances?
If you use json_agg() in the query and then connection.fetchval() you’ll get the JSON result directly
> If you use json_agg() in the query and then connection.fetchval() you’ll get the JSON result directly
For a query shaped like the following:
```sql
SELECT
    json_build_object(
        'id', u.id,
        'first_name', u.first_name,
        'last_name', u.last_name,
        'email', u.email,
        'date_joined', u.date_joined
    ) AS user,
    json_build_object(
        'id', o.id,
        'name', o.name,
        'credit_card_id', o.credit_card_id
    ) AS organization
FROM iam_user u
LEFT JOIN organization o ON u.email = o.owner_id
WHERE u.email = :email;
```
using `fetchval` doesn’t work well, since I have a multi-keyed JSON object and it will default to just returning the user data via `column=0`.
However, using `fetchrow` it seems I incur an additional serialization step in Python, because I have to do `dict(conn.fetchrow(query))`.
I’d really love to avoid an extra serialization in Python since my query returns serialized JSON — any suggestions @elprans? cc @lsabi in case you were doing something similar (unclear from your original question though).