Convert CSV to JSON in pandas: read_csv, to_json, and orient parameter

pandas reads almost any CSV and writes almost any JSON in two function calls. The catch is the orient parameter: pick the wrong one and downstream consumers get the right data in a shape they cannot use.

read_csv to DataFrame to to_json is the whole pipeline

The minimum viable conversion is three lines:

import pandas as pd
df = pd.read_csv("input.csv")
df.to_json("output.json", orient="records", indent=2)

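read_csv is permissive: it infers dtypes, parses dates if you tell it which columns, and can detect the delimiter when you pass sep=None with engine="python" (by default it assumes a comma). to_json is the write-side counterpart. Its most consequential parameter is orient, which controls the shape of the JSON, not the data in it.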
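As a minimal sketch of those read_csv options, the snippet below parses a semicolon-delimited sample. The sample text and its column names are made up for illustration; io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited sample standing in for a real file.
raw = "id;name;created_at\n1;Ada;2024-01-15\n2;José;2024-02-20\n"

# sep=None asks the Python engine to detect the delimiter;
# without it, read_csv assumes a comma.
df = pd.read_csv(
    io.StringIO(raw),
    sep=None,
    engine="python",
    dtype={"id": "int64"},        # pin a dtype instead of relying on inference
    parse_dates=["created_at"],   # dates are only parsed for columns you name
)

print(df.dtypes)
```

Delimiter detection is a convenience for files of unknown origin. When you already know the delimiter, passing sep=";" directly is more predictable and lets pandas use its faster default engine.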

orient: records vs split vs index vs table

Five values matter in practice. The DataFrame default, orient="columns", is not one of them, so set orient explicitly.

orient="records" produces a JSON array of objects, one per row. This is the lingua franca for HTTP APIs and most JS code. Use it as your default unless you have a reason not to.

orient="index" produces an object keyed by the row index, with each value being the row as an object. Useful when the index is a meaningful key (a username, a SKU) and consumers need O(1) lookup without scanning. The index must be unique, otherwise to_json raises a ValueError.

orient="split" produces {"columns": [...], "index": [...], "data": [[...]]}. Compact, round-trips cleanly through pd.read_json with the same orient, awkward for non-pandas consumers.

orient="table" emits a JSON Table Schema document with type metadata for every column. Useful if downstream is also pandas or a strict schema-validating consumer; verbose and rarely what you want elsewhere.

orient="values" produces a 2D array with no column names. Smallest payload, requires the consumer to know the column order out of band. Use it for matrices, never for record-style data.
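To compare the five shapes side by side, here is a small sketch that serializes one DataFrame with each orient and parses the result back with the standard json module. The two-row frame and its index labels are invented for illustration.

```python
import json
import pandas as pd

# Tiny illustrative frame with a meaningful index.
df = pd.DataFrame(
    {"name": ["Ada", "Linus"], "age": [36, 54]},
    index=["u1", "u2"],
)

# With no path argument, to_json returns the JSON text as a string.
shapes = {
    orient: json.loads(df.to_json(orient=orient))
    for orient in ("records", "index", "split", "values", "table")
}

for orient, parsed in shapes.items():
    print(orient, "->", json.dumps(parsed))
```

Note that records and values drop the index entirely. If the index carries information you need in those shapes, call reset_index() before serializing.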

force_ascii and date_format are the gotchas

to_json defaults to force_ascii=True, which escapes every non-ASCII character to \uXXXX form (é becomes \u00e9). The result is valid JSON, but bigger and harder to read. Pass force_ascii=False for any modern UTF-8 pipeline:

df.to_json("output.json", orient="records", force_ascii=False, indent=2)

date_format="iso" is also worth setting explicitly. For every orient except table, the default is "epoch", which emits UNIX millisecond integers and trips up consumers that expect date strings. ISO 8601 strings are understood by most JSON consumers and databases, including JavaScript's new Date(s).
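The effect of both parameters is easiest to see on a one-row frame. This is a sketch with made-up data; the exact ISO string (fractional seconds, timezone suffix) can vary between pandas versions, so it is printed without an expected value.

```python
import json
import pandas as pd

# One illustrative row with a non-ASCII name and a datetime column.
df = pd.DataFrame(
    {
        "name": ["José"],
        "created_at": pd.to_datetime(["2024-01-15 10:30:00"]),
    }
)

# Defaults: non-ASCII escaped, datetimes written as epoch integers.
default_json = df.to_json(orient="records")

# Explicit settings: UTF-8 kept as-is, datetimes written as ISO 8601 strings.
explicit_json = df.to_json(orient="records", force_ascii=False, date_format="iso")

print(default_json)
print(explicit_json)
```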

Nested JSON in and out: json_normalize and chunksize

If your input is JSON with nested objects and you want a flat CSV, the inverse path is pd.json_normalize:

records = [{"id": 1, "user": {"name": "Ada", "city": "Madrid"}}]
df = pd.json_normalize(records)
# columns: id, user.name, user.city
df.to_csv("flat.csv", index=False)
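When the nesting includes lists rather than just objects, json_normalize can expand each list element into its own row through record_path and meta. The records below, including the orders field, are hypothetical and only illustrate the call.

```python
import pandas as pd

# Hypothetical records: each user carries a list of orders.
records = [
    {
        "id": 1,
        "user": {"name": "Ada"},
        "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
    },
    {
        "id": 2,
        "user": {"name": "Linus"},
        "orders": [{"sku": "C3", "qty": 5}],
    },
]

flat = pd.json_normalize(
    records,
    record_path="orders",           # one output row per element of this list
    meta=["id", ["user", "name"]],  # parent fields repeated on every row
    sep="_",                        # nested meta path joins as user_name
)
print(flat)
```

Passing sep="_" also avoids dots in column names, which some CSV consumers and databases handle poorly.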

For files too large to fit in memory, read_csv accepts chunksize. It returns an iterator of DataFrames, each with up to N rows:

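import json

with open("big.json", "w", encoding="utf-8") as f:
    f.write("[")
    first = True
    # Each chunk is a DataFrame of at most 10,000 rows.
    for chunk in pd.read_csv("big.csv", chunksize=10_000):
        for row in chunk.to_dict(orient="records"):
            if not first:
                f.write(",")  # comma between elements, none before the first
            json.dump(row, f, ensure_ascii=False)
            first = False
    f.write("]")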

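This emits a streaming JSON array without ever holding the full file in memory. It is uglier than to_json but it scales to gigabytes. One caveat: json.dump writes missing values as a bare NaN, which strict JSON parsers reject, so replace them with None first if the data has gaps.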
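If the consumer accepts JSON Lines (one object per line) instead of a single array, a different technique avoids the manual comma bookkeeping: let to_json serialize each chunk with lines=True. The sketch below is an alternative to the array approach, not a drop-in replacement. The file names and the tiny sample CSV are placeholders, and the chunk size is set to 2 only so the example exercises more than one chunk.

```python
import json
import os
import tempfile
import pandas as pd

workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "big.csv")      # hypothetical stand-in for a large file
jsonl_path = os.path.join(workdir, "big.jsonl")

# Small sample with one missing score to show how gaps are written.
with open(csv_path, "w", encoding="utf-8") as f:
    f.write("id,city,score\n1,Madrid,9.5\n2,Málaga,\n3,Lyon,7.0\n")

with open(jsonl_path, "w", encoding="utf-8") as out:
    for chunk in pd.read_csv(csv_path, chunksize=2):
        # lines=True requires orient="records"; missing values become null.
        text = chunk.to_json(orient="records", lines=True, force_ascii=False)
        # Normalise the trailing newline so chunks never run together.
        out.write(text.rstrip("\n") + "\n")

# Read the result back line by line to check it parses.
with open(jsonl_path, encoding="utf-8") as f:
    rows = [json.loads(line) for line in f if line.strip()]
print(rows)
```

Because to_json does the serialization, missing values come out as null and datetimes follow date_format, so the output stays valid for strict parsers.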

Working example

import json
import pandas as pd

# 1. Plain CSV to JSON, records orient, UTF-8 preserved
df = pd.read_csv("input.csv", parse_dates=["created_at"])
df.to_json(
    "output.json",
    orient="records",
    indent=2,
    force_ascii=False,
    date_format="iso",
)

# 2. Index orient when the index is a meaningful key
df_indexed = df.set_index("user_id")
df_indexed.to_json("by_user.json", orient="index", force_ascii=False)

# 3. Streaming for files too big for memory
with open("big.json", "w", encoding="utf-8") as f:
    f.write("[")
    first = True
    for chunk in pd.read_csv("big.csv", chunksize=10_000):
        for row in chunk.to_dict(orient="records"):
            if not first:
                f.write(",")
            json.dump(row, f, ensure_ascii=False, default=str)
            first = False
    f.write("]")

Just need the result?

When you have a small CSV in your downloads folder and you just want to peek at it as JSON, opening a Jupyter kernel is overkill. The browser-based CSV to JSON converter at aldeacode.com runs entirely on your machine, picks the right orient, preserves UTF-8, and gives you the result in a paste box you can copy out.


Frequently asked questions

Why are my dates appearing as long integers in the JSON output?

to_json defaults to date_format='epoch' for any datetime column, which emits UNIX milliseconds. Pass date_format='iso' to get ISO 8601 strings instead. Make sure read_csv parsed the column with parse_dates first, otherwise the column is just a string and never goes through the date formatter.

How do I keep accented characters readable instead of \u escapes?

Pass force_ascii=False to to_json. The function defaults to True for compatibility with very old consumers. Modern pipelines should set False and write the file as UTF-8, which pandas does by default.

What is the right orient for sending data to a REST API?

records, almost always. It produces a JSON array of objects, the shape most REST APIs and JS frontends expect. Use index only when the index is a meaningful identifier that consumers will look up by.