CSV to JSON with jq: @csv, slurp mode, headers, and quoting
jq is the right hammer for JSON in the shell. It can convert simple CSV to JSON in one pipeline, but RFC 4180 quoting (commas inside quoted fields) is beyond what a split-based jq pipeline can parse, and not worth hand-rolling a parser for. Knowing where the line is keeps your scripts honest.
The basic CSV to JSON pipeline
For comma-separated input with no quoted commas inside fields, jq plus a header-row trick is enough:
```bash
jq -R -s '
split("\n")
| map(select(length > 0))
| (.[0] | split(",")) as $headers
| .[1:]
| map(split(",") | [$headers, .] | transpose | map({(.[0]): .[1]}) | add)
' data.csv
```
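-R reads raw text instead of parsing the input as JSON. -s slurps the whole file, so that, combined with -R, the program receives one big string we can split on newlines. select(length > 0) then drops the empty string left by the trailing newline. The zip step pairs headers with each row. [$headers, .] | transpose turns the two arrays into [header, value] pairs, map({(.[0]): .[1]}) makes each pair a one-key object, and add merges those into a single object per row.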
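To see the zip step in isolation, here is a minimal sketch that runs one made-up header row (id, name) and one data row (1, Ann) through the same three stages. It prints each intermediate result with -c for compact output.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Stage 1: transpose pairs each header with the value in the same position
jq -n -c '[["id","name"],["1","Ann"]] | transpose'
# → [["id","1"],["name","Ann"]]

# Stage 2: each [header, value] pair becomes a one-key object
jq -n -c '[["id","1"],["name","Ann"]] | map({(.[0]): .[1]})'
# → [{"id":"1"},{"name":"Ann"}]

# Stage 3: add merges the one-key objects into a single row object
jq -n -c '[{"id":"1"},{"name":"Ann"}] | add'
# → {"id":"1","name":"Ann"}
```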
This works for simple fields such as IDs, plain names, and numbers (which come out as strings). It breaks the moment a field contains a literal comma, which is common in real-world CSV.
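Here is a quick way to see the failure. The sketch below splits a single hypothetical row whose middle field is a quoted name containing a comma. split(",") cuts inside the quotes, so a three-column row comes back as four pieces. In the full pipeline, transpose would then pad the three-element header array with null. jq would stop with an error at that point, because object keys must be strings.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical three-column row (id, name, age); the quoted name contains a comma
row='1,"Smith, Jane",42'

# split(",") ignores the quotes, so the name is cut in two
jq -n -c --arg row "$row" '$row | split(",")'
# → ["1","\"Smith"," Jane\"","42"]

# Four pieces for three intended columns
jq -n --arg row "$row" '$row | split(",") | length'
# → 4
```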
Why plain jq cannot parse RFC 4180 fully
RFC 4180 quoting (a field wrapped in double quotes that itself contains commas, newlines, or doubled quotes) needs a stateful parser that tracks whether it is currently inside a quoted field. split(",") keeps no such state: it cannot tell whether the next comma is inside quotes.
You can hack together a regex split with lookahead (jq's regex engine is Oniguruma, which supports both lookahead and lookbehind), but the result is fragile and slow. The honest answer is: do not parse RFC 4180 CSV with plain jq.
Two clean ways out:
1. Convert the CSV with csvkit (csvjson data.csv) and pipe the JSON into jq for the JSON-side processing. Best for data with quoting.
2. Use a stricter delimiter that does not appear in your data (tab, pipe) and split on that. Practical for export pipelines you control.
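Option 2 can be sketched as a small function that takes the delimiter as an argument via --arg. The same jq program then handles tab, pipe, or any other character you know is absent from the data. The function name dsv_to_json is hypothetical. The rtrimstr("\r") step assumes you may receive Windows (CRLF) line endings and does nothing otherwise.

```bash
#!/usr/bin/env bash
set -euo pipefail

# dsv_to_json DELIM: read delimiter-separated text on stdin, print a JSON array of objects.
# Assumes the first line is a header and DELIM never appears inside a field.
dsv_to_json() {
  jq -R -s -c --arg d "$1" '
    split("\n")
    | map(rtrimstr("\r") | select(length > 0))   # strip CR from CRLF lines, drop blank lines
    | (.[0] | split($d)) as $headers
    | .[1:]
    | map(split($d) | [$headers, .] | transpose | map({(.[0]): .[1]}) | add)
  '
}

# Pipe-delimited input: the comma inside "Smith, Jane" is now just data
printf 'id|name\n1|Smith, Jane\n' | dsv_to_json '|'
# → [{"id":"1","name":"Smith, Jane"}]
```

For tab-separated input, pass the delimiter with bash ANSI-C quoting: dsv_to_json $'\t'.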
JSON to CSV with @csv is the easy direction
Going the other way is straightforward because jq's @csv filter handles quoting itself:
```bash
jq -r '
(map(keys) | add | unique) as $cols
| $cols, (.[] | [.[ $cols[] ]])
| @csv
' data.json
```
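@csv wraps every string in double quotes, whether or not it contains a comma, newline, or quote, and doubles internal quotes per RFC 4180. Numbers and booleans are written bare, and null becomes an empty field. The output is safe to feed to Excel, Postgres COPY (with FORMAT csv), or any modern CSV consumer.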
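The program first builds $cols, the sorted union of every object's keys, and emits it as the header row. It then projects each object in that same column order, so a key an object lacks comes out as an empty field. If your objects have different shapes, this is the cleanest header strategy. Field values must be scalars: @csv rejects nested arrays and objects, so flatten them (or convert them with tojson) first.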
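Here is the same program run on a small made-up array whose objects have different keys, to show the header union and the quoting rules in action. In the output, "Bo" is quoted even though it needs no quoting. The 7 is left bare, and the keys each object lacks become empty fields.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input: objects with different shapes, a comma and quotes inside values
input='[{"name":"Smith, Jane","note":"said \"hi\""},{"name":"Bo","id":7}]'

printf '%s\n' "$input" | jq -r '
  (map(keys) | add | unique) as $cols      # sorted union of all keys
  | $cols, (.[] | [.[ $cols[] ]])          # header row, then each object in column order
  | @csv
'
# → "id","name","note"
# → ,"Smith, Jane","said ""hi"""
# → 7,"Bo",
```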
When jq is wrong and csvkit is right
If you find yourself escaping commas, doubling backslashes, or writing a 40-line jq program to handle quoting edge cases, stop. Use csvkit:
```bash
pip install csvkit

# CSV to JSON, RFC 4180 aware; --no-inference keeps every value a string (omit it for type detection)
csvjson --no-inference data.csv > data.json

# JSON (an array of objects) to CSV
in2csv data.json > data.csv
```
csvkit is built on Python's standard-library csv module, which handles RFC 4180 quoting. The pipeline is: csvkit for the CSV side, jq for the JSON side, both in the same shell pipeline. That division of labor covers the quoting edge cases you will hit in real pipelines without writing a parser yourself.
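In that division of labor, jq only ever sees JSON. The sketch below shows a jq stage you might put after csvjson data.csv |. The function name pick_city and the city and name columns are hypothetical. Because csvkit may not be installed everywhere, the usage line feeds the function a literal array of the shape csvjson prints by default (a JSON array of objects).

```bash
#!/usr/bin/env bash
set -euo pipefail

# pick_city CITY: keep rows whose "city" equals CITY and project just "name".
# Intended for use as: csvjson data.csv | pick_city Oslo
pick_city() {
  jq -c --arg city "$1" 'map(select(.city == $city) | {name})'
}

# Stand-in for csvjson output so the example runs without csvkit
printf '%s\n' '[{"name":"Ann","city":"Oslo"},{"name":"Bo","city":"Rome"}]' | pick_city Oslo
# → [{"name":"Ann"}]
```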
Working example
```bash
#!/usr/bin/env bash
# CSV with simple, unquoted fields to JSON
set -euo pipefail

jq -R -s '
  split("\n")
  | map(select(length > 0))
  | (.[0] | split(",")) as $headers
  | .[1:]
  | map(
      split(",")
      | [$headers, .] | transpose
      | map({(.[0]): .[1]}) | add
    )
' data.csv > data.json

# For RFC 4180 CSV with quoted commas, prefer csvkit:
# csvjson data.csv > data.json
```
Just need the result?
When you have a one-off CSV to convert and writing the jq pipeline would take longer than the conversion itself, paste the CSV into the browser-based CSV to JSON converter. Quoting, BOM, and header inference are handled; the JSON drops out and you copy it into the next pipeline. It runs locally with no upload and is RFC 4180 aware.
Frequently asked questions
Can jq read a TSV file?
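Yes. Replace split(",") with split("\t") in the pipeline above. Tabs rarely appear inside field values, so splitting on them sidesteps the quoted-comma problem that breaks the comma version. It is still not a quote-aware parser, though: a quoted field containing a tab or a newline will still break it.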
Does jq support reading multiple CSV files at once?
jq treats several input files as one continuous stream, much as if you had concatenated them with cat first. Either way, make sure only the first file has a header row, or strip the headers from the rest with tail -n +2 before piping in. Otherwise the extra headers become data rows.
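Here is a minimal sketch of the header-stripping approach. It assumes every file shares the same header line and ends with a trailing newline (otherwise a file's last row runs into the next file's first row). The first file is kept whole, line 1 is dropped from the rest with tail -n +2, and the combined text goes through the basic jq program. The function name csv_files_to_json is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# csv_files_to_json FILE...: merge simple CSV files that share one header row.
csv_files_to_json() {
  {
    cat "$1"                 # first file: header plus rows
    shift
    for f in "$@"; do
      tail -n +2 "$f"        # remaining files: rows only
    done
  } | jq -R -s -c '
    split("\n")
    | map(select(length > 0))
    | (.[0] | split(",")) as $headers
    | .[1:]
    | map(split(",") | [$headers, .] | transpose | map({(.[0]): .[1]}) | add)
  '
}

# Usage with two throwaway files
dir="$(mktemp -d)"
printf 'id,name\n1,Ann\n' > "$dir/a.csv"
printf 'id,name\n2,Bo\n' > "$dir/b.csv"
csv_files_to_json "$dir/a.csv" "$dir/b.csv"
# → [{"id":"1","name":"Ann"},{"id":"2","name":"Bo"}]
rm -r "$dir"
```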
Why does my numeric column become strings in JSON?
Because split returns strings. Cast in jq with tonumber inside the map step, or pass the result through csvkit which infers types. Forcing strings is sometimes correct (zip codes with leading zeros), so the default is conservative.