AldeaCode Logo
Developer Common CSV Problems and How to Fix Them
Developer May 1, 2026 AldeaCode Architecture

Common CSV Problems and How to Fix Them

Common CSV file problems explained: delimiters, quoting, embedded newlines, UTF-8 encoding and locale decimals. Practical fixes you can run in your browser.

Why CSV is harder than it looks

CSV is the format people reach for when they want something simple. Then they hit the first comma inside a value, and the simple format starts revealing how little it actually agrees on.

The truth is there is no single CSV. There are dozens of slightly different conventions, and any tool that reads a file produced by another tool is making assumptions that may or may not hold. The traps below are the ones that catch real analysts in real workflows, every single week.

Common CSV bugs at a glance

BugSymptomFix
Quote escapingStray quotes split rowsPick one style (doubled or backslash)
Encoding mismatchGarbled accents, replacement charsSave and read as UTF-8 with BOM
Numeric coercionLeading zeros vanish, IDs become floatsRead columns as strings
Trailing newlinePhantom empty last rowStrip blank lines on parse
Header inferenceWrong column types, off-by-onePass an explicit schema
Date timezoneDates shift by a dayStore ISO 8601 with offset

The delimiter is not always a comma

In most of Europe, the comma is the decimal separator. Excel in Spanish, French, German, Italian or Portuguese exports CSV using a semicolon by default, because using a comma would conflict with 1,5 meaning “one point five”.

The Tab Separated Values format is the same idea with tabs. Some pipelines use pipes (|) for the same reason. None of them are wrong. They are just not the same file.

When you receive a CSV and the parser puts everything in a single column, the first thing to check is the delimiter. Open the file in a plain text editor and look at the first few lines. Whatever sits between the values is your delimiter, no matter what the extension says.

Quoted fields and embedded commas

The standard rule is: if a value contains the delimiter, wrap the value in quotes. Hello, world becomes "Hello, world" in a comma separated file.

The trap is that the rule has variations. Some systems double the inner quote ("He said ""hi"""). Others escape it with a backslash ("He said \"hi\""). Excel uses the doubled version, most Unix tools accept both. A file that looks fine in one tool can become garbage in another.

When something does not parse, paste the file into the CSV to JSON converter on AldeaCode. It tries the common quoting variants and shows you which fields contain hidden commas, quotes, or newlines. The data never leaves your tab.

Embedded newlines: the silent destroyer

A multi line value (a customer note, a paragraph of feedback) can contain a real newline character inside it. The standard says you wrap that value in quotes and the newline becomes part of the field.

Most tools handle this. Some do not. Some count lines naively and end up reporting “1,000 rows” when the file actually has 800, with 200 newlines hidden inside cells. If your row count looks slightly off after every export, embedded newlines are the first suspect.

The fix is to strip newlines from the source data before exporting, or to verify that your parser handles quoted multi line fields. The text counter on AldeaCode is useful when you need a sanity check on row counts after the parse.

Encoding mismatches: the BOM and the accent

The single most common CSV bug is opening a file in the wrong encoding. UTF-8 has been the default everywhere for fifteen years, but Excel still saves CSV as Windows-1252 or similar by default on some systems.

If you open an UTF-8 file as Windows-1252, every accented letter becomes mojibake (pingüino becomes pingüino). If you do the reverse, the file fails to parse altogether.

The Byte Order Mark (BOM) is three invisible bytes that some Microsoft tools prepend to UTF-8 files. They tell the reader “this is UTF-8”. Most modern parsers handle the BOM correctly. Some parse it as part of the first column header, leaving you with a column called Name instead of Name. If your first column refuses to match by name, suspect a BOM.

Locale aware decimals and dates

1,234.56 in the United States is the same number as 1.234,56 in Spain. The same string parses as a different number depending on which locale your tool assumes.

Same with dates. 01/02/2026 is January 2nd in the US and February 1st almost everywhere else. There is no way to detect from the string which one was meant.

The robust fix is to use ISO formats inside the file: numbers without thousand separators, dates as 2026-02-01. The pragmatic fix is to ask the tool that produced the file what locale it used, and parse with that locale on purpose. Guessing is how you end up with January and February swapped in three out of five rows.

A 10 second pre flight check

Before you trust a CSV, do this:

  1. Open it in a plain text editor. Confirm the delimiter and the encoding.
  2. Count the rows by counting the line breaks, then count again with your parser. If the numbers differ, embedded newlines are hiding inside cells.
  3. Look at the first column header. If it has weird characters at the start, you have a BOM.
  4. Spot check a few rows for accented characters and decimal numbers. If anything looks wrong, the encoding or locale is wrong.

When you need to inspect a CSV without the rest of the pipeline, the CSV to JSON converter, the JSON formatter and the text counter on AldeaCode all run in your browser. The data never leaves your tab, no upload, no log.

CSV is one of those formats that works perfectly until it does not. The traps are not in the spec, they are in the gap between what your tool assumes and what the file actually contains. Ten seconds of inspection saves you an afternoon of debugging.

What we do

Honest sites. No shortcuts.

Real engineering, careful design. Liked the post? Let's talk about your project.

Get in touch →

You might also like

Browse all articles →