Text Diff Algorithms Explained: Myers, Patience, Histogram

How text diff algorithms work: Myers, Hunt-Szymanski, patience and histogram. Which one Git uses by default and when to switch for cleaner pull requests.

What “diff” really means

A diff is a list of changes from one version of a file to another. You give it old and new, and it gives you a list of edits that turns one into the other, usually the shortest one the algorithm can find.

The algorithm matters because there are many valid lists of edits, and the one a human finds readable is not always the one a computer finds shortest. Git, GitHub, your IDE, your code review tool all run a diff algorithm every time you change a file. The choice of algorithm shapes what the review looks like.
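To make "many valid lists of edits" concrete, here is a small sketch. The edit-script format (keep, del, ins) and the function name apply_edits are made up for illustration and are not Git's internal representation. Both scripts below turn the same old file into the same new file with two insertions, but only the first matches what the author actually did.

```python
def apply_edits(old, script):
    """Apply a list of ("keep",), ("del",) or ("ins", line) operations to old."""
    out, i = [], 0
    for op in script:
        if op[0] == "keep":
            out.append(old[i])
            i += 1
        elif op[0] == "del":
            i += 1
        elif op[0] == "ins":
            out.append(op[1])
        else:
            raise ValueError(f"unknown operation: {op!r}")
    if i != len(old):
        raise ValueError("script did not consume every old line")
    return out


old = ["foo()", "}"]
new = ["foo()", "}", "bar()", "}"]

# Script A: the new block is appended after the existing close brace.
script_a = [("keep",), ("keep",), ("ins", "bar()"), ("ins", "}")]

# Script B: equally short, but it reuses the old close brace for the new block.
script_b = [("keep",), ("ins", "}"), ("ins", "bar()"), ("keep",)]

result_a = apply_edits(old, script_a)
result_b = apply_edits(old, script_b)
```

Both results equal new. A diff algorithm has to pick one of these scripts, and that pick is what you see in review.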

Myers: the default Git uses

The Myers algorithm was published in 1986. It is the default in Git, the basis of GNU diff, and the starting point for most code review tools. It finds the shortest list of edits by treating each file as a sequence of lines and searching a grid in which each old line is compared against each new line.

Myers is fast on typical files (under 10,000 lines). It produces correct, optimal diffs in the sense of “fewest changes counted by line”.
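The grid search can be sketched in a few lines. This is a minimal version of the greedy search from Myers's paper that only returns the number of edits, not the edit script itself, and skips the refinements real tools add. The function name is illustrative.

```python
def myers_edit_distance(a, b):
    """Return the fewest line insertions plus deletions that turn a into b."""
    n, m = len(a), len(b)
    # furthest[k] is the largest x reached so far on diagonal k, where k = x - y.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]        # move down: take a line from b (insert)
            else:
                x = furthest[k - 1] + 1    # move right: drop a line from a (delete)
            y = x - k
            # Follow the diagonal for free while the lines match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


distance = myers_edit_distance(list("ABCABBA"), list("CBABAC"))
```

Here each character stands in for a line. The search tries 0 edits, then 1, then 2, and stops at the first count that reaches the bottom-right corner of the grid, which is why the result is minimal.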

The catch: optimal by line count is not always readable. Myers can match identical small lines (like blank lines or } characters) across very different functions, making the diff look like you moved code that you actually rewrote in place. This is the well-known “wrong block of close braces” problem.

When you want to inspect a diff manually without a tool, paste both versions into the text diff tool on AldeaCode. It runs the diff in your browser and shows the result side by side, no upload.

Patience diff: the readable cousin

Patience diff was designed by Bram Cohen (the BitTorrent author) specifically to fix the “wrong close brace” problem in Myers.

The algorithm first matches only lines that appear exactly once in each file, keeps the longest set of those matches that occur in the same order on both sides, then recursively diffs the chunks between those matches. The effect is that lines that are clearly the same context (function names, unique comments, distinct strings) anchor the diff, and the matching of generic lines around them is constrained.

Patience produces diffs that look more like what a human would draw. The cost is slightly slower runtime and slightly larger diffs in pathological cases.
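The anchoring step can be sketched as follows. This is a simplified illustration, not Git's implementation: it only finds the anchors and leaves out the recursion into the gaps between them. All function names are made up for the example.

```python
from bisect import bisect_left
from collections import Counter


def unique_common_lines(a, b):
    """Pairs (i, j) where a[i] == b[j] and the line occurs exactly once in each file."""
    count_a, count_b = Counter(a), Counter(b)
    index_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    return [(i, index_b[line]) for i, line in enumerate(a)
            if count_a[line] == 1 and line in index_b]


def longest_increasing_by_j(pairs):
    """Longest subsequence of pairs (already sorted by i) whose j values also increase."""
    tails, tail_js = [], []        # best last element for each subsequence length
    prev = [None] * len(pairs)     # back-pointers used to rebuild the answer
    for idx, (_, j) in enumerate(pairs):
        pos = bisect_left(tail_js, j)
        prev[idx] = tails[pos - 1] if pos > 0 else None
        if pos == len(tails):
            tails.append(idx)
            tail_js.append(j)
        else:
            tails[pos] = idx
            tail_js[pos] = j
    out = []
    idx = tails[-1] if tails else None
    while idx is not None:
        out.append(pairs[idx])
        idx = prev[idx]
    return out[::-1]


def patience_anchors(a, b):
    return longest_increasing_by_j(unique_common_lines(a, b))


old = ["{", "alpha", "}", "{", "beta", "}"]
new = ["{", "alpha", "}", "{", "gamma", "}", "{", "beta", "}"]
anchors = patience_anchors(old, new)
```

The braces never become anchors because they repeat. Only alpha and beta do, so the inserted gamma block has to land between them.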

Git supports patience diff with git diff --patience or by setting diff.algorithm = patience in config. Many teams flip it on globally and never go back.

Histogram: the modern recommendation

Histogram diff is a refinement of patience diff that picks the rare lines as anchors more aggressively. It is the default in JGit (Eclipse) and an option in Git through --histogram or diff.algorithm = histogram.

For most code, histogram and patience produce similar results. For some pathological files (large changes with many small repeated lines), histogram produces noticeably better diffs.
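The core idea of preferring rare lines can be sketched like this. It is a deliberate simplification and an assumption about what matters most: the real histogram algorithm also considers how long the matching run around a candidate is and then recurses on both sides. The function name is illustrative.

```python
from collections import Counter


def rarest_common_anchor(a, b):
    """Return (occurrences_in_a, i, j) for the common line that is rarest in a, or None."""
    count_a = Counter(a)
    first_in_a = {}
    for i, line in enumerate(a):
        first_in_a.setdefault(line, i)   # remember the first position of each line

    best = None
    for j, line in enumerate(b):
        if line in first_in_a:
            candidate = (count_a[line], first_in_a[line], j)
            if best is None or candidate[0] < best[0]:
                best = candidate
    return best


old = ["}", "}", "total = 0", "}"]
new = ["}", "total = 0", "}", "}"]
anchor = rarest_common_anchor(old, new)
```

Unlike the patience rule, this still returns an anchor when no line is unique: it falls back to the least repeated common line instead of giving up.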

Modern recommendation: turn on histogram in your global Git config. The runtime is in the same range as Myers on typical files, the result is more readable, and it almost never makes things worse.

Hunt-Szymanski: the ancestor

Hunt-Szymanski dates from 1977, nine years before Myers. It is closely related to the Hunt-McIlroy algorithm that powered the original diff on Unix in the 1970s. Most modern systems abandoned it because it does not handle the worst case as gracefully as Myers.

It is worth knowing because old documentation still refers to it, and because some specialized tools (genome alignment, plagiarism detection) still use derivatives of it.

The actual fail modes

A few cases where any diff algorithm produces something confusing:

Whitespace-only changes. A reformat (tabs to spaces, line endings, trailing whitespace) makes every line “changed” even though the meaning did not change. The fix is git diff -w to ignore whitespace, or to run a formatter and commit the formatting separately from the logic.
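Outside Git, the same idea is easy to approximate: strip whitespace from every line before comparing. This sketch roughly mirrors what -w does and is not a reimplementation of it. The function names are illustrative.

```python
def strip_whitespace(line):
    """Remove every whitespace character, including tabs and line endings."""
    return "".join(line.split())


def differs_ignoring_whitespace(old_lines, new_lines):
    """True only if the files differ in something other than whitespace."""
    old_normalised = [strip_whitespace(line) for line in old_lines]
    new_normalised = [strip_whitespace(line) for line in new_lines]
    return old_normalised != new_normalised


tabs_crlf = ["if x:\r\n", "\treturn 1  \r\n"]
spaces_lf = ["if x:\n", "    return 1\n"]
only_whitespace = differs_ignoring_whitespace(tabs_crlf, spaces_lf)
real_change = differs_ignoring_whitespace(["return 1\n"], ["return 2\n"])
```

Be careful in languages where indentation carries meaning: a check like this treats a re-indented Python block as unchanged.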

Moved blocks. If you cut a function from one place and paste it unchanged in another, a line-based diff sees a delete in one spot and an insert in another. git diff -M detects renames at the file level but not moved blocks within a file; git diff --color-moved can at least highlight moved lines in the output.

Reordered lines. Swapping two adjacent lines shows up as two changes (a delete and an insert) in a line-based diff. A word-based or character-based diff can sometimes present this more compactly, but at much higher cost.

Long single-line files. Minified JavaScript or compressed config puts the whole file on one line, so a line-based diff reports the entire file as changed. The pragmatic fix is to format the file before diffing.
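For the config case, here is a sketch of "format before diffing", assuming the single-line file is JSON. It pretty-prints both sides with sorted keys using the standard library, then runs an ordinary line diff. The function names are illustrative, and difflib uses its own matching heuristic rather than Myers.

```python
import difflib
import json


def canonical_json_lines(text):
    """Parse JSON and re-emit it one value per line with keys in a stable order."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True).splitlines()


def diff_minified_json(old_text, new_text):
    """Line diff of two JSON documents after canonical formatting."""
    old_lines = canonical_json_lines(old_text)
    new_lines = canonical_json_lines(new_text)
    return list(difflib.unified_diff(old_lines, new_lines, "old", "new", lineterm=""))


old_text = '{"b":1,"a":{"c":2}}'
new_text = '{"a":{"c":3},"b":1}'
diff_lines = diff_minified_json(old_text, new_text)
```

Without the formatting step, both files are a single line and the diff is "everything changed". With it, the key reordering disappears and only the line holding "c" shows up as modified.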

Picking the algorithm

For 99 percent of code in 2026:

  • Set diff.algorithm = histogram in your Git config and forget about it.
  • Use -w when you have whitespace noise.
  • Use -M when you suspect renames.
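If you script your diffs, the same three choices map onto command-line flags. The helper below is hypothetical and only builds the argument list. It relies on git diff accepting --diff-algorithm=<name>, -w and -M, and nothing else.

```python
GIT_DIFF_ALGORITHMS = {"myers", "minimal", "patience", "histogram"}


def git_diff_command(algorithm="histogram", ignore_whitespace=False,
                     detect_renames=False, extra_args=()):
    """Build the argv list for a git diff call with the chosen options."""
    if algorithm not in GIT_DIFF_ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    command = ["git", "diff", f"--diff-algorithm={algorithm}"]
    if ignore_whitespace:
        command.append("-w")      # ignore whitespace when comparing lines
    if detect_renames:
        command.append("-M")      # detect renamed files
    command.extend(extra_args)    # for example a revision range or a path
    return command


default_command = git_diff_command()
review_command = git_diff_command("patience", ignore_whitespace=True,
                                  detect_renames=True, extra_args=["main...HEAD"])
```

Pass the resulting list to subprocess.run inside a repository to compare how two algorithms render the same change.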

For specialized cases:

  • Patience when histogram is not available (some old Git versions).
  • Word level diff for prose, blog posts, documentation.
  • Character level diff for very short strings where every character matters.
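A word-level diff for prose can be sketched with the standard library. The markers imitate the [-removed-] and {+added+} style of git diff --word-diff, but the matching comes from Python's difflib, so the output may differ from Git's on harder inputs. The function name is illustrative.

```python
import difflib


def word_diff(old, new):
    """Mark removed words as [-...-] and added words as {+...+}."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(" ".join(a[i1:i2]))
            continue
        if i2 > i1:   # words present only in the old text
            parts.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:   # words present only in the new text
            parts.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(parts)


marked = word_diff("the quick brown fox", "the slow brown fox")
```

A line-based diff would flag the whole sentence. Splitting on words narrows the change to the one word that moved from quick to slow.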

When you want to inspect a diff outside Git, the text diff tool, the JSON formatter for canonicalising both sides, and the find and replace for normalisation all run in your browser.

The diff algorithm is one of those parts of your tooling that quietly shapes how you experience every code review. Picking a sensible default once saves a small amount of confusion every single day.
