What is a diff checker?

A diff checker is a tool that compares two pieces of text or two files and highlights the differences between them. It identifies lines that have been added, removed, or modified. Diff checkers are widely used in software development for code review, configuration management, document comparison, and version control workflows.

How do diff algorithms work?

Most diff algorithms are built around the Longest Common Subsequence (LCS) problem: whatever the two texts share in the same order is treated as unchanged, and everything else is reported as an edit. The most widely used algorithm, Myers' diff algorithm, models the problem as finding the shortest edit script -- the minimum number of insertions and deletions needed to transform one text into another. It operates on an edit graph and uses a greedy approach to find an optimal solution in O(ND) time, where N is the combined length of the two texts and D is the size of the shortest edit script.

What is the difference between unified diff and side-by-side diff?

Unified diff shows changes in a single column with added lines prefixed by '+' and removed lines prefixed by '-', along with context lines. Side-by-side diff shows the original text on the left and the modified text on the right, with changes highlighted in each column. Unified diff is more compact and commonly used in command-line tools and patches, while side-by-side diff is more visual and easier to read for reviewing changes.

Can I compare binary files with a diff checker?

Standard text diff checkers are designed for text-based files and cannot meaningfully compare binary files like images, PDFs, or compiled executables. For binary files, specialized diff tools exist: image diff tools overlay or highlight pixel differences, PDF diff tools compare rendered pages, and hex editors can compare raw binary data byte by byte.

What is a three-way merge and how does it relate to diffing?

A three-way merge compares two modified versions of a file against their common ancestor (base version). It uses diff algorithms to identify what each version changed relative to the base. If both versions changed different parts of the file, the merge combines the changes automatically. If both versions changed the same lines, a merge conflict occurs and must be resolved manually. Git and other version control systems use three-way merging extensively.

How do I ignore whitespace differences when comparing files?

Most diff tools provide options to ignore whitespace differences. In git diff, use the -w flag to ignore all whitespace or -b to ignore changes in the amount of whitespace. Online diff checkers typically offer a checkbox or toggle to ignore whitespace. This is useful when comparing files that may have different indentation styles, trailing spaces, or line ending formats (CRLF vs LF).

Diff Checker Guide: Compare Text Files & Code Online

1. What Is Text Diffing?

Text diffing is the process of comparing two pieces of text and identifying the differences between them. A "diff" (short for "difference") is the output of this comparison -- a structured representation of what has changed from one version to another. The concept is fundamental to software development, version control, document management, and any workflow where tracking changes matters.

The idea of automated text comparison dates back to the early 1970s. The original diff utility was written by Douglas McIlroy at Bell Labs in 1974 and became part of the Unix operating system. McIlroy's implementation was based on an algorithm by James W. Hunt and Thomas G. Szymanski for computing the longest common subsequence (LCS) of two sequences. This foundational work laid the groundwork for all modern diff tools.

At its core, a diff tool answers a simple question: "What changed between version A and version B?" The answer is expressed as a series of edit operations -- typically insertions, deletions, and modifications -- that would transform the original text into the modified text. This sequence of operations is called an "edit script."

Today, diffing is everywhere. Every time you review a pull request on GitHub, examine a commit log, compare configuration files before a deployment, or track revisions in a document, you are using diff technology. Understanding how diffing works -- and how to use diff tools effectively -- is an essential skill for developers, system administrators, technical writers, and anyone who works with text.

The applications extend beyond code. Legal professionals compare contract revisions. Translators compare source and translated documents. Data engineers compare schema definitions. Security analysts compare system configurations to detect unauthorized changes. The underlying technology is the same: algorithmic text comparison.

2. How Diff Algorithms Work

Understanding how diff algorithms work gives you insight into why certain outputs look the way they do, and helps you choose the right tool and settings for your use case. The problem of computing the "best" diff between two texts is well-studied in computer science, with several algorithms offering different tradeoffs between speed, memory usage, and output quality.

The Longest Common Subsequence (LCS) Problem

Most diff algorithms are based on the Longest Common Subsequence (LCS) problem. Given two sequences, the LCS is the longest sequence of elements that appear in both, in the same order, but not necessarily contiguously. For text comparison, each "element" is typically a line of text.

For example, given two files:

File A:          File B:
line 1           line 1
line 2           line 3
line 3           line 4
line 4           line 5

The LCS is ["line 1", "line 3", "line 4"]. The diff output shows that "line 2" was deleted from File A, and "line 5" was added in File B. Everything in the LCS is "unchanged" -- it serves as the anchor points around which insertions and deletions are identified.
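
As a minimal sketch of the idea, the classic dynamic-programming approach recovers the LCS for the two files above. The function name lcs_lines is ours and is not taken from any diff tool, and production tools use faster algorithms than this quadratic table. The deleted/added lists at the end rely on every line being unique, which holds for this small example only.

```python
def lcs_lines(a, b):
    """Return one longest common subsequence of two lists of lines."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to rebuild the subsequence.
    result, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


file_a = ["line 1", "line 2", "line 3", "line 4"]
file_b = ["line 1", "line 3", "line 4", "line 5"]

common = lcs_lines(file_a, file_b)
# Simplification: valid here because no line repeats within a file.
deleted = [line for line in file_a if line not in common]
added = [line for line in file_b if line not in common]

print(common)   # ['line 1', 'line 3', 'line 4']
print(deleted)  # ['line 2']
print(added)    # ['line 5']
```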

Myers' Diff Algorithm

The most widely used diff algorithm in practice is Eugene W. Myers' algorithm, published in 1986 in the paper "An O(ND) Difference Algorithm and Its Variations." Git, GNU diff, and most modern diff tools use Myers' algorithm or a variation of it.

Myers' algorithm models the diff problem as a graph search. Imagine a 2D grid where the x-axis represents lines of File A and the y-axis represents lines of File B. Moving right means deleting a line from A; moving down means inserting a line from B. When lines match (A[x] == B[y]), you can move diagonally -- which is "free" because matching lines require no edit operations.

The algorithm finds the path from the top-left corner to the bottom-right corner that uses the fewest horizontal and vertical moves (i.e., the fewest insertions and deletions). This produces the shortest edit script -- the minimum set of changes needed to transform A into B.

The time complexity is O(ND), where N is the sum of the lengths of both files and D is the size of the minimum edit script (the number of differences). This means the algorithm is fast when the files are similar (small D) and slower when the files are very different (large D). For most practical use cases -- comparing two versions of a file that differ in a few places -- the algorithm is extremely efficient.
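
The search can be sketched in a few lines. The version below is a simplified illustration that follows the greedy forward pass described in Myers' paper and returns only D, the length of the shortest edit script, not the script itself. The function and variable names are ours; real implementations add path recovery and memory optimizations.

```python
def shortest_edit_distance(a, b):
    """Return the minimum number of insertions plus deletions turning a into b."""
    n, m = len(a), len(b)
    # furthest[k] is the largest x reached so far on diagonal k, where k = x - y.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Decide whether to arrive on diagonal k by moving down (insert)
            # or by moving right (delete), preferring the path that is further along.
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]          # move down from diagonal k + 1
            else:
                x = furthest[k - 1] + 1      # move right from diagonal k - 1
            y = x - k
            # Follow the diagonal for free while lines match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


file_a = ["line 1", "line 2", "line 3", "line 4"]
file_b = ["line 1", "line 3", "line 4", "line 5"]
print(shortest_edit_distance(file_a, file_b))  # 2 (one deletion, one insertion)
```

Because the outer loop stops as soon as some path reaches the bottom-right corner, nearly identical inputs finish after only a few rounds, which is where the O(ND) behavior comes from.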

Patience Diff

Patience diff is a variation that often produces more human-readable output. It works by first finding "unique" matching lines -- lines that appear exactly once in each file. These unique lines serve as anchors that split the problem into smaller sub-problems. The algorithm then recursively diffs the text between the anchors using the standard LCS approach.

The advantage of patience diff is that it tends to align changes along structural boundaries. For example, when comparing code where a function was added between two existing functions, patience diff will correctly show the new function as an insertion, rather than producing a confusing interleaving of old and new code. Git supports patience diff via git diff --patience.
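
The anchor-finding step can be sketched as follows. This is an illustrative outline rather than Git's implementation: it collects lines that occur exactly once in each file, then keeps the longest run of them that appears in the same order in both files (a longest increasing subsequence). The name patience_anchors is hypothetical, and the recursive diffing between anchors is left out.

```python
from bisect import bisect_left
from collections import Counter


def patience_anchors(a, b):
    """Return (index_in_a, index_in_b) pairs usable as patience-diff anchors."""
    count_a, count_b = Counter(a), Counter(b)
    # Position in b of every line that is unique within b.
    unique_in_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    # Candidate pairs in order of appearance in a.
    pairs = [
        (i, unique_in_b[line])
        for i, line in enumerate(a)
        if count_a[line] == 1 and line in unique_in_b
    ]

    # Longest increasing subsequence over the b-indices.
    tails = []        # tails[k]: smallest b-index ending a run of length k + 1
    tail_pair = []    # index into pairs for each entry of tails
    previous = [None] * len(pairs)
    for p, (_, j) in enumerate(pairs):
        k = bisect_left(tails, j)
        if k == len(tails):
            tails.append(j)
            tail_pair.append(p)
        else:
            tails[k] = j
            tail_pair[k] = p
        previous[p] = tail_pair[k - 1] if k > 0 else None

    # Follow the back-pointers from the end of the longest run.
    anchors = []
    p = tail_pair[-1] if tail_pair else None
    while p is not None:
        anchors.append(pairs[p])
        p = previous[p]
    return anchors[::-1]


old = ["def first():", "    pass", "", "def second():", "    pass"]
new = ["def first():", "    pass", "", "def added():", "    pass", "",
       "def second():", "    pass"]
print(patience_anchors(old, new))  # [(0, 0), (3, 6)]
```

In this example the two function signatures become anchors, so everything between them in the new file is reported as one clean insertion.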

Histogram Diff

Histogram diff is an extension of patience diff developed for the JGit project (the Java implementation of Git). It uses a histogram of line occurrences to find low-occurrence matching lines as anchors. This approach handles cases where no truly unique lines exist (which would cause patience diff to fall back to Myers' algorithm). Histogram diff is available in Git via git diff --histogram and is the default diff algorithm in JGit.

Word-Level and Character-Level Diffing

Standard diff algorithms compare text line by line. But when you need finer granularity -- for example, to see exactly which word changed within a line -- word-level or character-level diffing is used. These approaches apply the same LCS algorithms but use words or characters as the comparison units instead of lines.

Word-level and character-level diffing are particularly useful for prose and documentation, where a single changed word in a long paragraph would otherwise show the entire line as modified. Many modern diff tools and code review platforms combine line-level diffing (to identify changed lines) with word-level or character-level highlighting within those lines (to pinpoint exactly what changed).
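
A word-level comparison can be sketched with Python's standard difflib module by splitting each line into words before matching. Note that difflib.SequenceMatcher uses its own matching heuristic rather than Myers' algorithm, so this shows the change of comparison unit, not the exact behavior of any particular diff tool. The helper name word_changes is ours.

```python
import difflib


def word_changes(old_line, new_line):
    """Return the non-matching word spans between two lines."""
    old_words, new_words = old_line.split(), new_line.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    return [
        (tag, old_words[i1:i2], new_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(word_changes("timeout: 30 seconds", "timeout: 45 seconds"))
# [('replace', ['30'], ['45'])]
```

Passing list(old_line) and list(new_line) instead of the split words would give a character-level comparison with the same code.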

3. Diff Output Formats

Diff tools can present their output in several formats, each suited to different use cases. Understanding these formats helps you read diffs efficiently and choose the right presentation for your needs.

Unified Diff Format

Unified diff is the most common format in modern development. It shows changes in a single stream with context lines for orientation. Each chunk (or "hunk") starts with a header showing the line numbers in both files:

--- a/config.yaml
+++ b/config.yaml
@@ -10,7 +10,8 @@
   database:
     host: localhost
     port: 5432
-    name: mydb_dev
+    name: mydb_prod
+    ssl: true
     pool_size: 10
   cache:
     enabled: true

Lines prefixed with - exist only in the original file (deletions). Lines prefixed with + exist only in the modified file (additions). Lines prefixed with a single space are context -- they appear in both files and help you locate the change. The @@ header indicates that this hunk starts at line 10 in the original file and covers 7 lines there, and starts at line 10 in the modified file and covers 8 lines.

Unified diff is the default output of git diff and is the standard format for patch files. It is compact, readable, and unambiguous.
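
To see the format being produced, here is a small sketch using difflib.unified_diff from Python's standard library. The file contents are a shortened, made-up version of the config example above, so the line numbers in the hunk header differ from that example.

```python
import difflib

original = [
    "database:",
    "  host: localhost",
    "  port: 5432",
    "  name: mydb_dev",
    "  pool_size: 10",
]
modified = [
    "database:",
    "  host: localhost",
    "  port: 5432",
    "  name: mydb_prod",
    "  ssl: true",
    "  pool_size: 10",
]

# lineterm="" because the input lines carry no trailing newlines.
diff = list(
    difflib.unified_diff(
        original,
        modified,
        fromfile="a/config.yaml",
        tofile="b/config.yaml",
        lineterm="",
    )
)
print("\n".join(diff))
```

The output begins with the --- and +++ file labels, followed by a single hunk whose header reads @@ -1,5 +1,6 @@: five lines from the original, six from the modified version.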

Side-by-Side Diff

Side-by-side diff displays the original and modified text in two parallel columns. Added lines appear only in the right column, deleted lines appear only in the left column, and modified lines appear in both columns with differences highlighted. This format is highly visual and makes it easy to see changes at a glance.

Side-by-side diff is the preferred format in graphical diff tools, code review platforms, and online diff checkers. While it requires more horizontal space than unified diff, it provides a more intuitive view of how the text has changed.

Inline Diff

Inline diff (also called "rendered diff" or "rich diff") shows the original text with additions highlighted in green and deletions highlighted in red, often with strikethrough styling on deleted text. This format is commonly used in document comparison tools, CMS revision history, and word-processor "track changes" features.

Context Diff Format

Context diff is an older format that shows changed lines with surrounding context. It uses *** to mark the original file section and --- for the modified file section, with ! to indicate changed lines. While largely superseded by unified diff, you may encounter context diffs in legacy systems and older patch files.

Normal Diff Format

The original Unix diff output format uses a terse notation like 3c3 (line 3 changed), 5a6,7 (lines 6-7 added after line 5), or 8,10d7 (lines 8-10 deleted). This format is the most compact but also the hardest for humans to read. It is rarely used directly today but remains the default output format of the traditional diff command without flags.

4. Diffing for Code Review

Code review is perhaps the most important application of diff tools. Every pull request, merge request, or code review session revolves around examining the diff -- the set of changes a developer is proposing to merge into the codebase. Reading diffs effectively is a core skill for software engineers.

Understanding Pull Request Diffs

When you open a pull request on GitHub, GitLab, or Bitbucket, the platform shows you a diff of all changes in the branch compared to the base branch. This diff is computed by running the equivalent of git diff base-branch...feature-branch. The output shows every file that was added, modified, or deleted, with changes highlighted at the line level and often at the word level within changed lines.

Effective code review requires more than just reading the diff top to bottom. Here are strategies for reviewing diffs efficiently:

Start with the summary: Look at the list of changed files first. Understand the scope of the change before diving into details. Are the changes concentrated in one area, or spread across the codebase?
Review structural changes first: Look at new files, deleted files, and renamed files before examining line-level changes. This gives you the high-level picture.
Focus on logic, not formatting: If a diff includes both logic changes and formatting changes, review them separately. Many teams use automated formatters to avoid mixing these in the same commit.
Check for completeness: Are there related changes missing? If a function signature changed, were all callers updated? If a configuration key was renamed, was it updated everywhere?
Look for what is NOT in the diff: Sometimes the most important review feedback is about code that should have been changed but was not -- missing error handling, missing tests, missing documentation updates.

Reviewing Large Diffs

Large diffs (hundreds or thousands of lines changed) are notoriously difficult to review. Research has shown that review quality drops significantly as diff size increases. When faced with a large diff:

Ask for smaller PRs: The best solution is prevention. Encourage your team to submit smaller, focused pull requests that are easier to review thoroughly.
Review commit by commit: If the branch has well-structured commits, review each commit individually. This tells a story of how the change was built up, which is easier to follow than the final aggregate diff.
Filter by file type: Review test files separately from source code. Review configuration changes separately from logic changes.
Use the file tree: Most code review platforms show a file tree. Use it to navigate to the most critical files first, rather than scrolling through the diff linearly.

Diff Annotations and Comments

Modern code review platforms allow you to leave comments on specific lines of a diff. This is a powerful feature for targeted feedback. Best practices for diff comments include:

Be specific -- reference the exact line or code pattern you are discussing
Distinguish between blocking issues (bugs, security problems) and suggestions (style, alternative approaches)
Provide concrete examples or code suggestions when possible
Ask questions when you do not understand the intent -- the diff shows what changed but not always why

5. Document Comparison

While diff tools originated in software development, document comparison is an equally important use case. Any workflow involving iterative editing of text documents benefits from the ability to see exactly what changed between versions.

Legal and Contract Documents

In legal work, precision matters. When a contract goes through multiple rounds of negotiation, each party needs to see exactly what the other side changed. A single word change -- from "may" to "shall," or "reasonable" to "best" -- can have significant legal implications. Diff tools designed for legal documents highlight every character-level change, ensuring nothing slips through unnoticed.

Unlike code diffs, legal document diffs typically need to handle rich text formatting, paragraph reflows, and non-structural whitespace changes. Specialized legal comparison tools (like document comparison features in Microsoft Word) account for these differences, but even a plain-text diff checker can be invaluable for comparing Markdown or plain-text drafts.

Technical Documentation

Technical writers frequently compare documentation versions to review edits, track contributions, and ensure consistency. When documentation is stored as plain text (Markdown, reStructuredText, AsciiDoc), standard diff tools work perfectly. The key challenge is distinguishing meaningful content changes from formatting or structural changes like line rewrapping.

For documentation review, word-level diffing is particularly valuable. A line-level diff might show an entire paragraph as changed when only one sentence was modified. Word-level highlighting within changed lines reveals the actual edit.

Content Management and Publishing

Content management systems often include built-in revision comparison. Editors can view the diff between any two revisions of a page or article, seeing additions in green and deletions in red. This is especially useful for:

Reviewing edits from multiple contributors
Reverting unwanted changes while keeping desired ones
Auditing the revision history of important content
Detecting vandalism or unauthorized edits (common on wikis)

Data and CSV Comparison

Comparing structured data like CSV files or database exports requires special consideration. A standard line-level diff can identify changed rows, but it does not understand column structure. Specialized CSV diff tools can show changes at the cell level, identify added or removed columns, and handle row reordering. For quick ad-hoc comparisons, however, a text diff checker often suffices -- sort the data first if row order is not significant.
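
A cell-level comparison can be sketched with Python's csv module. This is a simplified illustration, not a replacement for a dedicated CSV diff tool: it assumes both files share the same columns and that one column (here a hypothetical id column) uniquely identifies each row, which also makes row order irrelevant.

```python
import csv
import io


def csv_cell_changes(old_text, new_text, key):
    """Compare two CSV documents, matching rows on the given key column."""
    old_rows = {row[key]: row for row in csv.DictReader(io.StringIO(old_text))}
    new_rows = {row[key]: row for row in csv.DictReader(io.StringIO(new_text))}

    added = sorted(new_rows.keys() - old_rows.keys())
    removed = sorted(old_rows.keys() - new_rows.keys())
    changed = []
    for row_key in sorted(old_rows.keys() & new_rows.keys()):
        for column, old_value in old_rows[row_key].items():
            new_value = new_rows[row_key].get(column)
            if new_value != old_value:
                changed.append((row_key, column, old_value, new_value))
    return added, removed, changed


# Made-up sample data; the rows in the new file are deliberately reordered.
old_csv = "id,name,plan\n1,Alice,free\n2,Bob,pro\n3,Cara,free\n"
new_csv = "id,name,plan\n3,Cara,pro\n1,Alice,free\n4,Dan,free\n"

added, removed, changed = csv_cell_changes(old_csv, new_csv, key="id")
print(added)    # ['4']
print(removed)  # ['2']
print(changed)  # [('3', 'plan', 'free', 'pro')]
```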

6. Tracking Configuration Changes

Configuration files are critical infrastructure. A single incorrect value in a configuration file can cause an outage, a security vulnerability, or data loss. Diffing is an essential practice for managing configuration changes safely.

Infrastructure as Code

In modern DevOps workflows, infrastructure is defined in configuration files -- Terraform HCL, Kubernetes YAML, Ansible playbooks, CloudFormation templates, Dockerfiles. Every change to these files is reviewed as a diff before being applied. The terraform plan command, for example, produces a diff-like output showing what infrastructure changes will be made.

Configuration diffs require extra scrutiny because the blast radius can be enormous. A small change to a load balancer configuration might affect millions of users. A security group change might expose internal services to the internet. When reviewing infrastructure diffs:

Verify that the change matches the intended scope -- a targeted fix should not have unexpected side effects
Check for removed or changed security settings (encryption, access controls, network policies)
Look for hardcoded values that should be variables or references
Ensure sensitive values (passwords, API keys) are not being added to configuration files

Environment Comparison

Comparing configuration between environments (development, staging, production) is a common use case. When something works in staging but fails in production, diffing the configuration files between environments can quickly reveal the discrepancy -- a different database URL, a missing feature flag, a different timeout value.

# Compare production and staging Kubernetes configs
diff production/deployment.yaml staging/deployment.yaml

# Compare environment variable files
diff .env.production .env.staging
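
When the files list the same keys in a different order, a plain diff gets noisy. A key-based comparison, sketched below, can be easier to read. The parser handles only simple KEY=VALUE lines (no quoting or multi-line values), the sample values are invented, and the function names are ours. It reports key names only, so secret values are not echoed.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def compare_env(left, right):
    """Return keys only in left, keys only in right, and keys whose values differ."""
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    different = sorted(k for k in left.keys() & right.keys() if left[k] != right[k])
    return only_left, only_right, different


# Invented sample contents standing in for .env.production and .env.staging.
production = "# production\nDATABASE_URL=postgres://prod-db/app\nTIMEOUT=30\n"
staging = "DATABASE_URL=postgres://staging-db/app\nTIMEOUT=30\nFEATURE_X=on\n"

only_prod, only_staging, different = compare_env(parse_env(production), parse_env(staging))
print(only_prod)     # []
print(only_staging)  # ['FEATURE_X']
print(different)     # ['DATABASE_URL']
```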

Audit and Compliance

Many compliance frameworks require tracking and reviewing all configuration changes. Diffing provides an auditable record of exactly what changed, when, and by whom. Storing configuration in version control (Git) provides a complete diff history that satisfies audit requirements.

Security teams use configuration diffing to detect unauthorized changes. By periodically comparing the current state of configuration files against a known-good baseline, drift can be detected and investigated. Tools like OSSEC, Tripwire, and AWS Config automate this process, but manual diffing remains a valuable skill for incident investigation.
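
The baseline comparison itself is simple to sketch: record a hash of every file, then compare a later snapshot against it. The code below is a minimal illustration of that idea, not how OSSEC, Tripwire, or AWS Config work internally. The function names are ours, and a real deployment would also need to protect the stored baseline from tampering.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def detect_drift(baseline, current):
    """Compare two snapshots and report added, removed, and modified files."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(
            name
            for name in baseline.keys() & current.keys()
            if baseline[name] != current[name]
        ),
    }
```

Once drift is reported for a file, an ordinary text diff between the baseline copy and the current copy shows exactly what changed.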

Database Schema Diffs

Database migration tools apply the same idea to schemas. Tools like Alembic (Python/SQLAlchemy), Flyway, and Liquibase manage migration scripts that are essentially diffs -- they describe the sequence of ALTER TABLE, CREATE INDEX, and other DDL operations needed to transform the current schema into the target schema. Some of them can also draft a migration by comparing the current schema with the desired one, as Alembic's autogenerate feature does.

Reviewing schema migration diffs is critical because database changes are often irreversible (or expensive to reverse). Dropping a column, changing a data type, or modifying an index can have significant performance and data integrity implications.

7. Git Diff: A Deep Dive

Git is the most widely used version control system, and git diff is one of the most frequently used Git commands. Understanding its options and output is essential for any developer working with Git.

Basic Git Diff Commands

# Show unstaged changes (working directory vs staging area)
git diff

# Show staged changes (staging area vs last commit)
git diff --staged

# Compare the tips of two branches (same as: git diff main feature-branch)
git diff main..feature-branch

# Compare two specific commits
git diff abc123 def456

# Show changes in a specific file
git diff -- path/to/file.js

# Show changes introduced by a specific commit
git show abc123

Useful Git Diff Flags

Git diff supports numerous flags that control the output format and comparison behavior:

Flag	Description
`--stat`	Show a summary of files changed with insertion/deletion counts
`--name-only`	Show only the names of changed files
`--name-status`	Show names and status (Added, Modified, Deleted, Renamed)
`-w`	Ignore all whitespace differences
`-b`	Ignore changes in amount of whitespace
`--ignore-blank-lines`	Ignore changes that only add or remove blank lines
`--word-diff`	Show word-level differences inline
`--color-words`	Show word-level diffs with color highlighting
`-U<n>`	Show <n> lines of context (default is 3)
`--patience`	Use the patience diff algorithm
`--histogram`	Use the histogram diff algorithm
`--no-index`	Compare two files outside a Git repository

Understanding Diff Headers

A Git diff output starts with several header lines before the actual changes:

diff --git a/src/utils.js b/src/utils.js
index 3a4b5c6..7d8e9f0 100644
--- a/src/utils.js
+++ b/src/utils.js
@@ -42,7 +42,9 @@ function processData(input) {

The first line identifies the compared files. The index line shows the abbreviated object hashes and file mode. The --- and +++ lines label the original and modified versions. The @@ line (called the "hunk header") shows line numbers and, when available, the enclosing function name -- a feature called "funcname" that helps orient you in the code.
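
The hunk header is regular enough to parse with a short pattern, which can be handy when post-processing diff output in scripts. The sketch below is our own helper, not part of Git. It relies on the unified format's rule that an omitted line count means 1.

```python
import re

# @@ -<old start>[,<old count>] +<new start>[,<new count>] @@ [optional context]
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")


def parse_hunk_header(line):
    """Split a unified-diff hunk header into its numeric parts and context text."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count, context = match.groups()
    return {
        "old_start": int(old_start),
        "old_count": int(old_count) if old_count is not None else 1,
        "new_start": int(new_start),
        "new_count": int(new_count) if new_count is not None else 1,
        "context": context,
    }


header = parse_hunk_header("@@ -42,7 +42,9 @@ function processData(input) {")
print(header["old_start"], header["old_count"])  # 42 7
print(header["new_start"], header["new_count"])  # 42 9
print(header["context"])                         # function processData(input) {
```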

Diff with Rename Detection

Git can detect when a file was renamed (and optionally modified). Use git diff -M to enable rename detection, or git diff -M90% to set the similarity threshold (90% means files must be at least 90% similar to be considered a rename). When a rename is detected, the diff shows only the content changes, not the entire file as deleted-and-recreated.

# Detect renames with default 50% threshold
git diff -M

# Detect renames with 80% similarity threshold
git diff -M80%

# Also detect copies
git diff -C

Generating and Applying Patches

Git diffs can be saved as patch files and applied to other repositories or branches. This is useful for sharing changes without direct repository access:

# Generate a patch file from the last commit
git format-patch -1 HEAD

# Generate a diff and save it
git diff > my-changes.patch

# Apply a patch
git apply my-changes.patch

# Apply a formatted patch (includes commit message and author)
git am 0001-fix-bug.patch

8. Advanced Diffing Techniques

Semantic Diffing

Standard diff tools are line-oriented -- they do not understand the structure of the content they are comparing. Semantic diff tools parse the content (as code, JSON, XML, YAML, etc.) and compare the structural representation rather than the raw text. This produces more meaningful diffs that ignore irrelevant formatting changes.

For example, consider two JSON files that differ only in key order and whitespace:

// Version A
{"name": "Alice", "age": 30, "city": "NYC"}

// Version B
{
  "age": 30,
  "city": "NYC",
  "name": "Alice"
}

A standard text diff would show every line as changed. A semantic JSON diff would report no differences, because the two objects are equivalent. Semantic diffing is available for many formats through specialized tools and editor plugins.
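
For JSON, the simplest form of semantic comparison is to parse both documents and compare the resulting values, as in this sketch using Python's json module. The helper names are ours. Note that array order still counts as a difference here, and that canonical() is only one possible normalization.

```python
import json

version_a = '{"name": "Alice", "age": 30, "city": "NYC"}'
version_b = """{
  "age": 30,
  "city": "NYC",
  "name": "Alice"
}"""


def json_equivalent(left, right):
    """True when two JSON texts decode to the same value."""
    return json.loads(left) == json.loads(right)


def canonical(text):
    """Re-serialize JSON with sorted keys and fixed indentation."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2)


print(json_equivalent(version_a, version_b))        # True
print(canonical(version_a) == canonical(version_b))  # True
```

Running both files through a normalizer like canonical() before a regular text diff is a practical middle ground: formatting noise disappears, and any remaining differences are shown in the familiar line-based form.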

Three-Way Merging

Three-way merging is an extension of diffing that compares three versions of a file: the common ancestor (base), and two modified versions (ours and theirs). The merge algorithm diffs both modified versions against the base, then combines the changes:

If only one version changed a particular section, the change is accepted automatically
If both versions changed the same section in the same way, the change is accepted (no conflict)
If both versions changed the same section differently, a merge conflict is reported and must be resolved manually

Three-way merging is the foundation of Git's merge and rebase operations. Tools like vimdiff, VS Code's merge editor, and kdiff3 provide visual three-way merge interfaces.
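
The three rules above reduce to a small decision function once a section of the file has been lined up across all three versions. The sketch below shows only that decision step: it assumes the alignment has already been done by diffing each side against the base, which is the hard part that real merge tools handle. The function name and the conflict labels are illustrative.

```python
def merge_section(base, ours, theirs):
    """Merge one aligned section. Returns (lines, has_conflict)."""
    if ours == theirs:
        # Nobody changed it, or both sides made the same change.
        return ours, False
    if ours == base:
        # Only theirs changed this section.
        return theirs, False
    if theirs == base:
        # Only ours changed this section.
        return ours, False
    # Both sides changed the section in different ways.
    conflict = ["<<<<<<< ours", *ours, "=======", *theirs, ">>>>>>> theirs"]
    return conflict, True


base = ["timeout: 30"]
print(merge_section(base, ["timeout: 30"], ["timeout: 60"]))
# (['timeout: 60'], False)
print(merge_section(base, ["timeout: 45"], ["timeout: 60"])[1])
# True
```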

Directory Diffing

Sometimes you need to compare entire directory trees, not just individual files. Directory diff tools recursively compare two directories, showing files that are added, deleted, modified, or identical. This is useful for:

Comparing deployed code against the repository to detect drift
Comparing backup directories to verify integrity
Finding differences between two versions of a project
Identifying files that exist in one environment but not another

# Compare two directories recursively
diff -rq dir1/ dir2/

# Using git diff to compare directories
git diff --no-index dir1/ dir2/
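
The same kind of report can be produced programmatically. The sketch below uses filecmp.dircmp from Python's standard library; the wrapper function compare_dirs is ours. Be aware that dircmp compares files shallowly by default (files with the same type, size, and modification time are treated as equal without reading their contents) and skips a default list of names such as CVS and RCS directories.

```python
import filecmp


def compare_dirs(left, right):
    """Recursively collect paths that differ between two directory trees."""
    report = {"left_only": [], "right_only": [], "different": []}

    def walk(comparison, prefix):
        report["left_only"] += [prefix + name for name in comparison.left_only]
        report["right_only"] += [prefix + name for name in comparison.right_only]
        report["different"] += [prefix + name for name in comparison.diff_files]
        # subdirs maps each common subdirectory name to its own dircmp object.
        for name, sub in comparison.subdirs.items():
            walk(sub, prefix + name + "/")

    walk(filecmp.dircmp(left, right), "")
    return {key: sorted(paths) for key, paths in report.items()}
```

Calling compare_dirs("dir1", "dir2") returns three sorted lists of relative paths, which can then be fed to a file-level diff for the entries under "different".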

Ignoring Patterns and Noise

Real-world diffs often contain noise -- changes that are technically different but not meaningful. Common sources of noise include:

Timestamps and dates: Auto-generated "last modified" fields
Build numbers and version strings: Incrementing version identifiers
Comments: Updated copyright years or documentation dates
Generated code: Auto-generated files that change whenever dependencies update
Whitespace and formatting: Indentation changes, trailing whitespace, line endings

Most diff tools support ignore patterns, regex-based exclusions, or custom comparison functions that let you filter out noise and focus on substantive changes.
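
When a tool lacks the ignore option you need, one workable approach is to normalize both inputs before diffing. The sketch below masks timestamps and strips trailing whitespace, then hands the result to difflib. The timestamp pattern and function names are assumptions chosen for this example, and the sample lines are invented.

```python
import difflib
import re

# Matches timestamps such as 2024-01-05 10:00:00 or 2024-01-05T10:00:00.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")


def normalize(line):
    """Replace volatile content with placeholders and drop trailing whitespace."""
    return TIMESTAMP.sub("<TIMESTAMP>", line).rstrip()


def meaningful_diff(old_lines, new_lines):
    """Unified diff of the two inputs after noise has been normalized away."""
    old = [normalize(line) for line in old_lines]
    new = [normalize(line) for line in new_lines]
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))


old_lines = ["# generated 2024-01-05 10:00:00", "timeout = 30  ", "retries = 3"]
new_lines = ["# generated 2024-02-11 08:30:00", "timeout = 30", "retries = 5"]

for line in meaningful_diff(old_lines, new_lines):
    print(line)
```

Only the retries line shows up as changed: the timestamp and the trailing spaces no longer register as differences.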

Binary Diff

While standard diff tools operate on text, binary delta tools and formats such as bsdiff, xdelta, and VCDIFF can encode efficient deltas between binary files. These are used in software update systems (to distribute patches rather than full files), backup systems (for incremental backups), and version control (Git uses a custom binary delta format for pack files).

For human consumption, binary diffs are typically presented through specialized viewers: image diff tools that overlay before/after images or highlight pixel differences, hex editors that show byte-level changes, and structured format viewers that understand specific file formats (PDF, Office documents, etc.).

9. Best Practices

Follow these guidelines to get the most out of diff tools and produce clean, reviewable diffs in your own work.

Keep Changes Focused

The single most impactful practice for clean diffs is keeping changes focused. Each commit or pull request should address one concern: a bug fix, a feature addition, a refactoring, or a formatting cleanup -- but not all of these mixed together. Mixed-purpose diffs are hard to review, hard to revert, and hard to understand in the commit history.

Separate Formatting from Logic

If you need to reformat code (changing indentation, wrapping lines, applying a linter) and also make logic changes, do them in separate commits. A formatting commit will touch many lines but make no functional change. A logic commit will touch few lines but require careful review. Combining them forces the reviewer to distinguish formatting noise from meaningful changes -- a tedious and error-prone process.

Use Meaningful Commit Messages

A diff shows what changed, but the commit message explains why. Good commit messages provide context that makes the diff easier to understand. When reviewing a diff months later, the commit message is often the only clue to the developer's intent.

Review Your Own Diff Before Submitting

Before pushing a commit or opening a pull request, review your own diff. Run git diff --staged to see exactly what you are about to commit. Look for:

Debug code or console.log statements that should be removed
Commented-out code that should be deleted
Unintended file changes (IDE settings, OS-generated files)
Hardcoded values that should be configurable
Missing or incomplete changes

Configure Diff Tools for Your Workflow

Set up your diff tools to match your workflow. Configure your Git diff tool, merge tool, and editor integrations once, and they will save you time on every future diff operation:

# Set VS Code as your diff tool
git config --global diff.tool vscode
git config --global difftool.vscode.cmd 'code --wait --diff "$LOCAL" "$REMOTE"'

# Set VS Code as your merge tool
git config --global merge.tool vscode
git config --global mergetool.vscode.cmd 'code --wait "$MERGED"'

# Use patience diff by default
git config --global diff.algorithm patience

# Enable rename detection by default (already on in Git 2.9 and later)
git config --global diff.renames true

Use .gitattributes for Language-Aware Diffs

Git guesses the enclosing function for each hunk header using a generic heuristic, and the guess becomes much more reliable when Git knows the file's language. Configure .gitattributes to enable language-specific diff drivers:

# .gitattributes
*.py diff=python
*.rb diff=ruby
*.java diff=java
*.go diff=golang
*.rs diff=rust

With these settings, hunk headers will show the enclosing function or class name, making it much easier to orient yourself in a diff without additional context.

Leverage Inline Annotations

When comparing files with many changes, use word-level diffing to see exactly what changed within each line. The git diff --word-diff or git diff --color-words commands are invaluable for reviewing changes to prose, configuration values, or any content where line-level diffs are too coarse.

10. Using Our Free Diff Checker Tool

Our free Diff Checker tool lets you compare two pieces of text instantly in your browser. No data is sent to any server -- all comparison is performed locally on your machine using client-side JavaScript.

Side-by-Side View

Paste your original text on the left and the modified text on the right, then click "Compare." The tool highlights additions, deletions, and modifications with clear color coding. Matching lines are aligned so you can scan the diff visually.

Unified View

Switch to unified view for a compact single-column display. Added lines are highlighted in green, deleted lines in red, and context lines appear without highlighting. This view mirrors the format used by git diff and patch files.

Character-Level Highlighting

Within changed lines, the tool highlights the specific characters that differ. This makes it easy to spot the exact change when a line has been partially modified -- no more squinting to find the one changed character in a long line of code.

Key Features

100% client-side: Your data never leaves your browser
Syntax-aware: Code is displayed with syntax highlighting for readability
Ignore whitespace: Toggle whitespace comparison on or off
Case sensitivity: Choose whether to compare case-sensitively or case-insensitively
Line numbers: Both sides show line numbers for easy reference
Copy and share: Copy the diff output or share it with team members
