How far should a programming language aware diff go?

The idea behind a programming language aware diff is to hide irrelevant changes and make diffs less noisy. But what exactly are “irrelevant changes”? In this blog post I want to discuss a question we often come across when working on SemanticDiff: What kind of changes do developers care about? To give an answer we first need to define different “levels” of changes.

Level 1: Irrelevant Whitespace

The most basic type of change we can ignore is the addition and deletion of irrelevant whitespace. This is very similar to git diff -w except that by parsing the code we can correctly determine whether a whitespace affects the interpretation of the program. This is important for languages like Python or when a whitespace is part of a string. In other words, we don’t care if the following code is written like this:

def foo(a: Int, b: Int) -> Int:
    return a + b

or that:

def foo(
    a: Int,
    b: Int
) -> Int:
    return a + b

I think most developers won’t care about such a change and don’t want it to be highlighted in their diff. They often use code formatters anyway and have given up control over where exactly the line breaks are placed. Line breaks are often simply added by the formatter because a newly added parameter or expression has caused the line to exceed the configured character limit.

Level 2: Irrelevant Tokens

The next level is to also ignore the addition/deletion of irrelevant tokens such as optional commas. We can reuse the code from the last level to create an example:

  def foo(
      a: Int,
-     b: Int
+     b: Int,
  ) -> Int:

Most developers probably feel the same way about this as they do about level 1. Discussing these changes in code reviews is often considered a waste of time. Instead we have created automatic tools like code formatters and linters to ensure a consistent style. So why bother with these changes and make the diff unnecessarily noisy?

Level 3: Semantic Equivalence

The previous levels focused on changes that only slightly modified the syntax of the code. Now, we go one step further and no longer care about the syntax at all and ignore all changes that do not modify the semantic interpretation of the code. Here are a few examples of what I mean:

- 255 * 0x1A4 + 5
+ (0xff * 0o644) + 0b101

- def foo(): int | None
+ def foo(): None | int

- const foo = function(a, b) { ... }
+ const foo = (a, b) => { ... }

I think this is where it gets interesting. You may agree with these examples, but what about a more extreme example? Let’s compare the following two snippets of code:

for (int i = 0; i < 10; ++i)
{
    ...
}

int i = 0;
while (i < 10)
{
    ...
    ++i;
}

Both examples share the same behavior but I would expect most developers to prefer the first one. Ignoring all kinds of changes - as long as the code is semantically equivalent - is probably going too far. Some variants of the same code may be easier to understand and less error-prone than others. On the other hand, having the option to ignore these kinds of changes can help you find modifications that were supposed to be no-ops but aren’t. For example, if you accidentally made a typo when changing the base of an integer literal.

Level 4: Mostly Identical

What about changes that usually don’t change the behavior of the code, except in rare cases? A good example is reordering imports in languages like Python or Go:

  import sys
+ import stat
  import os
- import stat

In most cases, such a change is perfectly fine, and there are even tools that automatically reorder imports. However, there is no guarantee that the initialization code of an imported package will not have a side effect on subsequent imports. Would you still rather hide such a change than manually check if the imports have just been reordered?

How far should a language aware diff go?

Now that we have looked at different “levels” of changes, where would you draw the line? Do you care about every optional semicolon or do you want a semantic diff to ignore as much as possible since you can still use a standard diff if necessary?

For SemanticDiff, we concluded that the cutoff should probably be somewhere inside level 3. We use predefined rules to filter out changes that are often performed by linters or when modernizing code. However, we don’t do a generic semantic comparison. This would be impractical to do in real time, and most developers would probably feel that they would lose too much control anyway.

However, from time to time someone asks us to ignore level 4 changes. Adding such a feature and making it the new default feels like playing with fire. Developers will only use a diff/code review tool if they can trust it. Nevertheless, we can still understand such use cases and will probably add customization options to enable such features in future releases. That way, you can decide for yourself if you are willing to take the risk of rare side effects.

Subscribe for Developer Updates

Get updates on our SemanticDiff VS Code Extension and GitHub App, and be the first to know about new devlogs, tutorials, and blog posts.

Recent Articles

SemanticDiff

How far should a programming language aware diff go?

Level 1: Irrelevant Whitespace

Level 2: Irrelevant Tokens

Level 3: Semantic Equivalence

Level 4: Mostly Identical

How far should a language aware diff go?

Recent Articles

What should semantic diffs highlight: The change or its effect?

SemanticDiff 0.10.0: Support for Lua, XML & more

Improved User Interface For GitHub App

How far should a programming language aware diff go?

Level 1: Irrelevant Whitespace

Level 2: Irrelevant Tokens

Level 3: Semantic Equivalence

Level 4: Mostly Identical

How far should a language aware diff go?

Subscribe for Developer Updates

Something went wrong

Thank You For Your Interest

Recent Articles

What should semantic diffs highlight: The change or its effect?

SemanticDiff 0.10.0: Support for Lua, XML & more

Improved User Interface For GitHub App