Michael Müller
17 Jul, 2024

How far should a programming language aware diff go?

Where should a semantic diff draw the line between relevant and irrelevant changes? What type of changes do developers care about?

The idea behind a programming language aware diff is to hide irrelevant changes and make diffs less noisy. But what exactly are “irrelevant changes”? In this blog post I want to discuss a question we often come across when working on SemanticDiff: What kind of changes do developers care about? To give an answer we first need to define different “levels” of changes.

Level 1: Irrelevant Whitespace

The most basic type of change we can ignore is the addition and deletion of irrelevant whitespace. This is very similar to git diff -w except that by parsing the code we can correctly determine whether a whitespace affects the interpretation of the program. This is important for languages like Python or when a whitespace is part of a string. In other words, we don’t care if the following code is written like this:

def foo(a: Int, b: Int) -> Int:
    return a + b

or that:

def foo(
    a: Int,
    b: Int
) -> Int:
    return a + b

I think most developers won’t care about such a change and don’t want it to be highlighted in their diff. They often use code formatters anyway and have given up control over where exactly the line breaks are placed. Line breaks are often simply added by the formatter because a newly added parameter or expression has caused the line to exceed the configured character limit.

Level 2: Irrelevant Tokens

The next level is to also ignore the addition/deletion of irrelevant tokens such as optional commas. We can reuse the code from the last level to create an example:

  def foo(
      a: Int,
-     b: Int
+     b: Int,
  ) -> Int:

Most developers probably feel the same way about this as they do about level 1. Discussing these changes in code reviews is often considered a waste of time. Instead we have created automatic tools like code formatters and linters to ensure a consistent style. So why bother with these changes and make the diff unnecessarily noisy?

Level 3: Semantic Equivalence

The previous levels focused on changes that only slightly modified the syntax of the code. Now, we go one step further and no longer care about the syntax at all and ignore all changes that do not modify the semantic interpretation of the code. Here are a few examples of what I mean:

- 255 * 0x1A4 + 5
+ (0xff * 0o644) + 0b101
- def foo(): int | None
+ def foo(): None | int
- const foo = function(a, b) { ... }
+ const foo = (a, b) => { ... }

I think this is where it gets interesting. You may agree with these examples, but what about a more extreme example? Let’s compare the following two snippets of code:

for (int i = 0; i < 10; ++i)
{
    ...
}
int i = 0;
while (i < 10)
{
    ...
    ++i;
}

Both examples share the same behavior but I would expect most developers to prefer the first one. Ignoring all kinds of changes - as long as the code is semantically equivalent - is probably going too far. Some variants of the same code may be easier to understand and less error-prone than others. On the other hand, having the option to ignore these kinds of changes can help you find modifications that were supposed to be no-ops but aren’t. For example, if you accidentally made a typo when changing the base of an integer literal.

Level 4: Mostly Identical

What about changes that usually don’t change the behavior of the code, except in rare cases? A good example is reordering imports in languages like Python or Go:

  import sys
+ import stat
  import os
- import stat

In most cases, such a change is perfectly fine, and there are even tools that automatically reorder imports. However, there is no guarantee that the initialization code of an imported package will not have a side effect on subsequent imports. Would you still rather hide such a change than manually check if the imports have just been reordered?

How far should a language aware diff go?

Now that we have looked at different “levels” of changes, where would you draw the line? Do you care about every optional semicolon or do you want a semantic diff to ignore as much as possible since you can still use a standard diff if necessary?

For SemanticDiff, we concluded that the cutoff should probably be somewhere inside level 3. We use predefined rules to filter out changes that are often performed by linters or when modernizing code. However, we don’t do a generic semantic comparison. This would be impractical to do in real time, and most developers would probably feel that they would lose too much control anyway.

However, from time to time someone asks us to ignore level 4 changes. Adding such a feature and making it the new default feels like playing with fire. Developers will only use a diff/code review tool if they can trust it. Nevertheless, we can still understand such use cases and will probably add customization options to enable such features in future releases. That way, you can decide for yourself if you are willing to take the risk of rare side effects.

Recent Articles

SemanticDiff 0.10.0: Support for Lua, XML & more
SemanticDiff 0.10.0: Support for Lua, XML & more

This release does not only add two languages to SemanticDiff, but also comes with many quality-of-life improvements such as a new minimap.

Read More
Improved User Interface For GitHub App
Improved User Interface For GitHub App

We added many quality of life improvements to our GitHub App. This includes an improved minimap, a thread list that lets you jump to a thread in the diff, a collapsible sidebar and more.

Read More
How far should a programming language aware diff go?
How far should a programming language aware diff go?

Where should a semantic diff draw the line between relevant and irrelevant changes? What type of changes do developers care about?

Read More