Michael Müller
04 Feb, 2025

What should semantic diffs highlight: The change or its effect?

Language-aware diffs can detect changes in the semantics of the code that are invisible to traditional diffs. How should they be highlighted?

A big part of what SemanticDiff does is make diffs less noisy by hiding irrelevant changes. Typical examples of such changes are line breaks added between function arguments or a single-quoted string converted to a double-quoted string. These are exactly the kinds of changes that a code formatter may introduce to make your code more readable.

In most cases this is straightforward: just don’t highlight the added line break. But what about the cases where the answer isn’t so obvious? Let’s look at two Python examples to explain what I mean.

- x = "foo"
+ x = 'foo'

This is the obvious one: The way the string is written changes, but its content remains the same. There is nothing to highlight for SemanticDiff. However, if we change this example just a little bit, things get more interesting:

- x = f"foo {bar}"
+ x = "foo {bar}"

If we just look at the text, the only difference between the two lines is that we have removed the “f” in front of the string literal. By doing this, we have turned the formatted string literal into a normal string literal which has consequences for the content of the string. The {bar} part is no longer a placeholder for a value and just becomes the text “{bar}”, even though this part of the code hasn’t changed.
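The difference between the two examples can be checked with Python's standard `ast` module. This is only a minimal sketch to illustrate the point, not how SemanticDiff works internally; the helper name `same_ast` and the value assigned to `bar` are made up for the example.

```python
import ast


def same_ast(old: str, new: str) -> bool:
    """Return True if both snippets parse to the same syntax tree."""
    return ast.dump(ast.parse(old)) == ast.dump(ast.parse(new))


# Changing only the quote style leaves the syntax tree untouched.
quotes_only = same_ast('x = "foo"', "x = 'foo'")

# Removing the f prefix produces a different tree.
prefix_removed = same_ast('x = f"foo {bar}"', 'x = "foo {bar}"')

# The assigned value changes from a formatted string to a plain constant.
old_node = ast.parse('x = f"foo {bar}"').body[0].value
new_node = ast.parse('x = "foo {bar}"').body[0].value
old_kind = type(old_node).__name__
new_kind = type(new_node).__name__

# At runtime, the unchanged text {bar} now means something else.
bar = 42  # hypothetical value, only for illustration
old_value = f"foo {bar}"
new_value = "foo {bar}"

print(quotes_only, prefix_removed)  # True False
print(old_kind, new_kind)           # JoinedStr Constant
print(old_value, "|", new_value)    # foo 42 | foo {bar}
```

The first comparison is the case a semantic diff can safely hide. The second is the case this article is about: a one-character edit changes the node type of the whole string literal.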

To highlight or not to highlight?

The second example raises a question: How should we highlight a case where a change in one part of the code affects the semantics of another part? There are basically three options we can choose from:

Option 1: Highlight only the text that changes

We could treat this like a traditional diff and highlight only the text that changes, which here is the removed f prefix:

- x = f"foo {bar}"
+ x = "foo {bar}"
Pro: The aim is to make diffs less noisy, and highlighting unchanged bits of code adds visual noise.
Con: The consequences of a change might get overlooked.

Option 2: Highlight only the consequences

We only highlight the part whose semantic interpretation has changed, here {bar}, and ignore the changed parsing instruction:

- x = f"foo {bar}"
+ x = "foo {bar}"
Pro: The parsing instruction (f prefix) is not part of the code logic and should be ignored.
Con: Looks like a bug: we highlight unchanged code.

Option 3: Highlight both

We can highlight both changes, the visible one (the removed f prefix) and the “invisible” one ({bar}):

- x = f"foo {bar}"
+ x = "foo {bar}"
Pro: The user has a better chance of finding out why an unchanged piece of code has been highlighted.
Con: Adds the most “visual noise”.

Each option has its own advantages and disadvantages and there is probably none that everyone would agree on. However, we believe that the purpose of a semantic diff is not only to hide changes, but also to inform developers about easily overlooked changes. We therefore use option 3, but can override this on a case-by-case basis if another option is clearly better.

A better solution?

This example showcased a very simple case, but things can get much more complex. Just consider languages that support macros. If you modify a macro's definition, the change can affect many different places in the code. We therefore think that none of the above options is good in the long run.

To avoid confusing developers, we need to extend the way diffs are displayed. By adding a third change type (besides added and removed code) that uses a different color (not red/green), we can clearly mark areas of code that have changed semantics even though the text remains the same. Ideally, the diff could also indicate which change caused the semantic difference.
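To make the idea more concrete, here is a rough sketch of what a data model with a third change type could look like. Everything in it is hypothetical and only meant as an illustration: the names `ChangeKind`, `Span` and `underline`, the `~` marker, and the hard-coded column positions are assumptions for this example, not SemanticDiff's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ChangeKind(Enum):
    ADDED = "+"
    REMOVED = "-"
    SEMANTIC = "~"  # text is unchanged, but its meaning changed


@dataclass(frozen=True)
class Span:
    start: int  # first column of the span (inclusive)
    end: int    # last column of the span (exclusive)
    kind: ChangeKind
    caused_by: Optional["Span"] = None  # the edit that triggered a semantic change


def underline(line: str, spans: List[Span]) -> str:
    """Build a marker line that sits below `line`, one symbol per change kind."""
    marks = [" "] * len(line)
    for span in spans:
        for column in range(span.start, span.end):
            marks[column] = span.kind.value
    return "".join(marks).rstrip()


old_line = 'x = f"foo {bar}"'
prefix = Span(4, 5, ChangeKind.REMOVED)                        # the f prefix
placeholder = Span(10, 15, ChangeKind.SEMANTIC, caused_by=prefix)  # {bar}
markers = underline(old_line, [prefix, placeholder])

print(old_line)
print(markers)
```

In a real user interface the `~` markers would be rendered in the third color, and the `caused_by` link could be used to point the reader from the affected code back to the edit that caused the change.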

Do you agree or would you choose a different solution?
