What should semantic diffs highlight: The change or its effect?

One big part of SemanticDiff is to make diffs less noisy by hiding irrelevant changes. Typical examples of such changes are line breaks added between function arguments or when a single-quoted string is converted to a double-quoted string. Just the kinds of changes that a code formatter may introduce to make your code more readable.

In most cases this is straightforward, just don’t highlight the added line break. But what about the cases where the answer isn’t so obvious? Let’s look at two Python examples to explain what I mean.

- x = "foo"
+ x = 'foo'

This is the obvious one: The way the string is written changes, but its content remains the same. There is nothing to highlight for SemanticDiff. However, if we change this example just a little bit, things get more interesting:

- x = f"foo {bar}"
+ x = "foo {bar}"

If we just look at the text, the only difference between the two lines is that we have removed the “f” in front of the string literal. By doing this, we have turned the formatted string literal into a normal string literal which has consequences for the content of the string. The {bar} part is no longer a placeholder for a value and just becomes the text “{bar}”, even though this part of the code hasn’t changed.

To highlight or not to highlight?

The second example raises a question: How should we highlight a case where a change in one part of the code affects the semantics of another part? There are basically three options we can choose from:

Option 1: Highlight only the text that changes

We could treat this like a traditional diff and highlight only the text that changes:

- x = f"foo {bar}"
+ x = "foo {bar}"

Pro	Con
The aim is to make diffs less noisy and highlighting unchanged bits of code adds visual noise.	The consequences of a change might get overlooked.

Option 2: Highlight only the consequences

We only show how the semantic interpretation has changed and ignore the changed parsing instruction:

- x = f"foo {bar}"
+ x = "foo {bar}"

Pro	Con
The parsing instruction (`f` prefix) is not part of the code logic and should be ignored.	Looks like a bug, we highlight unchanged code.

Option 3: Highlight both

We can highlight both changes, the visible and “invisible” one:

- x = f"foo {bar}"
+ x = "foo {bar}"

Pro	Con
The user has a better chance of finding out why an unchanged piece of code has been highlighted.	Adds the most “visual noise”.

Each option has its own advantages and disadvantages and there is probably none that everyone would agree on. However, we believe that the purpose of a semantic diff is not only to hide changes, but also to inform developers about easily overlooked changes. We therefore use option 3, but can override this on a case-by-case basis if another option is clearly better.

A better solution?

This example showcased a very simple case, but things can get much more complex. Just consider languages that support macros. If you modify their definition, the change can affect many different places in the code. We therefore think that neither of the above options is good in the long run.

To avoid confusing developers, we need to extend the way diffs are displayed. By adding a third change type (besides added and removed code) that uses a different color (not red/green), we can clearly mark areas of code that have changed semantics even though the text remains the same. Ideally, the diff could also indicate which change caused the semantic difference.

Do you agree or would you choose a different solution?

Subscribe for Developer Updates

Get updates on our SemanticDiff VS Code Extension and GitHub App, and be the first to know about new devlogs, tutorials, and blog posts.

Recent Articles

SemanticDiff

What should semantic diffs highlight: The change or its effect?

To highlight or not to highlight?

Option 1: Highlight only the text that changes

Option 2: Highlight only the consequences

Option 3: Highlight both

A better solution?

Recent Articles

What should semantic diffs highlight: The change or its effect?

SemanticDiff 0.10.0: Support for Lua, XML & more

Improved User Interface For GitHub App

What should semantic diffs highlight: The change or its effect?

To highlight or not to highlight?

Option 1: Highlight only the text that changes

Option 2: Highlight only the consequences

Option 3: Highlight both

A better solution?

Subscribe for Developer Updates

Something went wrong

Thank You For Your Interest

Recent Articles

What should semantic diffs highlight: The change or its effect?

SemanticDiff 0.10.0: Support for Lua, XML & more

Improved User Interface For GitHub App