Michael Müller
26 Jul, 2023

SemanticDiff 0.8.4: Quality improvements & .po support

The fourth beta release of SemanticDiff brings several quality improvements, new invariances and support for .po files to our VS Code extension and GitHub App.

We have just released version 0.8.4 of our Visual Studio Code extension SemanticDiff. This update doesn’t add many new features and focuses on quality improvements instead. The matching of old and new code has been improved, some minor bugs have been fixed, and new invariances have been added. There is still one new feature, though: support for .po files.

Since the VS Code extension and the GitHub pull request viewer are largely based on the same source code, all changes also apply to the GitHub App. If you are using the GitHub App, all updates have already been rolled out and no action is required on your part. VS Code users will usually receive the update automatically when the editor checks for extension updates. If not, you can get the latest packages from the Visual Studio Marketplace:

Download Beta

Improved Matching

A major challenge in implementing a new diff is defining what an optimal diff should look like. A simple approach that tries to minimize the number of shown insertions and deletions will not cut it. Consider the following two lines as an example:

a, b = foobar(c)
z = x, y

Logically, there is no connection between them. They work on different variables, only the first one calls a function, and so on. Yet they both contain an assignment, which is enough for an AST-based diff to match them (if there are no better alternatives). These matches aren’t really helpful though, because everything except the equals operator would show up as additions and deletions. If you also support moves, you will get all sorts of spurious matches, since a programming language only supports so many code constructs.

We found this to be one of the biggest remaining sources of unnecessarily complex diffs and decided to fix it in this release. We added additional constraints on how much of the old and new code must overlap to be matched. We also improved the way moves are detected. Previously, it could happen that a bunch of consecutive lines were moved but the algorithm detected them as two individual moves with separate visual indicators. Now they are combined into a single move. We’re not done with the move part yet, and more improvements will follow in the next release. In addition to these user-visible changes, we also improved the performance of one step of the matching algorithm.
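The idea of an overlap constraint can be illustrated with a minimal sketch. This is not SemanticDiff's actual algorithm: the tokenizer, the `overlap` and `should_match` helpers, and the threshold of 0.5 are all assumptions made for the example. It only shows how requiring a minimum amount of shared content rejects the spurious match from the two lines above.

```python
import re
from difflib import SequenceMatcher

# Hypothetical tokenizer: identifiers/numbers, or single punctuation characters.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokens(code: str) -> list[str]:
    return TOKEN.findall(code)


def overlap(old: str, new: str) -> float:
    """Similarity of the two token sequences, between 0.0 and 1.0."""
    matcher = SequenceMatcher(None, tokens(old), tokens(new), autojunk=False)
    return matcher.ratio()


def should_match(old: str, new: str, min_overlap: float = 0.5) -> bool:
    """Accept a match only if enough of the old and new code overlaps.

    The default threshold is an arbitrary value chosen for illustration.
    """
    return overlap(old, new) >= min_overlap


# The two unrelated assignments share little more than "=" and ",".
unrelated = should_match("a, b = foobar(c)", "z = x, y")
# A statement where only one argument changed still overlaps heavily.
related = should_match("a, b = foobar(c)", "a, b = foobar(d)")
```

With this threshold, the unrelated pair is rejected while the lightly edited statement is still matched. A real implementation would work on syntax tree nodes rather than flat token lists.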

Support for gettext .po files

While our focus is on support for programming languages, there are other related files that also benefit from semantic diffs. We have received a feature request to add support for the .po file format of gettext. Gettext is used to implement multi-language support in applications. It stores the strings that need to be translated in .po files along with a reference to their location in the source code. This can cause a lot of noise in diffs when code is added or removed: the locations of all subsequent translatable strings in the same source file shift and show up as changes. The location references only serve as context for the translator, so these changes don’t affect the translation itself.

Fortunately, this was quite easy to implement and you can now compare .po files in SemanticDiff 0.8.4. To hide changes that only update code references (or any other line starting with a #), simply instruct SemanticDiff to ignore comments. To see what a difference this can make, compare this commit with comment changes enabled and disabled. The order of the messages is always ignored.
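To make the effect concrete, here is a rough sketch of comparing two .po files while ignoring comment lines and message order. It is not how SemanticDiff parses .po files. The helpers `po_entries` and `po_equivalent` and the sample data are made up for the example, and the sketch assumes that entries are separated by blank lines.

```python
import re


def po_entries(text: str, ignore_comments: bool = True) -> set[str]:
    """Split a .po file into entries, assuming blank lines separate them."""
    entries = set()
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.rstrip() for line in block.splitlines()]
        if ignore_comments:
            # Source references ("#: file:line") and other comments start with "#".
            lines = [line for line in lines if not line.startswith("#")]
        if lines:
            entries.add("\n".join(lines))
    return entries


def po_equivalent(old: str, new: str, ignore_comments: bool = True) -> bool:
    """Compare entries as a set, so the order of messages does not matter."""
    return po_entries(old, ignore_comments) == po_entries(new, ignore_comments)


OLD_PO = """\
#: src/main.c:10
msgid "Hello"
msgstr "Hallo"

#: src/main.c:20
msgid "Goodbye"
msgstr "Auf Wiedersehen"
"""

# Same messages, but reordered and with shifted source locations.
NEW_PO = """\
#: src/main.c:25
msgid "Goodbye"
msgstr "Auf Wiedersehen"

#: src/main.c:12
msgid "Hello"
msgstr "Hallo"
"""

same_without_comments = po_equivalent(OLD_PO, NEW_PO)
same_with_comments = po_equivalent(OLD_PO, NEW_PO, ignore_comments=False)
```

In this sample, the two files only count as different when comments are included in the comparison.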

Additional invariances

The main feature of a semantic diff is to hide changes that have no effect. To get a little closer to our goal, we have added three new invariances in this release:

  • JavaScript/TypeScript: Replacing an anonymous function with an equivalent arrow function (or vice versa) is no longer shown as a change. The following two snippets are treated as equivalent:

    const foo = function(a, b) {
    	console.log(a * b);
    }
    
    const foo = (a, b) => {
    	console.log(a * b);
    }
    
  • All languages: Numeric literals are now compared case-insensitively, and underscores are ignored during the comparison (if supported by the language). Comparing the following two Go snippets will therefore result in an empty diff:

    const a = 0XDEADBEEF;
    const b = 1000_0000;
    const c = 1E2;
    
    const a = 0xdeadbeef;
    const b = 10000000;
    const c = 1e2;
    
  • Python: The order of keyword arguments in a function call is ignored. Such a change would not show up in our diff:

    x = foobar(a=1, b=2)
    
    x = foobar(b=2, a=1)
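
The Python invariance can be approximated with the standard `ast` module. The following is only a sketch of the idea, not SemanticDiff's implementation: the `SortKeywords` transformer and the `same_semantics` helper are names invented for this example. As a side effect, Python's parser already stores numeric literals by value, so the spelling differences from the second bullet disappear for Python literals as well.

```python
import ast


class SortKeywords(ast.NodeTransformer):
    """Put the keyword arguments of every call into a canonical order."""

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        # Leave calls containing **kwargs untouched (their arg name is None).
        if all(kw.arg is not None for kw in node.keywords):
            node.keywords.sort(key=lambda kw: kw.arg)
        return node


def normalized(source: str) -> str:
    tree = SortKeywords().visit(ast.parse(source))
    # ast.dump omits line/column attributes by default, so layout is ignored.
    return ast.dump(tree)


def same_semantics(old: str, new: str) -> bool:
    return normalized(old) == normalized(new)


reordered = same_semantics("x = foobar(a=1, b=2)", "x = foobar(b=2, a=1)")
changed = same_semantics("x = foobar(a=1, b=2)", "x = foobar(a=2, b=1)")
literals = same_semantics(
    "a = 0XDEADBEEF\nb = 1000_0000\nc = 1E2",
    "a = 0xdeadbeef\nb = 10000000\nc = 1e2",
)
```

Here `reordered` and `literals` are true, while `changed` is false because the argument values differ. Note that this sketch treats any reordering as harmless, even though argument expressions with side effects are evaluated in source order.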
    

Minor changes

This release also introduces two minor changes to the user interface:

Toggle button for comments

Previously, you could choose to ignore changes in comments by selecting the corresponding item from the More Actions… menu in the top right corner. We wanted to make this feature a little easier to access, so we created a toggle button instead. You can now switch between the two modes by clicking the following button:

VS Code

Changes in comments are shown

Changes in comments are hidden

GitHub App

Changes in comments are shown

Changes in comments are hidden

Fallback diff

While implementing our pull request viewer, we needed a way to display diffs for file formats or languages that aren’t supported yet. To solve this, we added a fallback diff that treats each line as a node in an AST. The result is a simple diff with line-based move detection. Since it might also be of some value to our VS Code users, we decided to enable it for our extension as well. If you try to generate a diff for an unsupported language in SemanticDiff 0.8.4, you will no longer get an error message. Instead, a diff will open with the following warning:

Fallback diff warning
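
For readers curious what a line-based diff with move detection could look like, here is a deliberately simple sketch built on Python's `difflib`. It is an assumption-laden illustration, not the fallback diff shipped in SemanticDiff: the `line_diff` function is hypothetical, and it treats a line as moved whenever the same text shows up among both the removed and the added lines.

```python
from difflib import SequenceMatcher


def line_diff(old: str, new: str) -> dict[str, list[str]]:
    """Classify lines as moved, removed or added (duplicate lines are handled crudely)."""
    a, b = old.splitlines(), new.splitlines()
    removed: list[str] = []
    added: list[str] = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(a[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(b[j1:j2])
    # A line that was removed in one place and added in another counts as a move.
    moved = [line for line in removed if line in added]
    return {
        "moved": moved,
        "removed": [line for line in removed if line not in moved],
        "added": [line for line in added if line not in moved],
    }


result = line_diff("one\ntwo\nthree\nfour", "two\nthree\none\nfive")
```

For this input, "one" is reported as moved, "four" as removed and "five" as added, while the unchanged lines "two" and "three" do not appear at all.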

We hope you enjoyed this update. If you still find cases where SemanticDiff creates sub-optimal diffs, please let us know in our issue tracker.
