SemanticDiff 0.10.0: Support for Lua, XML & more

We are happy to announce the release of SemanticDiff 0.10.0! This update extends our language-aware diff with two new languages/file formats and also brings many quality-of-life improvements.

If you are using the GitHub App, all updates have already been rolled out and no action is required on your part. You may also want to check out our Improved User Interface For GitHub App blog post, which goes into more detail on the UI/UX improvements. VS Code users should receive the update automatically when the editor checks for extension updates.

Lua support

This release brings good news for those who integrate a Lua interpreter into their game or application. SemanticDiff 0.10.0 can now generate language-aware diffs for Lua scripts. This helps you to identify the actual logic changes, even if the script has been reformatted or code has been moved around.

Here is an example to demonstrate what this looks like in practice:

Lua example in SemanticDiff: SemanticDiff doesn't care how a string literal is written or whether unnecessary line breaks are added.

XML/DTD support

XML files can look wildly different and still contain identical data, which can be challenging for traditional diff utilities. This release adds support for XML/DTD and helps you see more clearly how the actual data has changed while ignoring purely formatting changes, such as reordered attributes within a tag or a CDATA block that was converted to a text node.

SemanticDiff also ignores leading and trailing whitespace in text, e.g. <foo> Hello </foo> is considered identical to <foo>Hello</foo>. This is not defined by the XML specification, but it matches how XML is used in most real-world files. You can disable this behavior for a node and all its children by adding the xml:space="preserve" attribute.
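To make these comparison rules more concrete, here is a minimal Python sketch of how two XML documents could be checked for identical data after normalization. It is not SemanticDiff's implementation and only answers "same or not" rather than producing a diff. It relies on the standard library's xml.etree.ElementTree, which already reports CDATA sections as ordinary text, sorts attributes so their order doesn't matter, and strips leading/trailing whitespace unless an element or one of its ancestors sets xml:space="preserve". The function names normalize and same_data are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def normalize(elem: ET.Element, preserve: bool = False) -> tuple:
    """Turn an element subtree into a tuple that ignores formatting-only differences."""
    # xml:space="preserve" disables trimming for this element and all its children.
    if elem.get(XML_SPACE) == "preserve":
        preserve = True

    def clean(text):
        text = text or ""
        return text if preserve else text.strip()

    # Sorting the attributes makes their order within the tag irrelevant.
    attrs = tuple(sorted(elem.attrib.items()))
    # A child's tail is text in *this* element's content, so it uses this element's setting.
    children = tuple((normalize(child, preserve), clean(child.tail)) for child in elem)
    return (elem.tag, attrs, clean(elem.text), children)


def same_data(old_xml: str, new_xml: str) -> bool:
    """Hypothetical helper: True if both documents contain the same data."""
    return normalize(ET.fromstring(old_xml)) == normalize(ET.fromstring(new_xml))
```

For example, same_data("<foo> Hello </foo>", "<foo>Hello</foo>") returns True, while the same comparison with xml:space="preserve" on both foo elements returns False.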

Here you can see the XML support in action:

XML example in SemanticDiff: this example showcases various types of changes SemanticDiff can filter out.

Support for XML entities is disabled, except for character entities within strings. Entities are rarely used, have been the source of multiple attack vectors (e.g. the billion laughs attack and XML external entity attacks), and changes to them would be hard to visualize in a language-aware diff anyway.

Generic diff improvements

The main reason SemanticDiff can produce better diffs for refactored code than traditional tools is that it primarily compares the structure of the old and new code rather than the text. However, SemanticDiff can't eliminate text comparisons completely, as it still needs to determine how similar identifiers or strings are. This release improves this text comparison step by better matching modified texts that are either very long or have been split.
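As a rough illustration of what a text comparison step does, the sketch below scores how similar two identifiers or strings are with Python's difflib.SequenceMatcher and picks the closest candidate. This is a stand-in, not the algorithm SemanticDiff uses; the 0.6 threshold and the helper names similarity and best_match are hypothetical. It passes autojunk=False because difflib's default junk heuristic, which applies to sequences of 200 or more items, discards frequent characters and can badly distort the score for very long texts.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional

# Hypothetical cut-off: pairs scoring below this are treated as unrelated texts.
SIMILARITY_THRESHOLD = 0.6


def similarity(old: str, new: str) -> float:
    """Return a score in [0, 1]; 1.0 means the texts are identical."""
    # autojunk=False keeps frequent characters in long (200+ char) texts from being ignored.
    return SequenceMatcher(None, old, new, autojunk=False).ratio()


def best_match(old: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate most similar to old, or None if nothing is close enough."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = similarity(old, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= SIMILARITY_THRESHOLD else None
```

With these definitions, best_match("userName", ["total", "user_name"]) returns "user_name", while best_match("config", ["user_name", "total"]) returns None because no candidate reaches the threshold.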

Besides these user-visible changes, there are also some improvements under the hood. For example, the old and new code are now parsed in parallel to reduce parsing time.

New minimap mode

Until now, SemanticDiff provided a minimap similar to the one in VS Code's standard diff viewer. It used markers to show which areas of the file had been modified, but didn't give an overview of the actual code. Version 0.10.0 introduces a new minimap mode that displays a true thumbnail view of the side-by-side diff, making it easier to navigate.

The new mode “Detailed” is enabled by default, but you can go to the SemanticDiff extension settings (VS Code) or user settings (GitHub App) and change it back to “Overview” if you prefer the old mode. You can now also disable the minimap altogether if you don’t need it.

Text layout and text selection

To achieve good performance for large files and to implement complex annotations such as moved code indicators, SemanticDiff needs to lay out the text manually. This means calculating the position of each line, or even each character, and telling the browser where to display it. While this worked well for the most part, a few issues remained, and they have been addressed in this release.

SemanticDiff now handles characters of different widths better. This shouldn't happen with monospace fonts, but certain characters, such as emojis, might be rendered with a substitute font and therefore have a different width. We have also improved the way text selection works. When you select code, non-code elements such as line numbers are ignored, and your selection is now limited to the side on which it started. Selections are now also preserved when they move out of the visible screen area.

Minor improvements

This release also brings many minor improvements. Some of the more notable ones are listed below:

C#: Switch to the Roslyn parser to improve overall quality and support C# up to version 13
Add an option to use SemanticDiff as the default diff viewer even for unsupported languages
Fix a bug where jumping to the next/previous change didn't work
Fix automatic closing of tabs by adding a workaround for VS Code bug #228270

We hope you enjoy this update. If you encounter any issues, please let us know in our issue tracker.