Michael Müller
09 Jul, 2024

SemanticDiff 0.9.0: Support for HTML, Vue, Swift & more

This release brings two new languages to our VS Code extension and three new ones to our GitHub App plus various improvements for front-end developers.

We just released the biggest update to SemanticDiff yet! Version 0.9.0 of our programming language aware diff adds two new languages to our Visual Studio Code extension (HTML and Vue) and three new languages (HTML, Vue, Swift) to our GitHub App. This is combined with enhanced JSX/TSX support and several other improvements.

If you are using the GitHub App, all updates have already been rolled out and no action is required on your part. VS Code users should receive the update automatically when the editor checks for extension updates.

Swift support

Swift is currently only available in our GitHub App. Our VS Code extension will follow once we have a truly portable build of the parser for all 9 supported architectures.

Before we start with the web technology focused changes, we have an announcement for our macOS/iOS developers. Our SemanticDiff GitHub App can now generate language aware diffs for Swift files. It can, for example, filter out most style changes introduced by swiftlint --fix and show only actual logic changes. This is especially useful if your files also contain other modifications that would be easily missed in a standard diff amidst all the format changes.

Here are some screenshots to demonstrate what it looks like. Let’s start with a very simple toy example:

Swift toy example in SemanticDiff
SemanticDiff detects how line 1 (old) was spread across the lines 1-5 (new) and highlights only the new parameter.

SemanticDiff can also distinguish between stylistic changes and logic changes in less obvious cases:

Swift invariances in SemanticDiff
SemanticDiff can convert between different bases and understands character escaping.

The full list of invariances (i.e. changes that are not displayed in the diff) includes:

  • Adding/Removing whitespaces or line breaks outside of strings and comments
  • Adding/Removing optional commas
  • Adding/Removing unnecessary parentheses
  • Exchanging numeric literals with equivalent ones in different bases
  • Exchanging escaped characters in a string/char with an equivalent representation (e.g. the literal)
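
To make the last two invariances more concrete, here is a minimal Python sketch of the idea: reduce each literal to the value it denotes and compare values instead of spellings. This is an illustration only, not SemanticDiff's implementation. The function names are hypothetical, floating-point literals and string interpolation are not handled, and only a subset of escapes is covered.

```python
import re

def swift_int_value(literal: str) -> int:
    """Value of a Swift integer literal such as 255, 0xFF, 0o377 or 0b1111_1111."""
    digits = literal.replace("_", "")  # underscores are only visual separators
    bases = {"0b": 2, "0o": 8, "0x": 16}
    if digits[:2] in bases:
        return int(digits[2:], bases[digits[:2]])
    return int(digits, 10)  # decimal, leading zeros allowed

_SIMPLE_ESCAPES = {"0": "\0", "\\": "\\", "t": "\t", "n": "\n", "r": "\r", '"': '"', "'": "'"}
_ESCAPE = re.compile(r"""\\(?:u\{([0-9A-Fa-f]{1,8})\}|([0\\tnr"']))""")

def swift_string_value(body: str) -> str:
    """Decode escapes in the body of a Swift string literal (interpolation is left untouched)."""
    def decode(match):
        if match.group(1) is not None:       # \u{...} Unicode scalar
            return chr(int(match.group(1), 16))
        return _SIMPLE_ESCAPES[match.group(2)]
    return _ESCAPE.sub(decode, body)

# Two spellings are treated as unchanged when they denote the same value.
same_number = swift_int_value("0xFF") == swift_int_value("0b1111_1111") == swift_int_value("255")
same_string = swift_string_value(r"\u{48}i") == swift_string_value("Hi")
```

Comparing decoded values rather than source text is what allows a diff to stay silent when a formatter rewrites a literal.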

If you work with GitHub pull requests and prefer diffs with less noise, you might want to give it a try. You can see what it looks like using a real world example here.

Support for embedded languages

One of the biggest changes in this release is internal. SemanticDiff can now generate diffs for languages that embed other languages. An example of such a language is HTML, which can contain CSS stylesheets and JavaScript scripts. If the embedded language can’t be parsed, SemanticDiff will automatically fall back to a text diff for that part of the code, so you get at least a partially language aware diff.
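
The fallback behavior can be pictured with a small Python sketch. It is a simplified illustration under stated assumptions, not how SemanticDiff is built: `PARSERS` and `compare_embedded` are hypothetical names, and `json.loads` merely stands in for a real parser of an embedded language.

```python
import json

# Hypothetical registry: language name -> function that parses source text
# into a comparable structure and raises ValueError on invalid input.
PARSERS = {"json": json.loads}

def compare_embedded(lang: str, old: str, new: str):
    """Compare one embedded region, falling back to plain text if parsing fails."""
    parser = PARSERS.get(lang)
    if parser is not None:
        try:
            return "semantic", parser(old) == parser(new)
        except ValueError:
            pass  # the region could not be parsed, so compare it as text
    return "text", old == new

# Whitespace-only edit in a parseable region: reported as unchanged.
result_ok = compare_embedded("json", '{"a": 1}', '{ "a":1 }')
# Broken region: only the text comparison is available.
result_broken = compare_embedded("json", '{"a": 1', '{"a":1')
```

The rest of the document is unaffected by a failure in one region, which is why the diff stays at least partially language aware.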

HTML support

SemanticDiff 0.9.0 can now diff HTML files and we have taken care to filter out as many noisy changes as possible. To give you an example, SemanticDiff can determine that the following two HTML snippets are semantically identical:

<DIV id="test" class="bar foo"><SCRIPT SRC="test.js" defer></SCRIPT><a rel="nofollow noopener" href="#" autofocus="true"> Hello  World &#128512;</a></DIV>

<div id="test" class="foo bar">
    <script src="test.js" defer="true"></script>
    <a rel="noopener nofollow" href="#" autofocus>Hello World 😀</a>
</div>

Can you spot all the ignored differences?

One of the main obstacles in implementing a language aware diff for HTML is the handling of whitespace. Most whitespace characters are not displayed by the browser and are either ignored or collapsed into a single character. SemanticDiff implements the whitespace processing rules from the “CSS Text Module Level 3” to detect these cases and remove noise from the diff. Since we don’t know which CSS stylesheets are used with the HTML file, SemanticDiff assumes the default display type for all tags and that they are visible.
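
As a rough illustration of the collapsing step, the following Python sketch handles a single run of inline text under the default `white-space: normal` behavior. It is heavily simplified compared to the specification: line breaks are always turned into spaces, and the caller has to say whether the text sits at the start or end of a line. The function name and parameters are hypothetical.

```python
import re

# Collapsible white space in HTML text: space, tab, line feed, carriage
# return and form feed. A non-breaking space (U+00A0) is not collapsible.
_COLLAPSIBLE = re.compile(r"[ \t\n\r\f]+")

def collapse_inline_text(text: str, at_line_start: bool = False, at_line_end: bool = False) -> str:
    """Approximate what a browser renders for `text` with white-space: normal."""
    collapsed = _COLLAPSIBLE.sub(" ", text)   # every run becomes a single space
    if at_line_start:
        collapsed = collapsed.lstrip(" ")     # spaces at the start of a line vanish
    if at_line_end:
        collapsed = collapsed.rstrip(" ")     # and so do spaces at the end
    return collapsed

# Both spellings of the link text from the example render the same way.
a = collapse_inline_text(" Hello  World \U0001F600", at_line_start=True, at_line_end=True)
b = collapse_inline_text("Hello World \U0001F600", at_line_start=True, at_line_end=True)
```

Two text runs that collapse to the same string look identical in the browser, so a change between them can be hidden from the diff.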

With the implemented support for embedded languages, SemanticDiff is able to parse the contents of <style> and <script> tags. This gives you access to all the semantic comparison features already implemented for CSS and JavaScript. In future versions we plan to parse the contents of style= attributes as well.

Here is a list of all invariances implemented for HTML:

  • Collapse whitespace according to CSS rules (based on default tag display type)
  • Ignore order of attributes in tags
  • Ignore order of classes in class attributes
  • Ignore order of values in rel attributes
  • Ignore value of boolean attributes
  • Treat tag and attribute names as case-insensitive
  • Treat HTML entities and their textual representation as equivalent
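
The list above can be approximated in a few lines of Python using the standard library's `html.parser`. This is a sketch under several assumptions and not SemanticDiff's implementation: the attribute sets are small illustrative subsets, the class and function names are hypothetical, and the whitespace handling is a crude stand-in (collapse, trim, drop empty text nodes) for the CSS rules.

```python
import re
from html.parser import HTMLParser

# Illustrative subsets, not the complete lists from the HTML specification.
BOOLEAN_ATTRS = {"async", "autofocus", "checked", "defer", "disabled", "hidden"}
TOKEN_SET_ATTRS = {"class", "rel"}
COLLAPSIBLE = re.compile(r"[ \t\n\r\f]+")

class Canonicalizer(HTMLParser):
    """Records a normalized event stream for an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities become characters
        self.events = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        canon = {}
        for name, value in attrs:
            if name in BOOLEAN_ATTRS:
                value = True  # only the presence of the attribute matters
            elif name in TOKEN_SET_ATTRS:
                value = tuple(sorted((value or "").split()))
            canon[name] = value
        # Sorting by name makes the attribute order irrelevant.
        ordered = tuple(sorted(canon.items(), key=lambda item: item[0]))
        self.events.append(("start", tag, ordered))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Crude whitespace handling: collapse runs, trim, skip empty nodes.
        text = COLLAPSIBLE.sub(" ", data).strip(" ")
        if text:
            self.events.append(("text", text))

def canonical(html: str):
    parser = Canonicalizer()
    parser.feed(html)
    parser.close()
    return parser.events

old = ('<DIV id="test" class="bar foo"><SCRIPT SRC="test.js" defer></SCRIPT>'
       '<a rel="nofollow noopener" href="#" autofocus="true"> Hello  World &#128512;</a></DIV>')
new = """<div id="test" class="foo bar">
    <script src="test.js" defer="true"></script>
    <a rel="noopener nofollow" href="#" autofocus>Hello World \U0001F600</a>
</div>"""
snippets_match = canonical(old) == canonical(new)
```

With this normalization the two snippets from the example produce the same event stream, while a real change such as a different `href` still shows up as a difference.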

Vue SFC support

We didn’t stop with HTML and also added support for the Vue Single File Component format in this release. Unlike HTML, Vue’s <style> tags can also contain SCSS and <script> tags may contain TypeScript code. SemanticDiff will automatically select the correct parser based on the lang attribute.
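
Conceptually, the parser selection is a lookup keyed on the block type and its `lang` attribute. The Python sketch below illustrates the idea only: the parser names and both tables are hypothetical, and the defaults reflect Vue's convention that `<script>` contains JavaScript and `<style>` contains CSS when no `lang` is given.

```python
# Hypothetical tables for illustration; not SemanticDiff's configuration.
DEFAULT_LANG = {"script": "js", "style": "css", "template": "html"}
PARSER_FOR_LANG = {
    "js": "javascript",
    "ts": "typescript",
    "css": "css",
    "scss": "scss",
    "html": "vue-template",
}

def select_parser(block: str, attrs: dict) -> str:
    """Pick a parser for a top-level SFC block, or "text" if none is known."""
    lang = (attrs.get("lang") or DEFAULT_LANG.get(block, "")).lower()
    return PARSER_FOR_LANG.get(lang, "text")

script_parser = select_parser("script", {"lang": "ts"})     # TypeScript block
style_parser = select_parser("style", {})                   # falls back to the default
unknown_parser = select_parser("template", {"lang": "pug"}) # no parser: text diff
```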

Vue example in SemanticDiff
Example of how SemanticDiff filters out irrelevant changes in Vue.

Unfortunately, we couldn’t reuse our HTML whitespace algorithm because directives in Vue like v-if or v-for can cause elements to be present once, multiple times or not at all. This makes it difficult to decide whether a whitespace character is relevant for a diff: it may be collapsed in some cases but not in others. We decided to go the extra mile and implement an algorithm that evaluates all possible combinations and only ignores whitespace that is irrelevant in all of them.
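
A toy model in Python shows why every combination has to be checked. It is a brute-force sketch under strong assumptions and not the algorithm SemanticDiff uses: a template is a flat list of parts, each part is either plain text or the text of an element guarded by a v-if, v-for is not modeled, and rendering is reduced to collapsed text. All names are hypothetical.

```python
import re
from itertools import product

COLLAPSIBLE = re.compile(r"[ \t\n\r\f]+")

def render(parts, state):
    """Rendered text of `parts` when the i-th conditional is shown iff state[i]."""
    shown = iter(state)
    out = []
    for kind, text in parts:
        if kind == "text":
            out.append(text)
        elif next(shown):          # kind == "if": present only in some states
            out.append(text)
    return COLLAPSIBLE.sub(" ", "".join(out)).strip(" ")

def same_in_all_states(old, new):
    """True if old and new render identically for every on/off combination."""
    count = sum(1 for kind, _ in old if kind == "if")
    if count != sum(1 for kind, _ in new if kind == "if"):
        return False
    return all(render(old, state) == render(new, state)
               for state in product([False, True], repeat=count))

# Removing the space before the conditional element only matters when it is shown.
relevant = same_in_all_states(
    [("text", "a "), ("if", "b"), ("text", " c")],
    [("text", "a"), ("if", "b"), ("text", " c")],
)
# Moving a redundant space from one side to the other never matters.
irrelevant = same_in_all_states(
    [("text", "a "), ("if", "b"), ("text", "  c")],
    [("text", "a  "), ("if", "b"), ("text", " c")],
)
```

In the first pair the change is invisible while the element is hidden but visible once it is shown, so it has to appear in the diff. Note that this brute-force enumeration grows exponentially with the number of conditionals; it is meant to convey the criterion, not an efficient way to evaluate it.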

For now the contents of interpolations {{ ... }} are diffed as text. We plan to parse them in a future release as well.

Here is a list of all invariances implemented for Vue:

  • Ignore order of classes in class attributes
  • Ignore order of values in rel attributes
  • Treat HTML entities and their textual representation as equivalent
  • Ignore order of attributes in tags unless a v-bind is encountered
  • Treat tag and attribute names as case-insensitive
  • Collapse whitespace according to CSS rules while evaluating all possible states of conditional directives (e.g. v-if)

JSX/TSX improvements

For those of you using React or other frameworks that use JSX or TSX files, this release also brings some nice improvements. You can probably guess where this is going, but we have also improved the whitespace handling for JSX/TSX. Previously, all whitespace was treated as relevant, but now we apply the React whitespace handling rules to discard some whitespace completely and collapse the remaining consecutive whitespace characters. This should greatly reduce the noise ratio in JSX/TSX diffs.
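
For readers unfamiliar with these rules, here is a Python sketch of the line-based trimming that JSX compilers apply to text between tags, followed by a collapsing step. It is modeled on our understanding of the commonly used behavior (lines are trimmed, blank lines are dropped, the remaining lines are joined with a single space) and is not SemanticDiff's code. The function names are hypothetical.

```python
import re

def jsx_text(raw: str) -> str:
    """Text that ends up in the element for the raw JSX text `raw`."""
    lines = re.split(r"\r\n|\n|\r", raw)
    kept = []
    for index, line in enumerate(lines):
        line = line.replace("\t", " ")
        if index != 0:
            line = line.lstrip(" ")   # indentation after a line break is dropped
        if index != len(lines) - 1:
            line = line.rstrip(" ")   # trailing spaces before a line break too
        if line:
            kept.append(line)         # lines that end up empty disappear
    return " ".join(kept)

def normalized_jsx_text(raw: str) -> str:
    """Additionally collapse consecutive spaces, as the browser would when rendering."""
    return re.sub(r" +", " ", jsx_text(raw))

multi_line = normalized_jsx_text("\n      Hello\n      World\n    ")
single_line = normalized_jsx_text("Hello   World")
```

Both spellings normalize to the same text, so reindenting or rewrapping JSX content no longer needs to show up as a change. Spaces on a line shared with a tag, as in `<b>a</b> text`, are kept because they are visible in the output.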

We have also added two more invariances: The order of attributes within a tag/component is now ignored unless a spread operator is used. HTML entities and their textual representation are now also treated as equivalent. To help you navigate the code more easily, SemanticDiff 0.9.0 displays more language constructs as scopes in the hunk header.
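
The spread exception exists because a spread can override or be overridden by neighboring attributes depending on its position: `<A x={1} {...props} />` and `<A {...props} x={1} />` may behave differently. A conservative way to express the rule is sketched below in Python. The representation of attributes as `(name, value)` pairs with `"..."` marking a spread is an assumption made for this example, attribute names are assumed to be unique, and SemanticDiff may well apply a finer-grained rule.

```python
def jsx_attrs_equal(old, new):
    """Compare two attribute lists of (name, value) pairs; ("...", expr) is a spread."""
    has_spread = any(name == "..." for name, _ in old + new)
    if has_spread:
        return old == new              # order can change the result, so keep it
    return sorted(old) == sorted(new)  # without spreads the order is irrelevant

reordered = jsx_attrs_equal(
    [("id", '"a"'), ("onClick", "{go}")],
    [("onClick", "{go}"), ("id", '"a"')],
)
reordered_with_spread = jsx_attrs_equal(
    [("x", "{1}"), ("...", "props")],
    [("...", "props"), ("x", "{1}")],
)
```

Here the first comparison treats the reordering as unchanged, while the second reports a difference because a spread is involved.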

Minor improvements

SemanticDiff received a lot of minor improvements. Almost all parsers have been updated to support more language constructs or to fix minor bugs. We were also able to improve the performance of the diff generation. This should generally make it a few percent faster, and in some extreme cases it can reduce the computation time by 60% or more.

We hope you enjoyed this update. If you encounter any issues, please let us know in our issue tracker.
