We just released the biggest update to SemanticDiff yet! Version 0.9.0 of our programming language aware diff adds two new languages to our Visual Studio Code extension (HTML and Vue) and three new languages (HTML, Vue, Swift) to our GitHub App. This is combined with enhanced JSX/TSX support and several other improvements.
If you are using the GitHub App, all updates have already been rolled out and no action is required on your part. VS Code users should receive the update automatically when the editor checks for extension updates.
Swift support
Swift is currently only available in our GitHub App. Our VS Code extension will follow once we have a truly portable build of the parser for all 9 supported architectures.
Before we start with the web technology focused changes, we have an announcement for our macOS/iOS developers. Our SemanticDiff GitHub App can now generate language aware diffs for Swift files. It can, for example, filter out most style changes introduced by swiftlint --fix and show only actual logic changes. This is especially useful if your files also contain other modifications that would be easily missed in a standard diff amidst all the format changes.
Here are some screenshots to demonstrate what it looks like. Let’s start with a very simple toy example:
SemanticDiff can also distinguish between stylistic changes and logic changes in less obvious cases:
The full list of invariances (i.e. changes that are not displayed in the diff) includes:
- Adding/Removing whitespace or line breaks outside of strings and comments
- Adding/Removing optional commas
- Adding/Removing unnecessary parentheses
- Exchanging numeric literals with equivalent ones in different bases
- Exchanging escaped characters in a string/char with an equivalent representation (e.g. the literal)
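To make the numeric literal invariance more concrete, here is a minimal Python sketch of how two Swift integer literals in different bases could be recognized as equivalent. The function name `swift_int_value` is hypothetical and this is not SemanticDiff's implementation. It only covers integer literals with the `0x`, `0o` and `0b` prefixes and `_` digit separators; floating-point literals are out of scope.

```python
def swift_int_value(literal: str) -> int:
    """Return the numeric value of a Swift integer literal (illustrative only)."""
    # Underscores are purely visual digit separators in Swift.
    digits = literal.replace("_", "")
    # Swift uses lowercase prefixes for hexadecimal, octal and binary literals.
    prefixes = {"0x": 16, "0o": 8, "0b": 2}
    base = prefixes.get(digits[:2], 10)
    if base != 10:
        digits = digits[2:]
    return int(digits, base)


def same_literal(old: str, new: str) -> bool:
    """Two literals are interchangeable if they denote the same value."""
    return swift_int_value(old) == swift_int_value(new)


print(same_literal("0xFF", "0b1111_1111"))  # both literals denote 255
print(same_literal("1_000", "1001"))        # different values
```

A diff built on this idea would compare the values rather than the spelling, so switching a constant from decimal to hexadecimal would not show up as a change.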
If you work with GitHub pull requests and prefer diffs with less noise, you might want to give it a try. You can see what it looks like using a real world example here.
Support for embedded languages
One of the biggest changes in this release is internal. SemanticDiff can now generate diffs for languages that embed other languages. An example of such a language is HTML, which can contain CSS stylesheets and JavaScript scripts. If the embedded language can’t be parsed, SemanticDiff will automatically fall back to a text diff for that part of the code, so you get at least a partially language aware diff.
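The fallback behavior can be illustrated with a small Python sketch. It is an assumption-laden toy, not how SemanticDiff works internally: `diff_embedded` is a hypothetical helper, and `json.loads` merely stands in for the parser of an embedded language. If both fragments parse, they are compared structurally; if either one fails to parse, the function falls back to a line-based text diff for that fragment only.

```python
import difflib
import json


def diff_embedded(old_src, new_src, parse):
    """Compare two embedded fragments, falling back to a text diff on parse errors."""
    try:
        old_tree, new_tree = parse(old_src), parse(new_src)
    except ValueError:
        # The embedded code could not be parsed: use a plain text diff instead.
        lines = difflib.unified_diff(
            old_src.splitlines(), new_src.splitlines(), lineterm=""
        )
        return ("text", list(lines))
    # Both fragments parsed: compare the parsed structures, not the raw text.
    changes = [] if old_tree == new_tree else [(old_tree, new_tree)]
    return ("semantic", changes)


# Key order and spacing differ, but the parsed structures are equal.
print(diff_embedded('{"a": 1, "b": 2}', '{ "b": 2, "a": 1 }', json.loads))
# Neither fragment is valid JSON, so a text diff is returned instead.
print(diff_embedded('{"a": 1', '{"a": 2', json.loads)[0])
```

The rest of the file can still be diffed in a language aware way, because the fallback is decided per embedded fragment rather than per file.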
HTML support
SemanticDiff 0.9.0 can now diff HTML files and we have taken care to filter out as many noisy changes as possible. To give you an example, SemanticDiff can determine that the following two HTML snippets are semantically identical:
<DIV id="test" class="bar foo"><SCRIPT SRC="test.js" defer></SCRIPT><a rel="nofollow noopener" href="#" autofocus="true"> Hello World 😀</a></DIV>
<div id="test" class="foo bar">
<script src="test.js" defer="true"></script>
<a rel="noopener nofollow" href="#" autofocus>Hello World 😀</a>
</div>
Can you spot all the ignored differences?
One of the main obstacles in implementing a language aware diff for HTML is the handling of whitespace. Most whitespace characters are not displayed by the browser and are either ignored or collapsed into a single character. SemanticDiff implements the whitespace processing rules from the “CSS Text Module Level 3” to detect these cases and remove noise from the diff. Since we don’t know which CSS stylesheets are used with the HTML file, SemanticDiff assumes the default display type for all tags and that they are visible.
With the implemented support for embedded languages, SemanticDiff is able to parse the contents of <style> and <script> tags. This gives you access to all the semantic comparison features already implemented for CSS and JavaScript. In future versions we plan to parse the contents of style= attributes as well.
Here is a list of all invariances implemented for HTML:
- Collapse whitespace according to CSS rules (based on default tag display type)
- Ignore order of attributes in tags
- Ignore order of classes in class attributes
- Ignore order of values in rel attributes
- Ignore value of boolean attributes
- Treat tag and attribute names as case-insensitive
- Treat HTML entities and their textual representation as invariant
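Several of these invariances can be approximated in a few lines of Python using the standard library's `html.parser`. The following is only a rough sketch under stated assumptions, not SemanticDiff's algorithm: the `Normalizer` class and `normalize` function are hypothetical, the list of boolean attributes is deliberately incomplete, and the whitespace handling (collapse runs, trim each text node) is far cruder than the CSS rules mentioned above.

```python
import re
from html.parser import HTMLParser

# Incomplete, illustrative subset of HTML boolean attributes.
BOOLEAN_ATTRS = {"async", "autofocus", "checked", "defer", "disabled", "hidden"}
# Attributes whose value is treated as an unordered set of tokens.
TOKEN_SET_ATTRS = {"class", "rel"}


class Normalizer(HTMLParser):
    """Collects parse events in a form that ignores the listed differences."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes entities in text
        self.events = []

    def handle_starttag(self, tag, attrs):
        normalized = {}
        for name, value in attrs:  # HTMLParser lowercases tag/attribute names
            if name in BOOLEAN_ATTRS:
                value = True  # only the presence matters, not the value
            elif name in TOKEN_SET_ATTRS:
                value = tuple(sorted(set((value or "").split())))
            normalized[name] = value
        # Sorting by name makes the attribute order irrelevant.
        items = tuple(sorted(normalized.items(), key=lambda kv: kv[0]))
        self.events.append(("start", tag, items))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Crude stand-in for CSS whitespace processing.
        text = re.sub(r"\s+", " ", data).strip()
        if text:
            self.events.append(("text", text))


def normalize(html):
    parser = Normalizer()
    parser.feed(html)
    parser.close()
    return parser.events


first = '<DIV id="test" class="bar foo"><SCRIPT SRC="test.js" defer></SCRIPT><a rel="nofollow noopener" href="#" autofocus="true"> Hello World 😀</a></DIV>'
second = """<div id="test" class="foo bar">
<script src="test.js" defer="true"></script>
<a rel="noopener nofollow" href="#" autofocus>Hello World 😀</a>
</div>"""
print(normalize(first) == normalize(second))
```

Applied to the two snippets from the beginning of this section, both normalize to the same event list, while a change to an attribute value such as href would still be detected.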
Vue SFC support
We didn’t stop with HTML and also added support for the Vue Single File Component format in this release. Unlike HTML, Vue’s <style> tags can also contain SCSS and <script> tags may contain TypeScript code. SemanticDiff will automatically select the correct parser based on the lang attribute.
Unfortunately, we couldn’t reuse our HTML whitespace algorithm because directives in Vue like v-if or v-for can cause elements to be present once, multiple times or not at all. This makes it difficult to decide whether a whitespace character is relevant for a diff: it may be collapsed in some cases but not in others. We decided to go the extra mile and implement an algorithm that evaluates all possible combinations and only ignores whitespace that is irrelevant in all of them.
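The idea of checking every combination can be sketched with a toy model in Python. Everything here is an illustrative assumption rather than SemanticDiff's actual algorithm: each child is a pair of text and the repetition counts it may take (exactly once for plain text, zero or one for v-if, and zero to two as an arbitrary bound for v-for), all children are treated as inline content, and whitespace collapsing is reduced to a single regular expression.

```python
import re
from itertools import product

# Possible repetition counts per child (the v-for upper bound is arbitrary).
PLAIN, V_IF, V_FOR = (1,), (0, 1), (0, 1, 2)


def render(children, counts):
    """Render one concrete state and collapse its whitespace."""
    text = "".join(t * n for (t, _), n in zip(children, counts))
    return re.sub(r"\s+", " ", text).strip()


def whitespace_change_is_ignorable(old, new):
    """True only if old and new render identically in every possible state."""
    options = [counts for _, counts in old]
    if options != [counts for _, counts in new]:
        raise ValueError("this toy model only compares identical structures")
    return all(
        render(old, state) == render(new, state) for state in product(*options)
    )


# The removed space is still covered by other whitespace in every state.
old_a = [("a ", PLAIN), (" x ", V_IF), (" b", PLAIN)]
new_a = [("a", PLAIN), (" x ", V_IF), (" b", PLAIN)]
print(whitespace_change_is_ignorable(old_a, new_a))

# Without the v-if element, "a b" would turn into "ab".
old_b = [("a ", PLAIN), (" x ", V_IF), ("b", PLAIN)]
new_b = [("a", PLAIN), (" x ", V_IF), ("b", PLAIN)]
print(whitespace_change_is_ignorable(old_b, new_b))
```

In the second pair the whitespace change is invisible while the conditional element is rendered, but it changes the output once the element disappears, so it has to be shown in the diff.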
For now the contents of interpolations {{ ... }} are diffed as text. We plan to parse them in a future release as well.
Here is a list of all invariances implemented for Vue:
- Ignore order of classes in class attributes
- Ignore order of values in rel attributes
- Treat HTML entities and their textual representation as invariant
- Ignore order of attributes in tags unless a v-bind is encountered
- Treat tag and attribute names as case-insensitive
- Collapse whitespace according to CSS rules while evaluating all possible states of conditional directives (e.g. v-if)
JSX/TSX improvements
For those of you using React or other frameworks that use JSX or TSX files, this release also brings some nice improvements. You can probably guess where this is going, but we have also improved the whitespace handling for JSX/TSX. Previously, all whitespace was treated as relevant, but now we apply the React whitespace handling rules to discard some whitespace completely and collapse the remaining consecutive whitespace characters. This should greatly reduce the noise ratio in JSX/TSX diffs.
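As a rough illustration of what such rules look like, the following Python sketch normalizes the text between JSX tags. It assumes the line-based trimming that JSX compilers commonly apply to text children (leading and trailing whitespace that includes a line break is dropped, and the remaining lines are joined with a single space) and adds a final step that collapses consecutive spaces. The function name `clean_jsx_text` is hypothetical and the details may differ from what SemanticDiff does.

```python
import re


def clean_jsx_text(text: str) -> str:
    """Approximate the whitespace normalization of a JSX text child."""
    lines = re.split(r"\r\n|\n|\r", text)
    # Find the last line that contains more than spaces and tabs.
    last_non_empty = 0
    for i, line in enumerate(lines):
        if re.search(r"[^ \t]", line):
            last_non_empty = i

    out = ""
    for i, line in enumerate(lines):
        line = line.replace("\t", " ")
        if i != 0:
            line = line.lstrip(" ")  # indentation after a line break
        if i != len(lines) - 1:
            line = line.rstrip(" ")  # trailing spaces before a line break
        if line:
            if i != last_non_empty:
                line += " "  # a line break between words acts as one space
            out += line
    # Collapse the remaining runs of spaces.
    return re.sub(r" {2,}", " ", out)


multi_line = "\n    Hello\n    World\n"
single_line = "Hello World"
print(clean_jsx_text(multi_line) == clean_jsx_text(single_line))
```

Note that whitespace on the same line as the surrounding tags is kept, so " Hello " and "Hello" are still reported as different.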
We have also added two more invariances: The order of attributes within a tag/component is now ignored unless a spread operator is used. HTML entities and their textual representation are now also treated as equivalent. To help you navigate the code more easily, SemanticDiff 0.9.0 displays more language constructs as scopes in the hunk header.
Minor improvements
SemanticDiff received a lot of minor improvements. Almost all parsers have been updated to support more language constructs or to fix minor bugs. We were also able to improve the performance of the diff generation. This should generally make it a few percent faster, but in some extreme cases it can reduce the computation time by 60% or more.
We hope you enjoyed this update. If you encounter any issues, please let us know in our issue tracker.