ENOSUCHBLOG

Programming, philosophy, pedaling.

Towards an automated changelog workflow

Nov 5, 2020 Tags: programming, workflow

I’m a very big fan of changelogs: I love being able to get a quick (human-readable) summary of what’s changed between releases, and I actively consume them when integrating libraries and tools into both my personal and my professional projects.

Despite that, I’m not very good at maintaining my own changelogs: I waver between forgetting to update them entirely and dumping a formatted version of my git log¹ into them, defeating their entire purpose.

This post will hopefully collect some of my thoughts on changelog automation so that I can go about writing a tool (or composing preexisting tools) that solves changelog management for me. Maybe it’ll be useful to others too!

Brief background

This post is intended to be a collection of structured notes for myself, but it doesn’t hurt to provide some history and basic ground facts about changelogs for anybody else who decides to read it. So let’s do that.

As a development practice, changelogs are old. They’re probably as old as the practice of software versioning itself. They certainly predate version control systems, and serve a purpose that’s separate from message metadata in VCSes: they distill the essential changes that are relevant to humans, rather than a complete log of everything that has changed in between two points in time.
There is no one canonical changelog filename: common ones are ChangeLog, changelog, CHANGELOG, HISTORY, NEWS, and variations thereof. It’s also common to see changelogs formatted in Markdown and other markup languages. Users are expected to find the changelog for a project (usually with hints via other files, like the README).
There is no one canonical changelog format. GNU specifies one; many projects use ad-hoc formats or adhere to the general hierarchy of a Markdown document.
Despite the lack of a canonical format, most changelogs overlap in scope and structure: they list their changes by release version, break changes within a release up by category (feature, bugfix, regression, performance, &c), and generally credit individual users for changes on an entry-by-entry basis (by email, GitHub handle, &c).

Observations

First: changelog management is a burden that’s added to an engineer’s workflow. Like good commit messages, good changelog entries require active effort from maintainers and contributors. A good changelog management tool will need to reduce that burden to a manageable level.

Second: because changelogs are burdensome, they need to require minimal buy-in from new contributors. Someone looking to contribute a one-line fix shouldn’t need to install a custom tool to ensure that the changelog reflects their work; that responsibility should lie with the maintainer(s).

Third: changelogs need to be human readable first, and weakly machine-consumable second. Why machine consumable at all? Because tools like Dependabot look for changelogs and make a best effort to include them in their workflows. The output of those workflows is ultimately read by humans, so a good changelog tool will emit a format that allows existing automated tools to extract the salient information.

Fourth: changelogs are not immutable. Humans make mistakes (typos, forgetting changes, needing to yank a release, &c), and changelogs need to be updated to address those mistakes. A good changelog management tool will not be upset or cause additional work for users if a previous release entry changes beneath it².
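A minimal sketch of what "not being upset" could look like in practice, assuming a Markdown changelog in which every release starts with a second-level `## ` header. The function name `insert_release_entry` is hypothetical. The only parsing it does is finding the first release header; everything else is treated as opaque text, so hand edits to older entries survive untouched.

```python
def insert_release_entry(changelog: str, entry: str) -> str:
    """Place `entry` directly above the first existing release header.

    Assumption: release entries begin with a line starting with "## ".
    Nothing below that line is inspected or rewritten.
    """
    lines = changelog.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.startswith("## "):
            head, tail = lines[:index], lines[index:]
            break
    else:
        # No release has been recorded yet: append after the header section.
        head, tail = lines, []

    # Make sure the text before the new entry ends with a newline.
    if head and not head[-1].endswith("\n"):
        head[-1] += "\n"

    block = entry.rstrip("\n") + "\n\n"
    return "".join(head) + block + "".join(tail)
```

Because older entries are never re-parsed, a typo fix or a yanked release in the existing text cannot make the tool fail.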

Prior art

A few different individuals and groups have made efforts to standardize and develop tooling around changelogs:

As mentioned before, the GNU Project has a format that they adhere to. Emacs comes with functionality to interpret and modify their particular format. They also provide some reasonable guidance to authors, like not abbreviating identifiers ({dont,get,clever,like}-this) and using the VCS as a source of ground truth for metadata (dates, canonical author names and emails).
There’s keep a changelog, which (as of 1.0.0) provides a loose set of structure and content recommendations for effective changelog husbandry. keep a changelog focuses on Markdown-formatted changelogs, making it slightly less dated and more amenable to cross-referencing.
There exist many tools that do what shouldn’t be done, namely: turn a sequence of git log (and other metadata) entries into something vaguely resembling a changelog. I won’t link to these because they’re largely bad, with two exceptions:
- clog-cli (thanks, Sven) relies on the Conventional Commits specification to turn commit messages into changelog entries. This ends up being much nicer than most of the git log --format=... solutions, although attribution of changes and cross-referencing to issues and PRs are conspicuously missing. It also hasn’t been updated in a few years, which makes me sad.
- github-changelog-generator is similar, but uses GitHub tags, issues, and PRs as its sources of metadata rather than Git’s logs. This seems to produce reasonable results, including nice cross-referencing between originating issues and contributors (by GitHub handle). An obvious downside: it ties you to GitHub.
In addition to git log as a data source, Git also provides the lesser-known git notes for associating arbitrary data with objects (e.g., individual commits). At least one person other than me has thought that they could be a good place to put information for changelog generation.
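To make the Conventional Commits approach concrete, here is a rough sketch of grouping commit subjects by type. It is not how clog-cli works internally; the regular expression follows the `type(scope)!: description` shape from the Conventional Commits specification, while the `TYPE_TO_CATEGORY` mapping and the function name `group_subjects` are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Subject line shape: type, optional (scope), optional "!", then ": description".
SUBJECT = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<description>.+)$"
)

# Hypothetical mapping from commit types to changelog categories.
TYPE_TO_CATEGORY = {"feat": "Added", "fix": "Fixed"}


def group_subjects(subjects):
    """Group commit subjects by category, skipping anything unrecognized."""
    grouped = defaultdict(list)
    for subject in subjects:
        match = SUBJECT.match(subject)
        if match is None:
            continue  # Not a Conventional Commit (e.g., a merge commit).
        category = TYPE_TO_CATEGORY.get(match["type"])
        if category is None:
            continue  # Types such as "chore" are left out of the changelog.
        grouped[category].append(match["description"])
    return dict(grouped)
```

Note what is absent from the result: the subject line alone carries no author and no issue or PR link, which is the attribution and cross-referencing gap described above.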

Goals and anti-goals

Given the observations and prior art above:

Goal: Eliminating individual contributor involvement. Contributors should need to do nothing more than make their changes and write normal commit messages.
Goal: Integration into the release workflow. Ideally, it should be impossible for me to cut a release without a corresponding version in the changelog. This will probably depend heavily on the particulars of a project’s release tooling; for cargo release it looks like pre-release-hook is the right solution.
Anti-goal: Configuration and flexibility. My ideal tool generates just CHANGELOG.md and doesn’t require any repository-specific configuration. If it needs project metadata, it pulls from the VCS or a subset of common project metadata files (Cargo.toml, pyproject.toml, &c). If it absolutely needs configuration for a project, then it should piggyback on one of those files.
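The release-workflow goal could be enforced with a check along these lines. This is a sketch under assumptions: the changelog is Markdown, each release header has the form `## VERSION - YYYY-MM-DD`, and the function name `has_release_entry` is hypothetical. How the version reaches the check depends on the release tooling and is left out here.

```python
import re


def has_release_entry(changelog: str, version: str) -> bool:
    """Return True if the changelog has a dated header for exactly `version`."""
    pattern = re.compile(
        rf"^## {re.escape(version)} - \d{{4}}-\d{{2}}-\d{{2}}\s*$",
        re.MULTILINE,
    )
    return pattern.search(changelog) is not None
```

A small wrapper script could read CHANGELOG.md, call this function with the version about to be released, and exit with a non-zero status when it returns False, so that a release hook aborts the release.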

A data model for changelogs

Here’s a reasonable data model for a changelog, based on (and more constrained than) keep a changelog’s informal specification:

A changelog is:

A Header section, and
A Sequence of Release Entries.

A Header section is:

A top-level header (i.e., <h1>) with the text “Changelog”, and
A Sequence of unstructured paragraphs (i.e., <p>) describing the scope of the changelog and its format.

A Release Entry is:

A second-level header (i.e., <h2>) with the format {VERSION} - {YYYY-MM-DD} where {VERSION} is the release version (or “Unreleased” for an upcoming/in-progress entry) and {YYYY-MM-DD} is the release date in ISO 8601 format, and
A Sequence of Change Sections.

A Change Section is:

A third-level header (i.e., <h3>) with one valid Change Tag, and
An unordered list (i.e., <ul>) of Changes.

A Change Tag is one of “Added”, “Changed”, “Removed”, or “Fixed”.

A Change is:

A list item (i.e., <li>) with unstructured contents and the following (advisory) guidelines:
- The contents should be written in the imperative mood
- Where appropriate, the contents should be postfixed with the following, in order
  - A parenthetical cross-reference (i.e., link) to the corresponding issue, PR, or other resource
  - An attribution, with a cross-reference to the contributor’s handle or email address in “friendly” format (e.g. Julia Agrippina <agrippina@example.com>).
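The data model above maps fairly directly onto a handful of record types. The following is one possible sketch, rendering to Markdown; the class and field names are illustrative, and omitting the date for an "Unreleased" entry is an assumption, since the model does not say what date such an entry carries.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

CHANGE_TAGS = ("Added", "Changed", "Removed", "Fixed")


@dataclass
class Change:
    text: str
    reference: Optional[str] = None    # e.g., an issue or PR link
    attribution: Optional[str] = None  # e.g., "Name <email>" or a handle

    def render(self) -> str:
        parts = [self.text]
        if self.reference:
            parts.append(f"({self.reference})")
        if self.attribution:
            parts.append(self.attribution)
        return "- " + " ".join(parts)


@dataclass
class ChangeSection:
    tag: str
    changes: List[Change] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.tag not in CHANGE_TAGS:
            raise ValueError(f"invalid change tag: {self.tag!r}")

    def render(self) -> str:
        items = "\n".join(change.render() for change in self.changes)
        return f"### {self.tag}\n\n{items}"


@dataclass
class ReleaseEntry:
    version: str                       # or "Unreleased"
    released: Optional[date] = None    # assumption: None for "Unreleased"
    sections: List[ChangeSection] = field(default_factory=list)

    def render(self) -> str:
        title = f"## {self.version}"
        if self.released is not None:
            title += f" - {self.released.isoformat()}"  # YYYY-MM-DD
        return "\n\n".join([title, *(s.render() for s in self.sections)])


@dataclass
class Changelog:
    preamble: List[str] = field(default_factory=list)
    releases: List[ReleaseEntry] = field(default_factory=list)

    def render(self) -> str:
        blocks = ["# Changelog", *self.preamble]
        blocks.extend(release.render() for release in self.releases)
        return "\n\n".join(blocks) + "\n"
```

Restricting `tag` to the four allowed values at construction time means an invalid section is rejected before anything is written to disk.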

Next steps

Now that I’ve laid everything out above, I’m less sure that I need a brand new tool to manage changelogs for my projects. Instead, I think I need to:

Switch to using Conventional Commits for all my projects³
Try out clog-cli and potentially make improvements to its codebase
Work on adapters for cargo release and other tools so that I can prevent myself from cutting releases without changelog entries

If that doesn’t work out, you’ll probably be seeing a changelog management tool from me in the near future.

¹ Which I’d like to think isn’t bad, mind you. Just not appropriate, content- and detail-wise, for a changelog.
² Even more concretely: a good changelog tool shouldn’t attempt to re-parse an already emitted changelog beyond the bare minimum needed to situate the new change entry.
³ I currently use my own style, which I like but certainly isn’t standard.
