Nov 5, 2020 Tags: programming, workflow
I’m a very big fan of changelogs: I love being able to get a quick (human-readable) summary of what’s changed between releases, and I actively consume them when integrating libraries and tools into both my personal and my professional projects.
Despite that, I’m not very good at maintaining my own changelogs: I waver between forgetting to update them entirely and dumping a formatted version of my git log [1] into them, defeating their entire purpose.
This post will hopefully collect some of my thoughts on changelog automation so that I can go about writing a tool (or composing preexisting tools) that solves changelog management for me. Maybe it’ll be useful to others too!
This post is intended to be a collection of structured notes for myself, but it doesn’t hurt to provide some history and basic ground facts about changelogs for anybody else who decides to read it. So let’s do that.
As a development practice, changelogs are old. They’re probably as old as the practice of software versioning itself. They certainly predate version control systems, and serve a purpose that’s separate from message metadata in VCSes: they distill the essential changes that are relevant to humans, rather than a complete log of everything that has changed in between two points in time.
There is no one canonical changelog filename: common ones are ChangeLog, changelog, CHANGELOG, HISTORY, NEWS, and variations thereof. It’s also common to see changelogs formatted in Markdown and other markup languages. Users are expected to find the changelog for a project (usually with hints via other files, like the README).
There is no one canonical changelog format. GNU specifies one; many projects use ad-hoc formats or adhere to the general hierarchy of a Markdown document.
Despite the lack of a canonical format, most changelogs overlap in scope and structure: they list their changes by release version, break changes within a release up by category (feature, bugfix, regression, performance, &c), and generally credit individual users for changes on an entry-by-entry basis (by email, GitHub handle, &c).
First: changelog management is a burden that’s added to an engineer’s workflow. Like good commit messages, good changelog entries require active effort from maintainers and contributors. A good changelog management tool will need to reduce that burden to a manageable level.
Second: Because changelogs are burdensome, they need to require minimal buy-in from new contributors. Someone looking to contribute a one-line fix shouldn’t need to install a custom tool to ensure that the changelog reflects their work; that responsibility should lie with the maintainer(s).
Third: changelogs need to be human readable first, and weakly machine-consumable second. Why machine consumable at all? Because tools like Dependabot look for changelogs and make a best effort to include them in their workflows. These workflows are fed to humans for consumption, so a good changelog tool will emit a format that allows existing automated tools to extract salient information for subsequent human consumption.
Fourth: changelogs are not immutable. Humans make mistakes (typos, forgetting changes, needing to yank a release, &c), and changelogs need to be updated to address those mistakes. A good changelog management tool will not be upset or cause additional work for users if a previous release entry changes beneath it [2].
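A minimal sketch of what "not being upset" by edited history could look like in practice. It assumes a Markdown changelog in which each release starts with a second-level heading ("## ..."); the function name `insert_release_entry` is hypothetical. The only thing it looks for is the first release heading, and everything after that point is copied through verbatim, so hand-edited older entries are never re-parsed.

```python
def insert_release_entry(changelog: str, entry: str) -> str:
    """Insert `entry` above the first existing release heading.

    Assumption: releases are introduced by Markdown "## " headings.
    Nothing below the insertion point is parsed or rewritten.
    """
    lines = changelog.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.startswith("## "):
            break  # found the newest existing release
    else:
        index = len(lines)  # no releases yet: append after the header
    # Normalize the new entry so it ends with exactly one blank line.
    block = entry.rstrip("\n") + "\n\n"
    return "".join(lines[:index]) + block + "".join(lines[index:])
```

Because older entries are treated as opaque text, a maintainer can fix a typo or annotate a yanked release by hand without the tool noticing or caring.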
A few different individuals and groups have made efforts to standardize and develop tooling around changelogs:
As mentioned before, the GNU Project has a format that they adhere to. Emacs comes with functionality to interpret and modify their particular format. They also provide some reasonable guidance to authors, like not abbreviating identifiers ({dont,get,clever,like}-this) and using the VCS as a source of ground truth for metadata (dates, canonical author names and emails).
There’s keep a changelog, which (as of 1.0.0) provides a loose set of structure and content recommendations for effective changelog husbandry. keep a changelog focuses on Markdown-formatted changelogs, making it slightly less dated and more amenable to cross-referencing.
There exist many tools that do what shouldn’t be done, namely: turn a sequence of git log (and other metadata) entries into something vaguely resembling a changelog. I won’t link to these because they’re largely bad, with two exceptions:
clog-cli (thanks, Sven) relies on the Conventional Commits specification to turn commit messages into changelog entries. This ends up being a lot nicer than a lot of the git log --format=... solutions are, although attribution of changes and cross-referencing to issues and PRs are conspicuously missing. It also hasn’t been updated in a few years, which makes me sad.
github-changelog-generator is similar, but uses GitHub tags, issues, and PRs as its sources of metadata rather than Git’s logs. This seems to produce reasonable results, including nice cross-referencing between originating issues and contributors (by GitHub handle). An obvious downside: it ties you to GitHub.
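To make the Conventional Commits approach concrete, here is a rough sketch of turning commit subject lines into grouped changelog entries. It is not how clog-cli itself is implemented: the regular expression follows the `type(scope)!: description` shape from the Conventional Commits specification, while the `HEADINGS` mapping and the function name `group_subjects` are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Subject line shape: type, optional (scope), optional "!", then ": description".
SUBJECT = re.compile(
    r"^(?P<type>[A-Za-z]+)(?:\((?P<scope>[^()]+)\))?(?P<breaking>!)?: (?P<description>.+)$"
)

# Assumed mapping from commit type to changelog heading (illustrative only).
HEADINGS = {"feat": "Features", "fix": "Bug Fixes"}


def group_subjects(subjects):
    """Group Conventional Commit subjects under changelog headings."""
    groups = defaultdict(list)
    for subject in subjects:
        match = SUBJECT.match(subject)
        if match is None:
            continue  # not a Conventional Commit (e.g. a merge commit)
        heading = HEADINGS.get(match["type"].lower())
        if heading is None:
            continue  # types such as "chore" are left out of the changelog
        text = match["description"]
        if match["scope"]:
            text = f"{match['scope']}: {text}"
        groups[heading].append(text)
    return dict(groups)
```

Note what the sketch cannot recover from the subject line alone: authorship and issue or PR references, which is the same gap described above.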
In addition to git log as a data source, Git also provides the less known git notes for associating arbitrary data with objects (e.g., individual commits). At least one person other than me has thought that they could be a good place to put information for changelog generation.
Given the observations and prior art above:
Goal: Eliminating individual contributor involvement. Contributors should need to do nothing more than make their changes and write normal commit messages.
Goal: Integration into the release workflow. Ideally, it should be impossible for me to cut a release without a corresponding version in the changelog. This will probably depend heavily on the particulars of a project’s release tooling; for cargo release it looks like pre-release-hook is the right solution.
Anti-goal: Configuration and flexibility. My ideal tool generates just CHANGELOG.md and doesn’t require any repository-specific configuration. If it needs project metadata, it pulls from the VCS or a subset of common project metadata files (Cargo.toml, pyproject.toml, &c). If it absolutely needs configuration for a project, then it should piggyback on one of those files.
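The release-workflow goal above could be approximated with a small check that a release hook runs before tagging. This is a sketch under assumptions: release headings are taken to look like "## {VERSION} - {YYYY-MM-DD}", and the names `has_release_entry` and `main`, as well as the idea of passing the changelog path and version as arguments, are hypothetical rather than part of any existing tool.

```python
import re
import sys
from pathlib import Path


def has_release_entry(changelog_text: str, version: str) -> bool:
    """Return True if the changelog has a dated heading for `version`.

    Assumption: release headings look like "## 1.2.3 - 2020-11-05".
    """
    pattern = re.compile(
        rf"^## {re.escape(version)} - \d{{4}}-\d{{2}}-\d{{2}}\s*$",
        re.MULTILINE,
    )
    return pattern.search(changelog_text) is not None


def main(argv):
    """Return an exit status: 0 if the entry exists, 1 if missing, 2 on misuse."""
    if len(argv) != 3:
        print("usage: check_changelog CHANGELOG VERSION", file=sys.stderr)
        return 2
    text = Path(argv[1]).read_text(encoding="utf-8")
    if has_release_entry(text, argv[2]):
        return 0
    print(f"no changelog entry for {argv[2]}", file=sys.stderr)
    return 1
```

A script wrapping this would call `sys.exit(main(sys.argv))`; a non-zero exit status is what lets release tooling abort the release.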
Here’s a reasonable data model for a changelog, based on (and more constrained than) keep a changelog’s informal specification:
A changelog is:
- a Header section, followed by
- a sequence of Release Entries.
A Header section is:
- a top-level heading (<h1>) with the text “Changelog”, and
- a paragraph (<p>) describing the scope of the changelog and its format.
A Release Entry is:
- a second-level heading (<h2>) with the format {VERSION} - {YYYY-MM-DD}, where {VERSION} is the release version (or “Unreleased” for an upcoming/in progress entry) and {YYYY-MM-DD} is the release date in ISO 8601 format, and
- a sequence of Change Sections.
A Change Section is:
- a third-level heading (<h3>) with one valid Change Tag, and
- a list (<ul>) of Changes.
A Change Tag is one of “Added”, “Changed”, “Removed”, or “Fixed”.
A Change is:
- a list item (<li>) with unstructured contents and the following (advisory) guidelines:
  - […] Julia Agrippina <agrippina@example.com>).
Now that I’ve laid everything out above, I’m less sure that I need a brand new tool to manage changelogs for my projects. Instead, I think I need to:
- […] clog-cli and potentially make improvements to its codebase
- […] cargo release and other tools so that I can prevent myself from cutting releases without changelog entries
If that doesn’t work out, you’ll probably be seeing a changelog management tool from me in the near future.
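As a starting point for such a tool, the data model described earlier could be sketched roughly as follows. The class names, the `render` function, and the choice of Markdown output are all assumptions for illustration; only the four Change Tags and the "{VERSION} - {YYYY-MM-DD}" heading format come from the model itself.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

CHANGE_TAGS = ("Added", "Changed", "Removed", "Fixed")


@dataclass
class ChangeSection:
    tag: str
    changes: List[str]

    def __post_init__(self):
        # Reject anything outside the four permitted Change Tags.
        if self.tag not in CHANGE_TAGS:
            raise ValueError(f"invalid Change Tag: {self.tag!r}")


@dataclass
class ReleaseEntry:
    version: str  # a release version, or "Unreleased"
    released: date
    sections: List[ChangeSection] = field(default_factory=list)


@dataclass
class Changelog:
    description: str  # the paragraph describing scope and format
    releases: List[ReleaseEntry] = field(default_factory=list)


def render(changelog: Changelog) -> str:
    """Render the model as Markdown (h1, then h2 / h3 / list items)."""
    out = ["# Changelog", "", changelog.description, ""]
    for release in changelog.releases:
        out += [f"## {release.version} - {release.released.isoformat()}", ""]
        for section in release.sections:
            out += [f"### {section.tag}", ""]
            out += [f"- {change}" for change in section.changes]
            out.append("")
    return "\n".join(out).rstrip("\n") + "\n"
```

Keeping each Change as an unstructured string matches the advisory nature of the guidelines: the model constrains headings and tags, not the prose inside an entry.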
[1] Which I’d like to think isn’t bad, mind you. Just not appropriate, content- and detail-wise, for a changelog.
[2] Even more concretely: a good changelog tool shouldn’t attempt to re-parse an already emitted changelog beyond the bare minimum needed to situate the new change entry.
[3] I currently use my own style, which I like but certainly isn’t standard.