ENOSUCHBLOG

Programming, philosophy, pedaling.

Tools that I want

Apr 13, 2021 Tags: devblog, programming


There are many tools that I want, and only so many hours in the day for me to write (and maintain) them. Normally I allow them to languish in an ideas.txt file, but today I thought I’d share (some of) them in two hopes:

That they might already exist, and people can direct me towards them;
That others might be inspired and take a stab at building them.

I’ve listed each tool in its own section below, along with some idle thoughts and my ideal interface (normally some kind of CLI) for interacting with it, where applicable. After all, an unusable tool is as good as no tool at all¹.

A fuzzer corpus helper

Fuzzing is the process of exploring a program’s input space with random (but guided) inputs in the hopes of provoking unintentional program behavior (crashes, hangs, &c).

Effective fuzzing relies heavily on a well-selected corpus: corpus samples must be diverse (to encourage meaningful mutations) but not too diverse (a diffuse corpus means lots of time wasted exploring low-quality changes).

It’d be nice to have a tool that essentially “multiplies” a small corpus (maybe even a single file) into a larger one. Doing so isn’t hard (mutate each file a couple of times, add each mutation to the corpus), but doing so effectively is very hard and probably requires significant research.
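To make the "not hard" half concrete, here is a minimal sketch of the naive approach: a few random byte-level mutations per input file. The names `mutate` and `multiply_corpus` and the `.mutN` output naming are my own assumptions for illustration; none of this addresses the hard part, which is choosing mutations that are actually effective.

```python
import random
from pathlib import Path


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: bit flip, insertion, or deletion."""
    buf = bytearray(data)
    choice = rng.choice(("flip", "insert", "delete"))
    if choice == "flip" and buf:
        pos = rng.randrange(len(buf))
        buf[pos] ^= 1 << rng.randrange(8)
    elif choice == "delete" and len(buf) > 1:
        del buf[rng.randrange(len(buf))]
    else:
        # Fallback for empty or one-byte inputs, and the "insert" case.
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    return bytes(buf)


def multiply_corpus(in_dir, out_dir, per_file=3, seed=0):
    """Write `per_file` mutated copies of every file in `in_dir` to `out_dir`."""
    rng = random.Random(seed)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for sample in sorted(Path(in_dir).iterdir()):
        if not sample.is_file():
            continue
        data = sample.read_bytes()
        for i in range(per_file):
            target = out / f"{sample.name}.mut{i}"
            target.write_bytes(mutate(data, rng))
            written.append(target)
    return written
```

Something like `multiply_corpus("input-corpus", "output-corpus")` would triple a corpus, but blind mutations like these know nothing about the input format, which is exactly why the results tend to be low quality.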

Idle thoughts:

Corpus generation can be thought of as a middle ground between fuzzing itself (since we’re mutating the inputs in ways we think are exploratory/interesting) and automated grammar/language extraction (since knowing the structure of our inputs gives us the best stance for expanding the corpus meaningfully).

My ideal interface:

corpus-buddy input-corpus/ output-corpus/
corpus-buddy input-sample.whatever output-corpus/

A Python AST differ and visualizer

Python 3’s AST is relatively stable, apart from newly introduced syntax (like the walrus operator in 3.8).

But sometimes it does change internally, in ways that break (or alter) the behavior of linting and static analysis tools like bellybutton. When this happens, it’s usually not a big deal to inspect the offending AST by hand and make corrections to the tool and/or corresponding rules. That being said, it’s a little tedious.

Consequently, it might be nice to have something in the spirit of Godbolt, but for Python: you could load up a fragment of Python and see how CPython (and other implementations) have changed their ASTs between versions.
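The core of such a tool could be quite small. The sketch below asks a given interpreter to dump the AST for a fragment and then diffs two dumps as text; it uses only the standard `ast`, `difflib`, and `subprocess` modules. The helper names (`dump_with`, `diff_dumps`) and the interpreter names in the usage note are assumptions on my part, and a real tool would want a structural diff rather than a line-based one.

```python
import difflib
import subprocess
import sys

# Runs inside the *target* interpreter, so it must stay compatible with old versions.
DUMP_SCRIPT = """
import ast, sys
tree = ast.parse(sys.stdin.read())
try:
    print(ast.dump(tree, indent=2))
except TypeError:  # ast.dump() only accepts `indent` on Python 3.9+
    print(ast.dump(tree))
"""


def dump_with(source, python_exe=sys.executable):
    """Return the AST dump of `source` as produced by the interpreter `python_exe`."""
    result = subprocess.run(
        [python_exe, "-c", DUMP_SCRIPT],
        input=source,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def diff_dumps(old, new, old_label="old", new_label="new"):
    """Return a unified diff of two AST dumps (empty string if identical)."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    )
    return "\n".join(lines)
```

Assuming both interpreters are installed under these names, `diff_dumps(dump_with(src, "python3.7"), dump_with(src, "python3.8"))` would show how the two versions represent the same fragment.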

Idle thoughts:

This could also extend to diffing the bytecode generated by different versions of Python, or just generally visualizing the bytecode.

My ideal interface: some sort of webpage with a side-by-side layout, allowing me to select different Python versions to generate ASTs for.

A universal pre- and post-hooking tool

It’s often useful to wrap command-line (or graphical) tools in a surrounding script that performs additional setup, isolation, desktop notifications, or other operations.

Doing so by hand is a little bit tedious: you have to put your wrapper first in the $PATH, find the next executable in the $PATH after yourself, forward the arguments (or not), duplicate the inputs and outputs (or not), forward the return code (or not), and so forth. That’s not so bad when you have just one wrapper, but it’s annoying with many.
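For a sense of the boilerplate involved, here is a rough sketch of a hand-written shim. The function names and the `before`/`after` callbacks are hypothetical, and it simplifies "the next executable after yourself" to "the first executable on the `$PATH` with the same name that isn't the shim", which is equivalent when the shim really is first.

```python
import os
import subprocess
import sys


def find_next_executable(name, self_path, path_env=None):
    """Return the first executable called `name` on PATH that is not `self_path`."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    self_real = os.path.realpath(self_path)
    for directory in path_env.split(os.pathsep):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", name)
        if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
            continue
        if os.path.realpath(candidate) == self_real:
            continue  # that's us; keep looking
        return candidate
    return None


def run_wrapped(name, args, self_path, before=None, after=None):
    """Run the underlying `name` with `args`, calling the hooks around it."""
    target = find_next_executable(name, self_path)
    if target is None:
        print(f"{name}: no underlying executable found on PATH", file=sys.stderr)
        return 127
    if before is not None:
        before()
    # Streams are inherited as-is; the exit code is forwarded to the caller.
    returncode = subprocess.call([target, *args])
    if after is not None:
        after(returncode)
    return returncode
```

A shim script named `foo` would then end with something like `sys.exit(run_wrapped("foo", sys.argv[1:], os.path.abspath(__file__)))`, and every new wrapper repeats all of the above.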

It’d be nice to have a tool that takes the tedium out of wrapping: you tell it to wrap foo, and it inserts a new foo shim into your path. Instead of modifying that shim yourself, you specify its behavior declaratively with configuration:

What it does before running the underlying foo, and after
What it does on particular conditions, e.g. the underlying foo exiting with an error
What it does with foo’s streams

…and so on.

My ideal interface:

# create a passthrough shim for foo
hook-tool new foo

# configure our `foo` shim
hook-tool conf foo

A cross-platform process singleton library

Okay, this one isn’t a tool, but it would still be nice. Creating cross-platform process singletons is currently a mess:

On Windows, you can create a Named Pipe, Named Mutex, or other unique, kernel-managed resource. Named objects are collected on exit if the exiting process was the only remaining holder, making them a tidy and self-cleaning singleton mechanism.
On macOS with Cocoa, you can do something ugly like searching through NSWorkspace.runningApplications. I’m pretty sure this doesn’t work for non-Cocoa (and especially non-graphical) applications.
On Linux, you can use UNIX sockets with the abstract namespace. Abstract-namespaced UNIX sockets don’t exist on disk (and thus can’t be accidentally deleted) and close on process exit, giving them singleton semantics close to those of Windows’ named objects.
On baseline UNIX-y platforms, the most universal “solutions” are either pidfiles or flock(2)ing some canonical file (or both). These are terrible for lots of reasons². There are probably better individual solutions on each of the BSDs, but no one (as far as I know) has gone to the trouble of abstracting them together.

A library that provides a uniform and minimal abstraction layer for creating process singletons would go a long way towards improving the quality of self-daemonizing and other background utilities.
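As a very rough illustration of what the UNIX half of such a library might wrap, here is a sketch that uses an abstract-namespace socket on Linux and falls back to `flock(2)` on a file in the temporary directory elsewhere. The function name, the lock file location, and the return convention are all assumptions of mine; Windows and macOS are not covered, and the fallback inherits every problem with lock files described above.

```python
import fcntl
import os
import socket
import sys
import tempfile


def acquire_singleton(name):
    """Return a handle to keep alive if we are the only instance, else None.

    The caller must hold on to the returned handle for the life of the process.
    """
    if sys.platform.startswith("linux"):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            # A leading NUL byte selects the abstract namespace (Linux-only).
            sock.bind(b"\0" + name.encode())
        except OSError:
            sock.close()
            return None  # another holder already bound this name
        return sock

    # Fallback for other UNIX-y platforms: a non-blocking exclusive flock(2).
    path = os.path.join(tempfile.gettempdir(), name + ".lock")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None  # someone else holds the lock
    return fd
```

A daemon would call `acquire_singleton("my-daemon")` at startup and exit early on `None`. Even this toy version returns two different handle types, which is a hint at how much a proper abstraction would have to paper over.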

A Graphviz layout comparison tool

Graphviz is a wonderful tool for drawing all sorts of graphs. Unfortunately, its default layout engine (dot) occasionally struggles with large and/or complex graphs (both in terms of runtime and in terms of visually pleasing renderings). Fixing this is normally a matter of selecting a different layout engine (like neato), but selecting the correct engine requires a combination of information about the graph’s structure, density, size, and any runtime performance constraints.

To make things even more complex, most layout engines take additional options or read optional graph attributes that can radically alter their layout and/or performance behavior. Keeping track of everything you’ve already tried while looking for a decent layout can be bewildering.

An “easy” fix for this would be a tool that displays a grid of potential renderings for the same input DOT file, each using different engines and/or options. Users could then flick through the engines and options to find an acceptable rendering and spit it out in their preferred format(s).
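The rendering half of this is mostly a loop over engines. The sketch below shells out to the `dot` binary with its `-K` (layout engine) and `-T` (output format) flags, producing one file per engine; the function names, the output naming scheme, and the particular list of engines are assumptions for illustration, and the grid UI itself is left out entirely.

```python
import subprocess
from pathlib import Path

# A few of the layout engines that ship with Graphviz.
ENGINES = ("dot", "neato", "fdp", "sfdp", "circo", "twopi")


def build_commands(dot_file, out_dir, engines=ENGINES, fmt="svg"):
    """Map each engine to the argv that renders `dot_file` with it."""
    dot_file = Path(dot_file)
    out_dir = Path(out_dir)
    commands = {}
    for engine in engines:
        output = out_dir / f"{dot_file.stem}.{engine}.{fmt}"
        commands[engine] = [
            "dot", f"-K{engine}", f"-T{fmt}", str(dot_file), "-o", str(output),
        ]
    return commands


def render_all(dot_file, out_dir, timeout=60, **kwargs):
    """Render with every engine; return {engine: True if it succeeded}."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    results = {}
    for engine, argv in build_commands(dot_file, out_dir, **kwargs).items():
        try:
            proc = subprocess.run(argv, capture_output=True, timeout=timeout)
            results[engine] = proc.returncode == 0
        except (subprocess.TimeoutExpired, FileNotFoundError):
            # Engine took too long on this graph, or Graphviz isn't installed.
            results[engine] = False
    return results
```

The timeout matters here: an engine that takes minutes on a large graph is as much a data point as one that renders it badly.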

My ideal interface: another web page, with a grid layout as described above. Maybe not the most flashy, but it would get the job done.

¹ And sometimes even worse, since it wastes your time.
² These could occupy their own blog post, but in short: they’re error prone, easy to accidentally delete, and difficult to clean up properly.

