This is a series of posts on building imgplex, best read in order:
Part 1 - The why, what, and how of imgplex
Part 2 - Getting things up and running
Part 3 - The node definition system
Part 4 - Executing the node graph, making it fast
Part 5 - Two graphs in one
Part 6 - Multiple inputs and outputs, processing images as sets
Part 7 - The small, measured optimizations beneath the big ones
From graph to commands
With nodes defined as JSON and a working graph editor as described in part 3, the next challenge was the pipeline engine: taking a connected graph of nodes and turning it into actual ImageMagick operations applied to actual images.
The first step is always a topological sort. Before executing anything, the engine needs to know the correct order to process nodes - each node has to run after all of its inputs are ready. This is the same problem a shader graph or a Houdini network solves: evaluate the source nodes (the ones with no inputs) first and work toward the output. Kahn’s algorithm handles the ordering cleanly, and it catches cycles as a side effect - a feedback loop in the graph is rejected before any processing starts.
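As a rough illustration of that ordering step, here is a minimal sketch of Kahn's algorithm in Python. The graph representation (a list of node ids plus `(upstream, downstream)` edge pairs) and the name `topo_sort` are assumptions made for the example, not imgplex's actual data model.

```python
from collections import deque

def topo_sort(nodes, edges):
    """Return node ids in dependency order; raise ValueError on a cycle.

    nodes: iterable of node ids (hypothetical representation).
    edges: list of (upstream, downstream) pairs.
    """
    indegree = {n: 0 for n in nodes}
    downstream = {n: [] for n in indegree}
    for src, dst in edges:
        downstream[src].append(dst)
        indegree[dst] += 1

    # Start with every node that has no unprocessed inputs.
    ready = deque(n for n in indegree if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Nodes on a cycle never reach indegree 0, so they are missing here.
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order
```

The cycle check falls out for free: any node that never becomes ready is part of, or downstream of, a feedback loop.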
Once the order is established, the engine walks the sorted list and resolves each node’s parameters. If a parameter has a wire coming into it from another node, the upstream value wins. If not, the value from the Inspector is used. The pure-value nodes - floats, math, string constants - are evaluated first, so that by the time an image node runs, every one of its parameters is already a concrete value.
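The "wire wins, otherwise Inspector" rule is small enough to sketch directly. The shapes used here (`wires` keyed by node and parameter name, `computed` holding already-evaluated upstream values) are hypothetical, chosen only to make the rule concrete.

```python
def resolve_params(node_id, inspector_values, wires, computed):
    """Resolve one node's parameters to concrete values.

    inspector_values: {param_name: value} as set in the Inspector.
    wires: {(node_id, param_name): upstream_node_id} (assumed shape).
    computed: {node_id: value} for value nodes already evaluated.
    """
    resolved = {}
    for name, inspector_value in inspector_values.items():
        upstream = wires.get((node_id, name))
        if upstream is not None:
            # A connected wire overrides whatever the Inspector holds.
            resolved[name] = computed[upstream]
        else:
            resolved[name] = inspector_value
    return resolved
```

Because value nodes come earlier in the sorted order, `computed` already holds everything a later image node could be wired to.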
For image nodes, those resolved parameters are turned into ImageMagick arguments and
handed to a magick process. Image in, image out. That’s the whole loop.
The preview pipeline
The preview runs constantly - every parameter edit, every new connection, every time you click a different image in the filmstrip. The whole point is real-time feedback, so it has to be fast, and a few things make it fast.
First, it only does the work it needs to. The graph is trimmed to just the ancestors of the node you’re currently looking at - nothing downstream of the selected node, and nothing on an unrelated branch, gets evaluated. And it only runs on the single image you’ve got selected in the filmstrip, not the whole queue.
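Trimming the graph to the selected node's ancestors is a plain reverse traversal. A minimal sketch, again assuming edges are `(upstream, downstream)` pairs and using a hypothetical helper name:

```python
def ancestors_of(target, edges):
    """Return the target node plus every node it depends on."""
    upstream = {}
    for src, dst in edges:
        upstream.setdefault(dst, []).append(src)

    keep, stack = {target}, [target]
    while stack:
        node = stack.pop()
        for parent in upstream.get(node, []):
            if parent not in keep:
                keep.add(parent)
                stack.append(parent)
    return keep
```

Anything not in the returned set, whether downstream of the selection or on an unrelated branch, can be dropped before the preview runs.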
Second, the input is small. Rather than spawning a fresh process to produce a
preview-resolution copy of the source, the preview pipeline reuses the thumbnail that
was already generated when the image was imported - a WebP capped at 256px (user
configurable) on its longest edge. WebP was chosen for the thumbnail for
two reasons: it is very fast to encode, and even at 70–80% lossy compression it
looks almost the same as the source image. That thumbnail already exists in the
cache, so the preview starts from it directly and skips a magick spawn entirely on
every single preview cycle. ImageMagick operations on a 256px image are trivially
cheap compared to a 4K source, and for judging a brightness tweak or a crop the
result is visually identical to the full-res output.
On top of that, the preview caches each node’s output, keyed by a hash of that node’s inputs and parameters. When you change something, only that node and its actual downstream dependents get invalidated and re-run; everything upstream, and every unrelated branch, serves its cached result immediately. It’s the same principle as incremental compilation or a shader cache - never recompute what hasn’t changed. There’s an 80ms debounce on top, so dragging a slider settles before it fires a run rather than launching a hundred of them a second.
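One way to get that invalidation behaviour without tracking dependents explicitly is to fold each input's key into the node's own key. The sketch below assumes SHA-256 over a JSON payload; the actual hash and key layout imgplex uses are not stated, and `cache_key` is a name invented for the example.

```python
import hashlib
import json

def cache_key(node_type, params, input_keys):
    """Derive a cache key from a node's type, parameters, and input keys."""
    payload = json.dumps(
        {"type": node_type, "params": params, "inputs": input_keys},
        sort_keys=True,  # stable ordering so equal params hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Changing one node's parameter changes its key, and therefore the key
# of everything downstream; upstream keys are untouched.
source = cache_key("load", {"path": "a.png"}, [])
resize_a = cache_key("resize", {"width": 128}, [source])
resize_b = cache_key("resize", {"width": 64}, [source])
blur_a = cache_key("blur", {"radius": 2}, [resize_a])
blur_b = cache_key("blur", {"radius": 2}, [resize_b])
```

Here `source` stays valid across the edit, while `resize_b` and `blur_b` miss the cache and get recomputed.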
The spawn-cost problem
Here’s the thing that shaped most of the batch engine’s design. On Windows, launching
a magick process costs a fixed overhead no matter how trivial the actual operation
is - and at scale that overhead dominates everything else. Run it once and you won’t
notice. Run it ten thousand times and it’s most of your runtime.
The naive approach (and the thing I did first) - one magick spawn per node, per
image - falls apart quickly. A five-node graph over 2000 images is 10,000 process
launches, and the spawn overhead alone would have the computer spending most of its
time starting and stopping processes before touching a single pixel. Getting rid of
that overhead took a few techniques stacked together.
The first is command fusion. ImageMagick can apply many operations in a single
invocation - a resize, then a brightness adjustment, then a format conversion, all in
one command. So the batch engine doesn’t spawn per node. It walks a chain of
consecutive standard operations and accumulates them lazily into one argument list,
only actually spawning magick when it hits something that forces a break: a branch
where the image feeds two consumers, or a format change. A long linear chain of nodes
collapses into a single process launch. Even channel splitting, which pulls the R, G,
B, and A channels out as separate images, is done in one magick call rather than
four.
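The lazy accumulation can be sketched as a single pass over a chain. This is a simplified model, not the engine's real code: each step is a pair of ImageMagick arguments and a hypothetical `forces_break` flag marking a node whose output has to be materialized (a branch point or a format change).

```python
def fuse_chain(steps):
    """Group consecutive operations into as few magick spawns as possible.

    steps: list of (args, forces_break) tuples in execution order.
    Returns one argument list per process launch.
    """
    spawns, pending = [], []
    for args, forces_break in steps:
        pending.extend(args)
        if forces_break:
            # The result must exist on disk here, so flush what we have.
            spawns.append(pending)
            pending = []
    if pending:
        spawns.append(pending)
    return spawns

chain = [
    (["-resize", "50%"], False),
    (["-brightness-contrast", "10x0"], False),
    (["-blur", "0x2"], True),   # output feeds two consumers
    (["-negate"], False),
]
fused = fuse_chain(chain)
```

Four nodes become two launches instead of four; a chain with no break points becomes one.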
The second is about the moments when a chain does have to break and write an intermediate file to disk. Those intermediates used to be written as PNG, which means every break point paid the cost of PNG-compressing the image on the way out and decompressing it on the way back in - pure overhead for a file that only exists for a few milliseconds. They’re now written as MIFF, ImageMagick’s own uncompressed native format, which skips the encode/decode entirely. Only the final output - the thing the user actually keeps - gets encoded to the real target format.
The third is parallelism. Instead of processing images one at a time, the batch runs several concurrently, with the worker count derived from the CPU core count. Each worker pulls the next image off the queue and processes it independently. The one subtlety here is that ImageMagick has its own internal multithreading, so if you run N workers and let each one spin up a full thread pool, you oversubscribe the CPU and everything gets slower. The fix is to divide ImageMagick’s thread budget across the workers so the total thread count stays in line with the number of cores.
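The budget split is simple arithmetic. The sketch below assumes an even division and passes the per-worker limit through `MAGICK_THREAD_LIMIT`, an environment variable ImageMagick documents for capping its threads; whether imgplex uses that variable, the `-limit thread` option, or something else is not stated, and the function names are made up for illustration.

```python
import os

def plan_workers(cpu_count, max_workers=None):
    """Return (workers, threads_per_worker) with workers * threads <= cores."""
    workers = cpu_count if max_workers is None else min(cpu_count, max_workers)
    workers = max(1, workers)
    threads_per_worker = max(1, cpu_count // workers)
    return workers, threads_per_worker

def magick_env(threads_per_worker, base_env=None):
    """Build the environment for one magick child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["MAGICK_THREAD_LIMIT"] = str(threads_per_worker)
    return env
```

On an 8-core machine capped at 4 workers, each magick process would get 2 threads; with 8 workers, each gets 1.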
Fast path, slow path
There’s one more optimization worth explaining, because it’s a nice example of letting the graph’s shape drive the strategy.
Most of the time, the operations applied to every image are identical - the same resize, the same adjustment, the same conversion. In that case the engine builds the operation plan once and reuses it verbatim for every image in the batch. That’s the fast path, and it skips per-image parameter evaluation entirely.
But some nodes read facts about the specific image they’re processing. The moment a graph contains one of those Properties nodes, the shared plan can’t be reused, because the plan genuinely differs per image. So the engine detects that case and switches to a slow path where it re-evaluates per image.
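The path selection amounts to one check on the graph before the batch starts. In this sketch `build_plan` is a stand-in for whatever produces the operation plan, and the `"properties"` type name is an assumed label for the Properties nodes.

```python
def plan_batch(images, node_types, build_plan,
               per_image_types=frozenset({"properties"})):
    """Return (image, plan) pairs, building the plan once when possible.

    build_plan(image) is a hypothetical callable; it receives None on
    the fast path because the plan does not depend on any one image.
    """
    needs_per_image = any(t in per_image_types for t in node_types)
    if not needs_per_image:
        shared = build_plan(None)          # fast path: one plan for all
        return [(image, shared) for image in images]
    # Slow path: the plan genuinely differs per image.
    return [(image, build_plan(image)) for image in images]
```

The decision is made from the graph's shape alone, so the fast path costs nothing extra when it applies.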
The interesting part is being precise about what “reading a fact about the image”
actually costs. Some facts - the filename, the path, the file size - come straight
from the filesystem and don’t require decoding the image at all. Others - width,
height, bit depth - need ImageMagick to actually open the pixels. Early on, the
file-size node was mistakenly flagged as needing full image metadata, which meant
every image in a batch spawned a pair of magick identify processes just to answer a
question the operating system already knew. Fixing that one flag took a 3,720-image
batch from around two minutes down to about two-tenths of a second. The lesson
generalizes: the engine should only pay for the information a node truly needs, and a
surprising amount of what looks like image metadata is really just filesystem
metadata wearing a costume.
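The distinction can be made explicit by classifying each fact by what it costs to obtain. The fact names and helper functions below are illustrative assumptions; only the split itself (filesystem facts versus facts that need the image opened) comes from the description above.

```python
import os

# Facts the operating system can answer without opening the image.
FILESYSTEM_FACTS = {"filename", "path", "file_size"}
# Facts that require ImageMagick to read the image itself.
DECODE_FACTS = {"width", "height", "bit_depth"}

def needs_identify(requested_facts):
    """True only if some requested fact requires opening the image."""
    return any(fact in DECODE_FACTS for fact in requested_facts)

def filesystem_facts(path):
    """Answer the cheap facts straight from the filesystem."""
    return {
        "filename": os.path.basename(path),
        "path": path,
        "file_size": os.path.getsize(path),
    }
```

With this split, a graph that only asks for file size never triggers an identify call.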
Import performance
Loading a big folder into the filmstrip had the exact same spawn-cost problem -
generating a thumbnail and reading dimensions for every image, one magick spawn at
a time, is painfully slow for a folder of any real size. The fix had a few parts.
For the common formats - PNG, JPEG, BMP, WebP, TGA - the image dimensions can be read straight out of the file header in a handful of bytes, with no process spawn at all. That covers most of the texture formats that show up in a game dev folder.
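PNG shows how little needs to be read. A PNG file starts with an 8-byte signature followed by the IHDR chunk, whose first two fields are the width and height as big-endian 32-bit integers, so 24 bytes are enough. This is a sketch for PNG only, with hypothetical function names; the other formats each need their own header parser.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_dimensions(header):
    """Parse (width, height) from the first 24 bytes of a PNG, or None."""
    if len(header) < 24:
        return None
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    # Bytes 16-23: width and height, big-endian unsigned 32-bit.
    width, height = struct.unpack(">II", header[16:24])
    return width, height

def read_png_dimensions(path):
    """Read only the header bytes from disk; no process spawn involved."""
    with open(path, "rb") as f:
        return png_dimensions(f.read(24))
```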
Thumbnails are generated in batches: a single magick invocation produces thumbnails
for eight images at once, so the launch cost is amortized across the group instead of
being paid per image. The thumbnails themselves are WebP, which keeps them small on disk
and in memory. Import also runs with concurrent workers, same idea as the batch
pipeline. And every generated thumbnail is cached to disk, keyed by the source path
and its modification time, so re-importing the same folder in a later session skips
the work entirely and loads straight from cache.
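Two pieces of that are easy to sketch: splitting the import list into groups of eight, and deriving a cache key from the source path and modification time. The key format and hash here are assumptions; the exact magick command used for each group is not shown in this post, so it is left out.

```python
import hashlib

def chunks(items, size=8):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def thumbnail_cache_key(source_path, mtime_ns):
    """Key a cached thumbnail by source path and modification time.

    If the file is edited, its mtime changes, the key changes, and the
    stale thumbnail is simply never looked up again.
    """
    raw = f"{source_path}|{mtime_ns}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
```

In practice `mtime_ns` could come from `os.stat(path).st_mtime_ns`, and each group returned by `chunks` would map to one process launch.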
Together these took importing ~2000 mixed PNG/TGA/JPG/PSD images from around 44 seconds down to about 3 seconds - roughly a 15× speedup. The difference between that being an interruption and it being instant is the difference between a tool you reach for and one you avoid.
Non-fatal batch errors
One deliberate design decision: errors during a batch are non-fatal. If ImageMagick chokes on one image - a corrupt file, an unusual format edge case, a weird character in a path - the batch keeps going. The failure is recorded and shown in a summary dialog at the end, alongside the counts of images processed and skipped, and written to a timestamped log next to the output so there’s a permanent record of exactly what happened.
This is just how you’d want a batch tool to behave. Halting a 2000-image run because one file had a problem would be maddening. The summary gives you enough to go and investigate the failures afterward without ever interrupting the work that succeeded.
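The pattern behind this is to catch failures per image and collect them instead of letting one propagate. A minimal sketch, where `process_one` stands in for whatever runs the pipeline on a single image:

```python
def run_batch_nonfatal(images, process_one):
    """Process every image; record failures instead of stopping.

    process_one(image) is a hypothetical callable that raises on error.
    Returns (succeeded, failures), with failures as (image, message).
    """
    succeeded, failures = [], []
    for image in images:
        try:
            process_one(image)
            succeeded.append(image)
        except Exception as exc:
            # Keep going; the failure is reported in the summary later.
            failures.append((image, str(exc)))
    return succeeded, failures
```

The two lists carry everything a summary dialog or log file would need: what worked, what failed, and why.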
Next post in this series: Building imgplex: part 5 - Two graphs in one