Ninja File Canonicalizer

Suppose we have a tool that generates a Ninja file from some other description (think Kati and makefiles), and during testing we discover a regression. Suppose further that the generated Ninja file is large (think millions of lines) and that the new Ninja file has its build statements and rules in a slightly different order. Because the tool generates the rule names, the real differences in the output of the diff command are drowned in noise. Enter Canoninja.

Canoninja renames each Ninja rule to the hash of its contents. After that, we can just sort the build statements, and a simple comm command immediately reveals the essential difference between the files.
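The renaming step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Canoninja's actual code: the helper names are hypothetical, the hash is SHA-1 over the rule's indented body lines (the 40-hex-digit names in the example below are consistent with SHA-1, but exactly which bytes Canoninja hashes is not stated here, so the names produced by this sketch need not match), and continuation lines are ignored.

```python
import hashlib
import re

# Hypothetical helper names; this is not Canoninja's actual source.
RULE_RE = re.compile(r"^rule\s+(\S+)\s*$")
# "build <outputs>: <rule> <inputs>", split at the first colon not written as "$:".
BUILD_RE = re.compile(r"^(build\s+.*?[^$]:\s*)(\S+)(.*)$")


def canonicalize(text):
    """Rename each rule to "R" + SHA-1 of its indented body lines (an assumption)."""
    lines = text.splitlines()
    renames = {}  # generated rule name -> content-derived name
    i = 0
    while i < len(lines):
        m = RULE_RE.match(lines[i])
        i += 1
        if not m:
            continue
        # The rule body is the run of indented lines right after "rule NAME".
        start = i
        while i < len(lines) and lines[i].startswith(" "):
            i += 1
        body = "\n".join(lines[start:i])
        renames[m.group(1)] = "R" + hashlib.sha1(body.encode("utf-8")).hexdigest()

    out = []
    for line in lines:
        rule = RULE_RE.match(line)
        build = BUILD_RE.match(line)
        if rule:
            line = "rule " + renames[rule.group(1)]
        elif build and build.group(2) in renames:
            line = build.group(1) + renames[build.group(2)] + build.group(3)
        out.append(line)
    return "\n".join(out) + "\n"


def sorted_builds(text):
    """Mimic `canoninja FILE | grep '^build' | sort`."""
    return sorted(l for l in canonicalize(text).splitlines() if l.startswith("build"))


# Two inputs that differ only in rule numbering and statement order.
ONE = '''rule rule0
 description = build $out
 command = /bin/sh -c "echo foo"
build foo: rule0
rule rule1
 description = build $out
 command = /bin/sh -c "echo bar"
build bar: rule1
'''
TWO = '''rule rule0
 description = build $out
 command = /bin/sh -c "echo bar"
build bar: rule0
rule rule1
 description = build $out
 command = /bin/sh -c "echo foo"
build foo: rule1
'''

if __name__ == "__main__":
    print(sorted_builds(ONE) == sorted_builds(TWO))
```

Because each name depends only on the rule's contents, two rules with identical bodies get the same name regardless of the order in which the generator numbered them.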

Example

Consider the following makefile:

second :=
first: foo
foo:
	@echo foo
second: bar
bar:
	@echo bar

Depending on the Kati version, converting it to a Ninja file will yield either:

$ cat /tmp/1.ninja
# Generated by kati 06f2569b2d16628608c000a76e3d495a5a5528cb

pool local_pool
 depth = 72

build _kati_always_build_: phony

build first: phony foo
rule rule0
 description = build $out
 command = /bin/sh -c "echo foo"
build foo: rule0
build second: phony bar
rule rule1
 description = build $out
 command = /bin/sh -c "echo bar"
build bar: rule1

default first

$ cat /tmp/2.ninja
# Generated by kati 371194da71b3e191fea6f2ccceb7b061bd0de310

pool local_pool
 depth = 72

build _kati_always_build_: phony

build second: phony bar
rule rule0
 description = build $out
 command = /bin/sh -c "echo bar"
build bar: rule0
build first: phony foo
rule rule1
 description = build $out
 command = /bin/sh -c "echo foo"
build foo: rule1

default first

This is a quirk in Kati; see https://github.com/google/kati/issues/238

Diffing the build statements, even after sorting them, isn't too helpful:

$ diff <(grep '^build' /tmp/1.ninja | sort) <(grep '^build' /tmp/2.ninja | sort)
1c1
< build bar: rule1
---
> build bar: rule0
3c3
< build foo: rule0
---
> build foo: rule1

However, running these files through canoninja yields:

$ ~/go/bin/canoninja /tmp/1.ninja
# Generated by kati 06f2569b2d16628608c000a76e3d495a5a5528cb

pool local_pool
 depth = 72

build _kati_always_build_: phony

build first: phony foo
rule R2f9981d3c152fc255370dc67028244f7bed72a03
 description = build $out
 command = /bin/sh -c "echo foo"
build foo: R2f9981d3c152fc255370dc67028244f7bed72a03
build second: phony bar
rule R62640f3f9095cf2da5b9d9e2a82f746cc710c94c
 description = build $out
 command = /bin/sh -c "echo bar"
build bar: R62640f3f9095cf2da5b9d9e2a82f746cc710c94c

default first

and

$ ~/go/bin/canoninja /tmp/2.ninja
# Generated by kati 371194da71b3e191fea6f2ccceb7b061bd0de310

pool local_pool
 depth = 72

build _kati_always_build_: phony

build second: phony bar
rule R62640f3f9095cf2da5b9d9e2a82f746cc710c94c
 description = build $out
 command = /bin/sh -c "echo bar"
build bar: R62640f3f9095cf2da5b9d9e2a82f746cc710c94c
build first: phony foo
rule R2f9981d3c152fc255370dc67028244f7bed72a03
 description = build $out
 command = /bin/sh -c "echo foo"
build foo: R2f9981d3c152fc255370dc67028244f7bed72a03

default first

and when we extract only build statements and sort them, we see that both Ninja files define the same graph:

$ diff <(~/go/bin/canoninja /tmp/1.ninja | grep '^build' | sort) \
       <(~/go/bin/canoninja /tmp/2.ninja | grep '^build' | sort)
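The diff prints nothing because the two sorted lists are identical. When they are not, a comm-style split into "only in the first", "only in the second" and "common" is often easier to read than diff hunks. A minimal Python sketch of that split follows; the function names are hypothetical, the input is assumed to be already canonicalized, and using sets drops duplicate lines, which comm itself would keep.

```python
def build_statements(ninja_text):
    """Collect the lines that `grep '^build'` would keep, as a set."""
    return {line for line in ninja_text.splitlines() if line.startswith("build")}


def compare(a_text, b_text):
    """Return (only in a, only in b, common), like the three columns of comm."""
    a, b = build_statements(a_text), build_statements(b_text)
    return sorted(a - b), sorted(b - a), sorted(a & b)


# Build statements taken from the canoninja output shown above.
CANON_1 = """\
build _kati_always_build_: phony
build first: phony foo
build foo: R2f9981d3c152fc255370dc67028244f7bed72a03
build second: phony bar
build bar: R62640f3f9095cf2da5b9d9e2a82f746cc710c94c
"""
CANON_2 = """\
build _kati_always_build_: phony
build second: phony bar
build bar: R62640f3f9095cf2da5b9d9e2a82f746cc710c94c
build first: phony foo
build foo: R2f9981d3c152fc255370dc67028244f7bed72a03
"""
# Hypothetical regression, invented for illustration: one extra target.
CANON_2_BROKEN = CANON_2 + "build third: phony baz\n"

if __name__ == "__main__":
    only_1, only_2, common = compare(CANON_1, CANON_2_BROKEN)
    print("only in first: ", only_1)
    print("only in second:", only_2)
    print("common:        ", len(common), "statements")
```

With the unmodified pair the first two lists are empty, matching the empty diff; with the invented extra target only that one statement is reported.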

Todo

Optionally output only the build statements, optionally sorted
Handle continuation lines correctly
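For the continuation-line item, one possible approach is to join physical lines into logical ones before hashing or sorting. In Ninja syntax a "$" immediately before a newline continues the line, and leading spaces on the following line are dropped, while "$$" is a literal dollar sign. The sketch below assumes exactly those rules and nothing more; the function name is hypothetical and this is not how Canoninja currently behaves.

```python
def join_continuations(text):
    """Join physical lines ending in an unescaped "$" into logical lines."""
    logical = []
    pending = None  # logical line being assembled, or None
    for raw in text.splitlines():
        # Spaces at the start of a continued line are not part of the value.
        piece = raw if pending is None else raw.lstrip(" ")
        # An odd number of trailing "$" means the last one escapes the newline;
        # pairs of "$$" are literal dollar signs and do not continue the line.
        trailing = len(piece) - len(piece.rstrip("$"))
        continued = trailing % 2 == 1
        if continued:
            piece = piece[:-1]
        pending = piece if pending is None else pending + piece
        if not continued:
            logical.append(pending)
            pending = None
    if pending is not None:  # file ended in the middle of a continuation
        logical.append(pending)
    return logical


if __name__ == "__main__":
    sample = "build out: cc a.c $\n    b.c\nrule r\n command = echo $$\n"
    for line in join_continuations(sample):
        print(repr(line))
```

After this step a build statement spread over several physical lines becomes a single line, so grep and sort treat it as one unit.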