- dynamic allocations could generate
bad 'and' instructions (for the
and with -16 in salloc()).
- symbols used in w context would
generate adrp and add instructions
on wN registers while they seem to
only work on xN registers.
Thanks to Rosie for reporting them.
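For context, the "and with -16" in the first item is the usual way to
round a size up to a multiple of 16. A minimal sketch of that
arithmetic in C, assuming the common add-15-then-mask idiom; the
helper name align16 is hypothetical and this is not the QBE code
itself:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of QBE: round n up to the next
 * multiple of 16 by adding 15 and masking with -16 (all bits set
 * except the low four). */
static uint64_t
align16(uint64_t n)
{
	assert(n <= UINT64_MAX - 15); /* avoid wrap-around in n + 15 */
	return (n + 15) & (uint64_t)-16;
}
```

Sizes that are already multiples of 16 are left unchanged; anything
else is bumped to the next multiple.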
Functions are now aligned on 16-byte
boundaries. This mimics gcc and should
help reduce the maximum perf impact of
cosmetic code changes. Previously, any
change in the output of qbe could have
far-reaching implications on alignment.
Thanks to Roland Paterson-Jones for
pointing out the variability issue.
Passes the "standard" test suite.
(cproc bootstrap, hare[c] make test, roland units, linpack/coremark run)
However, the linpack benchmark is now notably slower. Coremark is ~2% faster.
As noticed before, linpack timing is dubious, and maybe my cheap (AMD) laptop
prefers mul to shl.
When applying a custom set of CFLAGS under clang that does not include
-std=c99, asm is treated as a keyword and as such cannot be used as an
identifier. This prevents the issue by renaming the offending variables.
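A small sketch of the kind of rename involved, assuming a parameter
that used to be called asm; the names emitlen and asmline are
hypothetical and not the ones used in the patch:

```c
#include <stdio.h>

/* With GNU extensions enabled (for example, clang without -std=c99),
 * `asm` is a keyword, so a declaration such as
 *     const char *asm;
 * fails to compile. Picking any other identifier avoids the clash. */
static int
emitlen(const char *asmline)
{
	/* Number of bytes needed to print one tab-indented line. */
	return snprintf(NULL, 0, "\t%s\n", asmline);
}
```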
Comparisons return a 1-bit value. In theory
we could add a Wu1 width for them, but I did
not bother and just used Wub. This simply
means that if a frontend generates an extsb
of a comparison result (silly), we will not
generate good code.
This significantly improves parsing performance for massive functions
with a huge number of temporaries. Parsing the 86MiB IL produced
by cproc during zig bootstrap drops from 17m15s to 2.5s (over 400x
speedup).
The speedup is much smaller for IL produced from normal non-autogenerated
C code. Parsing the sqlite3 amalgamation drops from 0.40s to 0.33s.
The algorithm to generate matchers
took a long time to be discovered
and refined to its present version.
The rest of mgen is mostly boring
engineering.
Extensive fuzzing ensures that the
two core components of mgen (tables
and matchers generation) are correct
on specific problem instances.
The initial plan was to have one
matcher per ac-variant, but that
leads to way too much generated
code. Instead, we can fuse ac
variants of the rules and have
a smarter matching algorithm to
recover bound variables.
I noticed that my compiler was generating redundant blits, and after
looking through the QBE debug output I believe that I found some low
hanging fruit to help clean them up.
I'm new to this codebase, so please treat this patch with a lot of
skepticism. Happy to make any changes.
Thanks for reviewing, and thank you for QBE!
In C, if the integral part of a floating-point value cannot be
represented in the target integer type, the conversion is undefined
behavior. Therefore, it can be flaky to test this against
QBE-defined behavior.
This was discovered from (unsigned int) 4294967295.0f being UB,
because (uint64_t) 4294967295.0f is 4294967296 > UINT_MAX
on amd64 when compiled by either gcc or clang.
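The rounding behind this can be checked in plain C. The sketch below
assumes IEEE 754 single precision for float; the helper names
rounded_literal and fits_u32 are hypothetical and only illustrate the
range check a test would need before converting:

```c
#include <stdint.h>

/* The literal 4294967295.0f has no exact float representation (a float
 * carries 24 significand bits), so it rounds to 2^32 = 4294967296. */
static uint64_t
rounded_literal(void)
{
	/* In range for uint64_t, hence well defined. */
	return (uint64_t)4294967295.0f;
}

/* Hypothetical helper: returns 1 if f can be converted to a 32-bit
 * unsigned integer without leaving the representable range, i.e. its
 * truncated value lies in [0, 2^32 - 1]. NaN fails both comparisons
 * and yields 0. */
static int
fits_u32(float f)
{
	return f > -1.0f && f < 4294967296.0f;
}
```

A test that only converts values for which fits_u32 returns 1 does not
depend on what a particular compiler or backend does out of range.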
The handling of phi was incorrect
and we would sometimes miss escapes.
We now handle phis at the very end
of the pass to make sure the defs
for their arguments have all been
processed.