Support "file" and "loc" directives. "file" takes a string (a file name),
assigns it a number, sets the current file to that number, and records
the string for later. "loc" takes a single number and outputs location
information with a reference to the current file.
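A minimal sketch of the bookkeeping described above, not the actual
patch. The names dbgfile() and dbgloc(), the fixed-size table, the reuse
of a number when the same name is seen twice, and the output in the
style of the GNU assembler ".loc" directive are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MaxFiles = 64 };             /* hypothetical limit */

static const char *files[MaxFiles]; /* recorded names, index = number - 1 */
static int nfiles;
static int curfile;                 /* 0 until the first "file" directive */

/* "file": give the name a number, make it the current file, and
 * return that number. The string is recorded, not copied, so the
 * caller must keep it alive. */
int
dbgfile(const char *name)
{
	int i;

	for (i = 0; i < nfiles; i++)
		if (strcmp(files[i], name) == 0)
			return curfile = i + 1;
	assert(nfiles < MaxFiles);
	files[nfiles++] = name;
	return curfile = nfiles;
}

/* "loc": format a location that refers to the current file. */
int
dbgloc(char *buf, size_t len, int line)
{
	return snprintf(buf, len, "\t.loc %d %d", curfile, line);
}
```

With this sketch, "loc" needs only a line number because the file
number is carried implicitly by the last "file" directive.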
During coalescing, the resizing/
reordering of the sl[] array
invalidates the indices stored
in the 'visit' field of temps;
we need to reset it before we
can use it again.
Crashing loads of uninitialized memory
proved to be a problem when implementing
unions using qbe. This patch introduces
a new UNDEF Ref to represent data that is
known to be uninitialized. Optimization
passes can make use of it to eliminate
some code. In the last compilation stages,
UNDEF is treated as the constant 0xdeaddead.
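The late lowering of UNDEF could look roughly like the sketch below.
The Ref layout, the tag names and lowerundef() are hypothetical and
only illustrate the idea; the one detail taken from the message is the
constant 0xdeaddead.

```c
#include <stdint.h>

enum RefType { RTmp, RCon, RUndef }; /* hypothetical tags */

typedef struct {
	enum RefType type;
	uint32_t val;   /* temp id or constant bits, depending on type */
} Ref;

#define UNDEFBITS 0xdeaddeadu

/* In the last stages, an UNDEF reference becomes a plain constant;
 * every other reference is returned unchanged. */
Ref
lowerundef(Ref r)
{
	if (r.type == RUndef) {
		r.type = RCon;
		r.val = UNDEFBITS;
	}
	return r;
}
```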
When multiple stack slots are coalesced,
one 'alloc' instruction is kept in the il
and the other ones are removed and have
their uses replaced by the result of the
selected one. To produce valid ssa, it
must be ensured that the uses that get
replaced are dominated by the selected
'alloc' instruction. This patch ensures
dominance by moving the selected alloc up
in the start block as necessary.
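One way to picture the move, assuming the start block is a flat array
of instructions: hoist the selected alloc to the position of the
earliest alloc it replaces, so that it comes before every use that is
rewritten to refer to it. The Ins type and moveup() are hypothetical
names for this sketch, not the code of the patch.

```c
#include <string.h>

typedef struct {
	int op;   /* opcode, unused in this sketch */
	int to;   /* id of the temporary defined */
} Ins;

/* Move ins[from] up to index 'to' (to <= from), shifting the
 * instructions in between down by one position. */
void
moveup(Ins *ins, int to, int from)
{
	Ins t;

	if (to >= from)
		return;
	t = ins[from];
	memmove(&ins[to + 1], &ins[to], (from - to) * sizeof ins[0]);
	ins[to] = t;
}
```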
It is handy to express when
the end of a block cannot be
reached. If a hlt terminator
is executed, it traps the
program.
We don't go the llvm way and
specify execution semantics as
undefined behavior.
Symbols are a useful abstraction
that occurs in both Con and Alias.
In this patch they get their own
struct. This new struct packages
a symbol name and a type; the type
tells us where the symbol name
must be interpreted (currently, in
global memory or in thread-local
storage).
The refactor fixed a bug in
addcon(), proving the value of
packaging symbol names with their
type.
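A sketch of what such a struct might look like. The field and
enumerator names are assumptions; the point is only that a name
travels with its type, so code comparing or combining symbols cannot
look at the name alone.

```c
#include <stdint.h>

enum SymType {
	SGlo,   /* name lives in global memory */
	SThr    /* name lives in thread-local storage */
};

typedef struct {
	enum SymType type;
	uint32_t id;   /* interned symbol name */
} Sym;

/* Two symbols are the same only if both the name and the place
 * where the name is interpreted agree. */
int
symeq(Sym a, Sym b)
{
	return a.type == b.type && a.id == b.id;
}
```

With the name and type stored separately, a comparison on the name
alone would treat a global and a thread-local symbol of the same name
as equal.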
When we process one block, we
start by allocating registers
for all the temporaries live
at the exit of the block.
Before this patch we processed
temps first, then in doblk() we
would mark globally live registers
allocated. This meant that temps
could get wrongly assigned a live
register.
The fix is simple: we now process
registers first at block exits,
then allocate temps.
The copy elimination pass is not
complete. This patch improves
things a bit, but I think we still
have quite a bit of incompleteness.
We now consistently mark phis with
all arguments identical as copies.
Previously, they were inconsistently
eliminated by phisimpl(). An example
where they were not eliminated is
the following:
	@blk2
		%a = phi @blk0 %x, @blk1 %x
		jnz ?, @blk3, @blk4
	@blk3
		%b = copy %x
	@blk4
		%c = phi @blk2 %a, @blk3 %b
In this example, neither %c nor %a
was marked as a copy of %x because,
when phisimpl() is called, the copy
information for %b is not available.
The incompleteness is still present
and can be observed by modifying
the example above so that %a takes
a copy of %x through a back-edge.
Then, phisimpl()'s lack of copy
information about %b will prevent
optimization.
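The rule "a phi with all arguments identical is a copy" can be
sketched as below. This is an illustration, not the code of the pass:
Ref as a plain temp id, the copyof[] table and phicopy() are
hypothetical. The sketch also shows why the order in which copy
information becomes available matters.

```c
#include <stddef.h>

typedef int Ref;        /* hypothetical: id of a temporary */
enum { NoRef = -1 };

/* Return the value a phi is a copy of, or NoRef if its arguments
 * are not all known to be the same value. copyof[t] is the value
 * temp t is currently known to copy (t itself when unknown). */
Ref
phicopy(const Ref *arg, size_t narg, const Ref *copyof)
{
	Ref r;
	size_t i;

	if (narg == 0)
		return NoRef;
	r = copyof[arg[0]];
	for (i = 1; i < narg; i++)
		if (copyof[arg[i]] != r)
			return NoRef;
	return r;
}
```

Numbering %x, %a, %b, %c as 0 to 3, the phi for %a resolves to %x
right away. The phi for %c resolves to %x only once copyof[] records
that %b is a copy of %x; if it is queried before that, the answer is
NoRef.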
This pass limits stack usage when
many small aggregates are allocated
on the stack. A fast liveness
analysis figures out which slots
interfere and the pass then fuses
slots that do not interfere. The
pass also kills stack slots that
are only ever assigned.
On the hare stdlib test suite, this
fusion pass managed to reduce the
total eligible slot bytes count
by 84%.
The slots considered for fusion
must not escape and must not exceed
64 bytes in size.
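A much simplified sketch of the fusion idea, assuming each slot has a
single live range [lo, hi) and that fusion is done greedily. The Slot
layout, the greedy strategy and fuse() are assumptions for
illustration; the actual pass computes liveness on the il. Only the
eligibility rule (no escape, at most 64 bytes) and the killing of
slots that are never read come from the description above.

```c
enum { MaxFuse = 64 };   /* size limit, in bytes, for eligible slots */

typedef struct {
	int size;      /* bytes */
	int escapes;   /* nonzero if the address escapes */
	int nloads;    /* 0 means the slot is only ever assigned */
	int lo, hi;    /* live range [lo, hi) */
	int rep;       /* out: slot sharing storage with, -1 if killed */
	int alloc;     /* out: bytes reserved when rep is the slot itself */
} Slot;

static int
eligible(const Slot *s)
{
	return !s->escapes && s->size <= MaxFuse;
}

static int
overlap(const Slot *a, const Slot *b)
{
	return a->lo < b->hi && b->lo < a->hi;
}

/* Fuse non-interfering eligible slots; return total bytes reserved. */
int
fuse(Slot *s, int n)
{
	int i, j, k, ok, total;

	for (i = 0; i < n; i++) {
		s[i].rep = i;
		s[i].alloc = s[i].size;
		if (!eligible(&s[i]))
			continue;
		if (s[i].nloads == 0) {   /* never read: kill it */
			s[i].rep = -1;
			s[i].alloc = 0;
			continue;
		}
		for (j = 0; j < i; j++) {
			if (s[j].rep != j || !eligible(&s[j]))
				continue;
			ok = 1;   /* must not interfere with any member of j */
			for (k = j; k < i; k++)
				if (s[k].rep == j && overlap(&s[k], &s[i]))
					ok = 0;
			if (!ok)
				continue;
			if (s[i].size > s[j].alloc)
				s[j].alloc = s[i].size;
			s[i].rep = j;
			s[i].alloc = 0;
			break;
		}
	}
	total = 0;
	for (i = 0; i < n; i++)
		total += s[i].alloc;
	return total;
}
```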