Commit graph

72 commits

Author SHA1 Message Date
Quentin Carbonneaux
90050202f5 fix various codegen bugs on arm64
- dynamic allocations could generate
  bad 'and' instructions (for the
  and with -16 in salloc()).
- symbols used in w context would
  generate adrp and add instructions
  on wN registers while they seem to
  only work on xN registers.

Thanks to Rosie for reporting them.
2024-10-01 19:42:50 +02:00
Alexey Yerin
bb8de8c633 arm64/isel: Avoid signed overflow when handling immediates
Clang incorrectly optimizes this negation with -O2 and causes QBE to
emit 0 in place of INT64_MIN.
2024-08-15 23:21:05 +02:00
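
The overflow in bb8de8c633 comes from negating INT64_MIN in signed arithmetic, which is undefined behavior in C and lets an optimizer assume it never happens. Below is a minimal sketch of one common way to sidestep it, by negating in unsigned arithmetic. The helper name `negate_bits` is hypothetical, and this is not necessarily the fix that was applied in arm64/isel.

```c
#include <stdint.h>

/* Hypothetical helper: returns the two's-complement bit pattern of -n.
 * Unsigned arithmetic wraps modulo 2^64, so the subtraction below is
 * well defined for every input, including INT64_MIN. */
uint64_t negate_bits(int64_t n)
{
    return 0 - (uint64_t)n;
}
```

With this form, `negate_bits(INT64_MIN)` yields the bit pattern of INT64_MIN itself (1 << 63), so a compiler has no undefined behavior to exploit.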
Quentin Carbonneaux
ddf5ced4a7 revert 4bc4c958
Hopefully the right time now!
2024-06-16 22:26:51 +02:00
Erica Z
c8220b638b replace asm keyword
when applying a custom set of CFLAGS under clang that does not include
-std=c99, asm is treated as a keyword and as such can not be used as an
identifier. this prevents the issue by renaming the offending variables.
2024-05-28 10:39:41 +02:00
Quentin Carbonneaux
b24af7d3f7 revert 1b7770e271
Quotes are used on Apple target
variants to flag that we must
not add the _ symbol prefix.
2024-04-22 14:01:50 +02:00
Michael Forney
1b7770e271 Drop quotes around floating point constant labels
This is incompatible with binutils gas older than 2.26.
2024-03-26 09:22:06 +01:00
Drew DeVault
85287081c4 dbgloc: add column argument
dbgloc line [col]

This is implemented in a backwards-compatible manner.
2024-01-02 12:12:05 +01:00
Quentin Carbonneaux
4bc4c9584a revert 5af33410
Causes errors with stock toolchain
on OpenBSD.
2024-01-02 11:16:08 +01:00
Tobias Heider
5af33410f6 Fix IBT/BTI by instrumenting function calls 2023-12-30 15:59:25 +01:00
Quentin Carbonneaux
36946a5142 file,loc become dbgfile,dbgloc 2023-08-18 15:12:56 +02:00
Thomas Bracht Laumann Jespersen
0d929287d7 implement line number info tracking
Support "file" and "loc" directives. "file" takes a string (a file name)
assigns it a number, sets the current file to that number and records
the string for later. "loc" takes a single number and outputs location
information with a reference to the current file.
2023-06-06 18:44:51 +02:00
Quentin Carbonneaux
50452b88e9 fix sub-word returns on arm64_apple 2023-05-09 12:39:51 +02:00
Quentin Carbonneaux
5fee3da6ac rename blknew() to newblk()
This is consistent with newtmp()
and newcon().
2023-03-22 11:43:46 +01:00
Quentin Carbonneaux
eb9fcece9e naming nit 2023-03-19 07:38:24 +01:00
Quentin Carbonneaux
011dfc839d silence format warning more reliably 2023-03-16 16:22:11 +01:00
Quentin Carbonneaux
6f45894c7f silence some warnings 2023-03-15 22:07:18 +01:00
Alexey Yerin
7410f90629 Emit .type and .size directives on RISC-V and ARM
To match x86
2023-03-11 21:56:45 +01:00
Quentin Carbonneaux
26c1c30b7d new blit instruction 2022-12-14 23:18:26 +01:00
Quentin Carbonneaux
c0f25aeae3 new rsval() helper for signed Refs
The .val field is signed in RSlot.
Add a new dedicated function to
fetch it as a signed int.
2022-12-12 22:16:33 +01:00
Quentin Carbonneaux
9126afa2da new hlt block terminator
It is handy to express when
the end of a block cannot be
reached. If a hlt terminator
is executed, it traps the
program.

We don't go the llvm way and
specify execution semantics as
undefined behavior.
2022-11-27 21:48:21 +01:00
Quentin Carbonneaux
cbee74bdb4 use a new struct for symbols
Symbols are a useful abstraction
that occurs in both Con and Alias.
In this patch they get their own
struct. This new struct packages
a symbol name and a type; the type
tells us where the symbol name
must be interpreted (currently, in
global memory or in thread-local
storage).

The refactor fixed a bug in
addcon(), proving the value of
packaging symbol names with their
type.
2022-11-22 21:56:21 +01:00
Quentin Carbonneaux
8ecae92299 thread-local storage for amd64_apple
It is quite similar to arm64_apple.
Probably, the call that needs to be
generated also provides extra
invariants on top of the regular
abi, but I have not checked that.

Clang generates code that is a bit
neater than qbe's because, on x86,
a load can be fused in a call
instruction! We do not bother with
supporting these since we expect
only sporadic use of the feature.

For reference, here is what clang
might output for a store to the
second entry of a thread-local
array of ints:

        movq    _x@TLVP(%rip), %rdi
        callq   *(%rdi)
        movl    %ecx, 4(%rax)
2022-10-12 21:12:08 +02:00
Quentin Carbonneaux
577e93fe6d thread-local storage for arm64_apple
It is documented nowhere how this is
supposed to work. It is also quite easy
to have assertion failures pop in the
linker when generating asm slightly
different from clang's!

The best source of information is found
in LLVM's source code (AArch64ISelLowering.cpp).
I paste it here for future reference:

/// Darwin only has one TLS scheme which must be capable of dealing with the
/// fully general situation, in the worst case. This means:
///     + "extern __thread" declaration.
///     + Defined in a possibly unknown dynamic library.
///
/// The general system is that each __thread variable has a [3 x i64] descriptor
/// which contains information used by the runtime to calculate the address. The
/// only part of this the compiler needs to know about is the first xword, which
/// contains a function pointer that must be called with the address of the
/// entire descriptor in "x0".
///
/// Since this descriptor may be in a different unit, in general even the
/// descriptor must be accessed via an indirect load. The "ideal" code sequence
/// is:
///     adrp x0, _var@TLVPPAGE
///     ldr x0, [x0, _var@TLVPPAGEOFF]   ; x0 now contains address of descriptor
///     ldr x1, [x0]                     ; x1 contains 1st entry of descriptor,
///                                      ; the function pointer
///     blr x1                           ; Uses descriptor address in x0
///     ; Address of _var is now in x0.
///
/// If the address of _var's descriptor *is* known to the linker, then it can
/// change the first "ldr" instruction to an appropriate "add x0, x0, #imm" for
/// a slight efficiency gain.

The call 'blr x1' above is actually
special in that it trashes fewer registers
than what the abi would normally permit.
In qbe, I don't take advantage of this
and lower the call like a regular call.
We can revise this later on. Again, the
source for this information is LLVM's
source code:

// TLS calls preserve all registers except those that absolutely must be
// trashed: X0 (it takes an argument), LR (it's a call) and NZCV (let's not be
// silly).
2022-10-12 21:11:41 +02:00
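
The descriptor scheme quoted in 577e93fe6d can be modeled in plain C. This is a sketch for illustration only: the struct name, the field names `word1` and `word2`, and `fake_thunk` are all hypothetical, and the real resolver is supplied by the runtime. The only property taken from the quoted comment is that the first word is a function pointer called with the address of the whole descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three 64-bit words on a 64-bit target. Only the first word has a
 * meaning the compiler relies on; the other two belong to the runtime. */
struct tlv_descriptor {
    void *(*thunk)(struct tlv_descriptor *);
    uint64_t word1; /* hypothetical name */
    uint64_t word2; /* hypothetical name */
};

/* Mirrors the "ideal" sequence: load the first word of the descriptor,
 * then call it with the descriptor address as the only argument. */
void *tlv_address(struct tlv_descriptor *d)
{
    assert(d != NULL && d->thunk != NULL);
    return d->thunk(d);
}

/* Stand-in resolver for this sketch: it treats word1 as the address of
 * the variable. A real runtime computes a per-thread address instead. */
void *fake_thunk(struct tlv_descriptor *d)
{
    return (void *)(uintptr_t)d->word1;
}
```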
Quentin Carbonneaux
b03a8970d7 mark apple targets with a boolean
It is more natural to branch on a
flag than have different function
pointers for high-level passes.
2022-10-08 21:48:47 +02:00
Quentin Carbonneaux
4e90b4210e "rel" fields become "reloc" 2022-10-08 21:48:47 +02:00
Quentin Carbonneaux
00a30954ac add support for thread-local storage
The apple targets are not done yet.
2022-10-08 21:48:42 +02:00
Quentin Carbonneaux
70f297bab7 fix case of Pool constants 2022-10-03 10:41:30 +02:00
Quentin Carbonneaux
79f3673d20 new arm64_apple target
Should make qbe work on apple
arm-based hardware.
2022-10-03 10:41:26 +02:00
Quentin Carbonneaux
a9a70e30a8 add new target-specific abi0 pass
The general idea is to give abis a
chance to talk before we've done all
the optimizations. Currently, all
targets eliminate {par,arg,ret}{sb,ub,...}
during this pass. The forthcoming
arm64_apple will, however, insert
proper extensions during abi0.

Moving forward abis can, for example,
lower small-aggregates passing there
so that memory optimizations can
interact better with function calls.
2022-10-03 10:41:03 +02:00
Quentin Carbonneaux
fb76791b97 remove two unsigned
We have a uint alias that we use
everywhere else. I also added a
todo about unhandled large offsets
in arm64/emit.
2022-09-01 19:08:38 +02:00
Quentin Carbonneaux
f135a0b1fd use direct bl calls on arm64
This generates tidier code and is pic
friendly because it lets the linker
trampoline calls to dynlinked libs.
2022-09-01 19:03:53 +02:00
Quentin Carbonneaux
8dddb971d9 drop -G flag and add target amd64_apple
apple support is more than assembly syntax
in case of arm64 machines, and apple syntax
is currently useless in all cases but amd64;
rather than having a -G option that only
makes sense with amd64, we add a new target
amd64_apple
2022-08-31 21:42:49 +02:00
Michael Forney
4ac7d770d6 arm64: fix maximum immediate size for small loads/stores
The maximum immediate size for 1, 2, 4, and 8 byte loads/stores is
4095, 8190, 16380, and 32760 respectively[0][1][2].

[0] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDRB--immediate-
[1] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDRH--immediate-
[2] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDR--immediate-
2022-05-10 11:51:47 +02:00
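
The four limits listed in 4ac7d770d6 are all 4095 times the access size, which matches an unsigned 12-bit immediate scaled by the number of bytes transferred. A minimal sketch of a range check built on that observation follows; the function names are hypothetical and are not taken from the QBE source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: largest immediate offset usable by a load or
 * store of `size` bytes, where size is 1, 2, 4, or 8. */
int64_t max_imm_offset(int size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    return 4095 * (int64_t)size;
}

/* An offset fits the scaled form if it is non-negative, a multiple of
 * the access size, and no larger than the maximum above. */
bool fits_imm_offset(int64_t off, int size)
{
    return off >= 0 && off % size == 0 && off <= max_imm_offset(size);
}
```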
Quentin Carbonneaux
bf2a90ef7c fix return for big aggregates
The recent changes in arm and riscv
typclass() set ngp to 1 when a struct
is returned via a caller-provided
buffer.  This interacts bogusly with
selret() that ends up declaring a gp
register live when none is set in
the returning sequence.

The fix is simply to set cty to zero
(all registers dead) in case a caller-
provided buffer is used.
2022-03-17 10:57:09 +01:00
Quentin Carbonneaux
2416d29141 new -t? flag to print default target 2022-03-15 22:30:34 +01:00
Quentin Carbonneaux
01142fa059 support env calls on arm64
The x9 register is used for
the env parameter.
2022-03-15 14:18:31 +01:00
Quentin Carbonneaux
c5769f62b4 dynamic stack allocs for arm64
I also moved some isel logic
that would have been repeated
a third time in util.c.
2022-03-14 23:14:48 +01:00
Quentin Carbonneaux
7a7a5f4803 improve consistency in abis 2022-03-14 10:40:30 +01:00
Quentin Carbonneaux
905e9cef30 arm64/abi: fix big aggregates passed on the stack
The riscv test abi8.ssa caught a bug
in the arm backend. It turns out we
were using the wrong class when loading
pointers to aggregates from the stack.

The fix is simple and mirrors what is
done in the riscv abi.
2022-03-14 10:04:24 +01:00
Quentin Carbonneaux
9060981c10 flag types defined as unions
The risc-v abi needs to know if a
type is defined as a union or not.

We cannot use nunion to obtain this
information because the risc-v abi
made the unfortunate decision of
treating

	union { int i; }

differently from

	int i;

So, instead, I introduce a single
bit flag 'isunion'.
2022-03-08 15:57:06 +01:00
Quentin Carbonneaux
349794f3e4 cosmetics 2022-03-08 15:36:26 +01:00
Quentin Carbonneaux
42cbdc04d0 improve consistency in arm64 and rv64 abis 2022-02-25 10:49:55 +01:00
Quentin Carbonneaux
2ca6fb25a2 shared linkage logic for func/data 2022-02-02 21:09:37 +01:00
Quentin Carbonneaux
20ee522ce8 arm64: handle large slots in Ocopy 2022-01-31 16:57:22 +01:00
Bor Grošelj Simić
3964574a83 implement float -> unsigned casts
amd64 lacks an instruction for this so it has to be implemented with
float -> signed casts. The approach is borrowed from llvm.
2022-01-28 09:24:15 +01:00
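
The idea behind 3964574a83 can be sketched in C with a branch: values below 2^63 go through the signed cast directly, and larger values are shifted down by 2^63 first and corrected afterwards. This is a simplified variant for illustration; `double_to_u64` is a hypothetical name and the sequence actually emitted by QBE may differ (for instance, it may be branch-free).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: double -> uint64_t using only double -> int64_t
 * casts. The input must lie in [0, 2^64). */
uint64_t double_to_u64(double x)
{
    const double two63 = 9223372036854775808.0; /* 2^63 */

    assert(x >= 0.0 && x < 2.0 * two63);
    if (x < two63)
        return (uint64_t)(int64_t)x;
    /* x - 2^63 is exact here and fits in int64_t; add 2^63 back as an
     * integer after the cast. */
    return (uint64_t)(int64_t)(x - two63) + ((uint64_t)1 << 63);
}
```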
Bor Grošelj Simić
74d022f975 implement unsigned -> float casts
amd64 lacks an instruction for this so it has to be implemented with
signed -> float casts:
 - Word casting is done by zero-extending the word to a long and then doing
   a regular signed cast.
 - Long casting is done by dividing by two with correct rounding if the
   highest bit is set and casting that to float, then adding
   1 to mantissa with integer addition
2022-01-28 09:24:15 +01:00
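
The two cases described in 74d022f975 can be sketched in C as follows. The function names are hypothetical. The sketch reads the final integer addition as adding 1 << 52 to the bit pattern of the double, which bumps the exponent field by one and so doubles the value; that reading is an assumption, as is the use of double rather than single precision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Word case: zero-extend to 64 bits, then a regular signed cast. */
double u32_to_double(uint32_t w)
{
    return (double)(int64_t)(uint64_t)w;
}

/* Long case: halve the value when the highest bit is set, keeping the
 * dropped low bit as a sticky bit so rounding stays correct. */
double u64_to_double(uint64_t x)
{
    uint64_t isbig = x >> 63;                 /* 1 if the top bit is set */
    uint64_t half = (x >> isbig) | (x & 1);   /* equals x when isbig == 0 */
    double d = (double)(int64_t)half;         /* always in signed range */
    uint64_t bits;

    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    bits += isbig << 52;                      /* double d when it was halved */
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

The sticky bit matters for inputs such as 0x8000000000000401: without it, the halved value would sit exactly on a rounding tie and could round the wrong way.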
Eyal Sawady
e91d121581 Add a negation instruction
Necessary for floating-point negation, because
`%result = sub 0, %operand` doesn't give the correct sign for 0/-0.
2022-01-23 11:43:59 +01:00
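
The sign problem mentioned in e91d121581 is easy to reproduce in C, since IEEE 754 subtraction of +0 from +0 yields +0 under the default rounding mode while true negation yields -0. A minimal sketch with two hypothetical helpers:

```c
#include <math.h>

/* Negation expressed as a subtraction from zero, as in `sub 0, %operand`. */
double neg_by_sub(double x)
{
    return 0.0 - x;
}

/* True negation: flips the sign bit, including for zeros. */
double neg_by_sign_flip(double x)
{
    return -x;
}
```

For an input of +0.0, `signbit(neg_by_sub(0.0))` is zero while `signbit(neg_by_sign_flip(0.0))` is nonzero; for nonzero finite inputs the two agree.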
Quentin Carbonneaux
367c8215d9 arm64: fix slots with offset >32k
When slots are used with a large offset,
the emitter generates invalid assembly
code. That is caught later on by the
assembler, but it prevents compilation
of programs with large stack frames.

When a slot offset is too large to be
expressed as a constant offset to x29
(the frame pointer), emitins() inserts
a late Oaddr instruction to x16 and
replaces the large slot reference with
x16.

This change also gave me the opportunity
to refactor the save/restore logic for
callee-save registers.

This fixes the following Hare issue:
https://todo.sr.ht/~sircmpwn/hare/387
2021-12-05 22:06:23 +01:00
Michael Forney
ae8803cbe6 amd64: avoid reading past end of passed struct
If the size of the struct is not a multiple of 8, the actual struct
size may be different from the size reserved on the stack.

This fixes the case where the struct is passed in memory, but we
still may over-read a struct passed in registers. A TODO is added
for now.
2021-11-08 14:11:10 +01:00
Quentin Carbonneaux
cd095a44db fix for sloppy reg->mem in arm64 abi
Michael found a bug where some copies
from registers to memory in the arm64
abi clobber the stack. The test case
is:

    type :T = { w }
    function w $f() {
    @start
    	%p =:T call $g()
    	%x =w loadw %p
    	ret %x
    }

qbe will write 4 bytes out of bounds
when pulling the result struct from
its register. The same bug can be
observed if :T's definition is {w 3};
in this case qbe writes 16 bytes in
a slot of 12 bytes.

This patch changes stkblob() to use
the rounded argument size if it is
going to be restored from registers.

Relatedly, mem->reg loads for structs
with size < 16 and != 8 are treated
a bit sloppily both in the arm64 and
in the sysv abis. That is much less
harmful than the present bug.
2021-11-08 11:29:36 +01:00
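
The two out-of-bounds cases in cd095a44db follow from storing whole 8-byte registers into a slot sized for the struct: a 4-byte struct receives 8 bytes, and a 12-byte struct receives 16. A minimal sketch of the rounding involved, assuming 8-byte register stores; `round_up` is a hypothetical helper and not the code in stkblob().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round size up to the next multiple of align,
 * where align is a power of two. */
size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

With an alignment of 8, a slot for `{ w }` grows from 4 to 8 bytes and a slot for `{ w 3 }` grows from 12 to 16 bytes, which is enough to hold the register stores described above.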