Rudimentary RISC-V support#704
Conversation
af354c2 to
33dc1c4
Compare
51cec8e to
b72994e
Compare
davidlattimore
left a comment
There was a problem hiding this comment.
Looks good. Nice work! A few small comments. I'd say don't spend too long on any of them - if any look like they might take too long, then we can leave them until after merging.
| // The iterator is used for e.g. R_RISCV_PCREL_HI20 & R_RISCV_PCREL_LO12_I pair of relocations where the later | ||
| // one actually points to a label of the HI20 relocations and thus we need to find it. The relocation is typically | ||
| // right before the LO12_* relocation. | ||
| let relocations_to_search = relocations[..i] |
There was a problem hiding this comment.
When I benchmark a build of this PR against main, I see about a 6% increase in instruction count. The effect on walltime is less (about 2.9-4.6%). I'm not sure what the cause is. One possibility is just from having more branches in some of our large match statements. If that's the case, then it's likely hard to do much about at present. The other possibility is some bit of code in a common path like this code here.
Do you see a small slow down too?
If it is this bit of code here - perhaps unlikely, but possible - then the two options that come to mind are (a) pass a closure that produces this iterator or (b) pass relocations and i.
There was a problem hiding this comment.
Do you see a small slow down too?
No, I've just tried building mold both w/ and w/o debug info, and the speed might be 1% slower now, not more.
If it is this bit of code here - perhaps unlikely, but possible - then the two options that come to mind are (a) pass a closure that produces this iterator or (b) pass
relocationsandi.
Yeah, I can experiment with that. It's true we currently build the iterator for every relocation, where in most cases, it's unused.
There was a problem hiding this comment.
@davidlattimore I reworked the logic around relocations_to_search, can you re-measure the numbers after the change?
There was a problem hiding this comment.
That helped a bit. The instruction count (which is very stable from run to run) now only increases by 5.4% as opposed to 6.7%.
Do you see any significant change on instruction counts? I'm using https://github.com/andrewrk/poop to profile.
There was a problem hiding this comment.
Yesterday, I spent some time with the bisection and I can confirm the instruction count increase with this change, where I managed to identify 2 steps of the growth:
- Addition of handling of
AbsoluteAddition/AbsoluteSubtraction: 29.6G -> 30.5G insns - Addition of
get_pair_subtraction_relocation_valuetoapply_debug_relocation: 31.7G -> 32.1 insns
Which is kind of expected, given the logic was added to the hottest functions. Moreover, I noticed the switch from anyhow to the custom error time increased the instruction count by ~0.7G (noticed that because I started merging main branch at some point).
8c14b52 to
169c737
Compare
9271fb0 to
6621842
Compare
|
I'm happy to announce the current RISC-V branch can bootstrap @davidlattimore I think the PR could be merged now, even though there are some refactoring opportunities. I'm planning to collect these as follow-up issues. Please take a look at the changes since you reviewed the patch set last time. One interesting change I had to include is 6893073, where we were missing emission of the TLSMOD (that was allocated) for an executable, where it references symbols from a dynamic library. Other archs probably don't suffer from it as they process a GD -> IE relaxation as the shared library presence is marked with |
Awesome!
Thanks for pointing out that change. That change looks right to me. I think you're right about initial-exec relaxations being the reason we hadn't noticed that it was wrong. |
Implements the very basic support for RISC-V architecture. Right now, I'm going to open a draft PR to get CI run results available with QEMU.
#678