Skip to content

CREL integration#981

Merged
marxin merged 17 commits intowild-linker:mainfrom
marxin:crel-integration
Aug 13, 2025
Merged

CREL integration#981
marxin merged 17 commits intowild-linker:mainfrom
marxin:crel-integration

Conversation

@marxin
Copy link
Copy Markdown
Collaborator

@marxin marxin commented Jul 2, 2025

This is work-in-progress integration of the support for the CREL relocations. So far, it depends on not-yet-published revision of the object crate.

The performance numbers for the linking Clang:

❯ poop 'bash ./run-with -fuse-ld=/tmp/ld' 'bash ./run-with -fuse-ld=/home/marxin/Programming/wild/ld' -d 2000
Benchmark 1 (3 runs): bash ./run-with -fuse-ld=/tmp/ld
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           984ms ± 1.39ms     982ms …  985ms          0 ( 0%)        0%
  peak_rss            115MB ± 76.1KB     115MB …  115MB          0 ( 0%)        0%
  cpu_cycles         49.8G  ± 1.53G     48.0G  … 50.8G           0 ( 0%)        0%
  instructions       37.2G  ± 8.66M     37.2G  … 37.2G           0 ( 0%)        0%
  cache_references   1.06G  ± 7.59M     1.05G  … 1.06G           0 ( 0%)        0%
  cache_misses        208M  ± 1.78M      206M  …  210M           0 ( 0%)        0%
  branch_misses      85.4M  ±  909K     84.3M  … 86.0M           0 ( 0%)        0%
Benchmark 2 (3 runs): bash ./run-with -fuse-ld=/home/marxin/Programming/wild/ld
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           988ms ± 12.2ms     974ms …  996ms          0 ( 0%)          +  0.4% ±  2.0%
  peak_rss            115MB ±  158KB     115MB …  115MB          0 ( 0%)          +  0.0% ±  0.2%
  cpu_cycles         52.1G  ±  542M     51.5G  … 52.6G           0 ( 0%)          +  4.7% ±  5.2%
  instructions       37.7G  ± 8.25M     37.7G  … 37.7G           0 ( 0%)        💩+  1.4% ±  0.1%
  cache_references   1.11G  ± 6.72M     1.10G  … 1.12G           0 ( 0%)        💩+  5.3% ±  1.5%
  cache_misses        212M  ± 1.16M      210M  …  213M           0 ( 0%)          +  1.6% ±  1.6%
  branch_misses      86.1M  ±  729K     85.3M  … 86.7M           0 ( 0%)          +  0.8% ±  2.2%

@marxin marxin requested a review from davidlattimore July 2, 2025 16:35
Comment thread libwild/src/elf.rs Outdated
) -> Result<&'data [Rela]> {
let Some(rela_index) = relocations.get(index) else {
return Ok(&[]);
) -> Result<Vec<Crel>> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to a heap allocation does make me worry a little bit about performance. Your benchmark looks good, but from the timing, I'm guessing it had debug info? What about a build with --strip-debug?

I tried running a few benchmarks myself. For non-debug builds, I do see a small slowdown. It's most noticeable in the instruction and cycle counts (+3.5% and +4.7%), but if I manage to get a quiet run, then I'll often also see a wall time slowdown of a few percent. I was mostly benchmarking wild linking itself without debug info. Running with --threads=1 or =2 can also show performance effects that aren't so visible when you have lots of threads.

I can't be sure that it's due to this heap allocation, but it does seem like the most likely cause.

Making it not heap-allocate when the object file is using CREL, while not impossible, would likely be tricky - especially for the eh_frame stuff. e.g. it might require making CrelIterator be Copy or at least Clone, or better yet having some way to save a compact form of the iterator and restore it later.

Making it not heap allocate when the relocations are RELA is likely also a bit of work, but more tractable. There are probably a few ways it could be done. The simplest that comes to mind would be to have an enum:

enum RelocationList<'data> {
   Rela(&'data [Rela]),
   Crel(Vec<Crel>),
}

This could then implement IntoIterator where the item of the iterator is a Crel. For the eh_frame stuff, we could have a method that returns a new RelocationList that held either a slice of the original Rela slice, or a new Vec in the case of Crel.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the ideas! I've made some progress with that, but still it's non-trivial to handle the apply_relocation and apply_debug_relocation where we currently require full slice of relocations (and the index). We might want to make it Arch-specific, because it's only the riscv64 platform that relies on the information.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworked most of the places, the only remaining is the handling of eh_frame, I would be curious if you see an opportunity for the improvement. Plus, could you please share your numbers after my recent changes? Much are we still behind the current main branch?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had a chance to look through your recent changes yet, but here is a benchmark for how it currently looks on my machine.

❯ OUT=/home/david/ttt throttle-count poop '/home/david/tmp/rustc-link/0/run-with /home/david/tmp/wild-builds/2025-07-19 --strip-debug --no-fork --threads=2' '/home/david/tmp/rustc-link/0/run-with target/release/wild --strip-debug --no-fork --threads=2'
temp: +49.0°C
Benchmark 1 (7 runs): /home/david/tmp/rustc-link/0/run-with /home/david/tmp/wild-builds/2025-07-19 --strip-debug --no-fork --threads=2
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           779ms ± 8.62ms     770ms …  793ms          0 ( 0%)        0%
  peak_rss           1.36GB ±  128KB    1.36GB … 1.36GB          0 ( 0%)        0%
  cpu_cycles         3.68G  ± 25.9M     3.63G  … 3.71G           0 ( 0%)        0%
  instructions       5.25G  ±  301K     5.25G  … 5.25G           0 ( 0%)        0%
  cache_references    120M  ±  923K      118M  …  121M           0 ( 0%)        0%
  cache_misses       70.4M  ±  368K     69.9M  … 70.8M           0 ( 0%)        0%
  branch_misses      12.4M  ± 20.4K     12.4M  … 12.5M           0 ( 0%)        0%
Benchmark 2 (7 runs): /home/david/tmp/rustc-link/0/run-with target/release/wild --strip-debug --no-fork --threads=2
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           795ms ± 4.32ms     789ms …  802ms          0 ( 0%)        💩+  2.1% ±  1.0%
  peak_rss           1.37GB ±  218KB    1.37GB … 1.37GB          0 ( 0%)          +  0.8% ±  0.0%
  cpu_cycles         3.86G  ± 11.3M     3.84G  … 3.88G           0 ( 0%)        💩+  5.0% ±  0.6%
  instructions       5.68G  ±  232K     5.68G  … 5.68G           0 ( 0%)        💩+  8.2% ±  0.0%
  cache_references    124M  ±  839K      122M  …  125M           0 ( 0%)        💩+  3.2% ±  0.9%
  cache_misses       71.9M  ±  248K     71.6M  … 72.3M           0 ( 0%)        💩+  2.2% ±  0.5%
  branch_misses      12.4M  ± 11.6K     12.4M  … 12.4M           0 ( 0%)          -  0.3% ±  0.2%
Throttle pkg: 0 core: 0 ms: 0 temp: +56.0°C

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than caching a Vec<Crel> for both RELA and CREL, we could store Vec<Crel> only when CREL is used and otherwise store a &[Rela]. We could provide a function to get a Crel at an arbitrary index. This would eliminate the heap allocation for the RELA case. There might still be some overhead from switching between the different modes.

Alternatively, we could make the code that deals with relocations generic over the relocation type. That would mean that switching between RELA and CREL would be done in a less hot function. With that sort of approach, I'd expect basically no slowdown.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, now I can reproduce a very similar poop stats for the rustc linking at my machine. And yes, that's an unpleasant slowdown we should deal before this can get merged.

Alternatively, we could make the code that deals with relocations generic over the relocation type. That would mean that switching between RELA and CREL would be done in a less hot function. With that sort of approach, I'd expect basically no slowdown.

Are you talking about the added RelocationCache? Or do you mean the process_eh_frame_data function, where I currently recreate the ExceptionFrame::relocations once the iteration space is known?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking a trait something like the following:

trait RelocationSequence {
    fn num_relocations(&self) -> usize;
    fn get_crel(&self, index: usize) -> Crel
    fn crel_iter(&self) -> impl Iterator<Item = Crel>;
}

Not sure of the best name for the trait. I originally thought to call it RelocationList, but that's the name of the enum. It's likely, both are needed.

This trait could be implemented for &[Rela]. For CREL, the simplest option would be to always convert to a Vec then implement RelocationSequence for &[Crel]. This might not give ideal performance when CREL is used, but wouldn't affect performance of RELA. Alternatively, you could implement RelocationSequence for RelocationCache. In order to do lazy creation of the Vec<Crel>, either all the functions would need to take &mut self or interior mutability would be needed.

fn apply_relocations(..., relocations: &impl RelocationSequence) {...}

Calling it then might look something like this:

let relocations = object.relocations(section.index)?;
let result = match relocations {
    RelocationsList::Rela(rela) => apply_relocations(..., rela),
    RelocationList::Crel(crel) => apply_relocations(..., &RelocationCache::new(crel)),
};

It may not be necessary to make ExceptionFrame generic over the relocation type, in which case it'd store an enum. There might be a small runtime cost from using an enum here. Specifically, when CommonGroupState gets dropped, it'll need to run Drop for all ExceptionFrames in case they are the CREL variant. If that turned out to be a problem, then we could potentially make ExceptionFrame generic over the relocation type, then store two Vecs on CommonGroupState, one ExceptionFrame<&[Rela]> and one ExceptionFrame<Vec<Crel>>. That would add a little bit of complexity, since we'd need a way for FrameIndex to know which Vec to look in. So it's probably worthwhile seeing if that's actually necessary before doing it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That all sounds good! The only place where I face a challenge is where we assign to the ExceptionFrame: relocations: relocations[rel_start_index..rel_end_index].to_vec(),. That's a place where a peekable iterator is used and unfortunately one cannot get back to the underlying iterator. I am curious how will the RelocationSequence be used as we need to do skip + take iterator operations?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it possible to add Index trait implementation there?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option might be to add a slice method to RelocationSequence:

trait RelocationSequence {
    ...
    fn slice(&self, range: Range<usize>) -> Self;
}

For the &[Rela] implementation, this returns a subslice of the original slice. If you've implemented RelocationSequence for Vec<Crel>, then the implementation it would return a new Vec<Crel>. If you've implemented it for RelocationCache, then it'd need to return a new RelocationCache with the relevant subsequence copied.

That should hopefully work if ExceptionFrame is generic over the relocation type. If it's not, then instead of returning Self, slice would return an enum that's something like the following:

enum RelocationSlice<'data> {
    Rela(&'data [Rela]),
    Crel(Vec<Crel>),
}

@marxin marxin force-pushed the crel-integration branch from 13d9bdc to 6e648bc Compare July 4, 2025 17:57
@marxin marxin force-pushed the crel-integration branch from 6e648bc to 2960dfe Compare July 4, 2025 18:00
@davidlattimore
Copy link
Copy Markdown
Member

With the following changes, I get no increase in instruction count - possibly actually a small decrease, which I'd guess is maybe due to some different inlining decisions by the compiler. I haven't attempted to optimise processing of CREL relocations, just made sure that RELA is handled much the same as before. Everything that needs to process relocations, gets the RelocationList, matches on it, then calls a function that is generic over the kind of relocations being processed. I got rid of CrelRelocationCache in order to simplify things. It's possible that CREL processing might be more efficient if it were reintroduced, but that falls under optimising of CREL processing, which is probably something we can leave until later. I'm unsure about some of the naming, so don't feel you have to keep the names that I've used for things.

diff --git a/libwild/src/dwarf_address_info.rs b/libwild/src/dwarf_address_info.rs
index 2f9397dd80..6246b694f0 100644
--- a/libwild/src/dwarf_address_info.rs
+++ b/libwild/src/dwarf_address_info.rs
@@ -3,10 +3,12 @@
 
 use crate::arch::Arch;
 use crate::elf::File;
+use crate::elf::RelocationSequence;
 use crate::error::Result;
 use anyhow::Context;
 use object::LittleEndian;
 use object::SymbolIndex;
+use object::read::elf::Crel;
 use object::read::elf::RelocationSections;
 use std::borrow::Cow;
 use std::ffi::OsStr;
@@ -106,41 +108,19 @@
             let mut section_data = object.section_data_cow(section)?.into_owned();
 
             // Apply relocations.
-            let relocations = object.relocations(index, relocations)?;
-
-            for rel in relocations {
-                let sym_index = rel.r_sym;
-                let symbol = object.symbol(SymbolIndex(sym_index as usize))?;
-
-                let mut value = symbol
-                    .st_value
-                    .get(LittleEndian)
-                    .wrapping_add(rel.r_addend as u64);
-
-                let symbol_section = object.section(object::SectionIndex(
-                    symbol.st_shndx.get(LittleEndian) as usize,
-                ))?;
-
-                let data_offset = rel.r_offset as usize;
-
-                if symbol_section.sh_offset.get(LittleEndian)
-                    == section_of_interest.sh_offset.get(LittleEndian)
-                {
-                    value += SECTION_LOAD_ADDRESS;
-                }
-
-                let r_type = A::relocation_from_raw(rel.r_type)?;
-
-                let linker_utils::elf::RelocationSize::ByteSize(num_bytes) = r_type.size else {
-                    continue;
-                };
-
-                if r_type.kind == linker_utils::elf::RelocationKind::Absolute {
-                    section_data
-                        .get_mut(data_offset..data_offset + num_bytes)
-                        .context("Invalid relocation offset")?
-                        .copy_from_slice(&value.to_le_bytes()[..num_bytes]);
-                }
+            match object.relocations(index, relocations)? {
+                crate::elf::RelocationList::Rela(relocations) => apply_section_relocations::<A>(
+                    object,
+                    section_of_interest,
+                    &mut section_data,
+                    relocations.crel_iter(),
+                )?,
+                crate::elf::RelocationList::Crel(relocations) => apply_section_relocations::<A>(
+                    object,
+                    section_of_interest,
+                    &mut section_data,
+                    relocations.flat_map(|r| r.ok()),
+                )?,
             }
 
             Cow::Owned(section_data)
@@ -151,6 +131,49 @@
     Ok(data)
 }
 
+fn apply_section_relocations<A: Arch>(
+    object: &File<'_>,
+    section_of_interest: &object::elf::SectionHeader64<LittleEndian>,
+    section_data: &mut [u8],
+    relocations: impl Iterator<Item = Crel>,
+) -> Result {
+    for rel in relocations {
+        let sym_index = rel.r_sym;
+        let symbol = object.symbol(SymbolIndex(sym_index as usize))?;
+
+        let mut value = symbol
+            .st_value
+            .get(LittleEndian)
+            .wrapping_add(rel.r_addend as u64);
+
+        let symbol_section = object.section(object::SectionIndex(
+            symbol.st_shndx.get(LittleEndian) as usize,
+        ))?;
+
+        let data_offset = rel.r_offset as usize;
+
+        if symbol_section.sh_offset.get(LittleEndian)
+            == section_of_interest.sh_offset.get(LittleEndian)
+        {
+            value += SECTION_LOAD_ADDRESS;
+        }
+
+        let r_type = A::relocation_from_raw(rel.r_type)?;
+
+        let linker_utils::elf::RelocationSize::ByteSize(num_bytes) = r_type.size else {
+            continue;
+        };
+
+        if r_type.kind == linker_utils::elf::RelocationKind::Absolute {
+            section_data
+                .get_mut(data_offset..data_offset + num_bytes)
+                .context("Invalid relocation offset")?
+                .copy_from_slice(&value.to_le_bytes()[..num_bytes]);
+        }
+    }
+    Ok(())
+}
+
 impl Display for SourceInfo {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
         if let Some(details) = self.0.as_ref() {
diff --git a/libwild/src/elf.rs b/libwild/src/elf.rs
index 2370b41d60..4cef5fd1d8 100644
--- a/libwild/src/elf.rs
+++ b/libwild/src/elf.rs
@@ -25,6 +25,7 @@
 use std::borrow::Cow;
 use std::io::Read as _;
 use std::mem::offset_of;
+use std::ops::Range;
 use std::sync::atomic::Ordering;
 
 /// Our starting address in memory when linking non-relocatable executables. We can start memory
@@ -72,51 +73,27 @@
     pub(crate) eflags: u32,
 }
 
+/// A list of relocations that supports iteration.
 #[derive(Clone)]
 pub(crate) enum RelocationList<'data> {
     Rela(&'data [Rela]),
     Crel(CrelIterator<'data>),
 }
 
-pub(crate) struct RelocationListIter<'data> {
-    list: RelocationList<'data>,
-    index: usize,
-}
-
-impl<'data> Iterator for RelocationListIter<'data> {
-    type Item = Crel;
-
-    fn next(&mut self) -> Option<Self::Item> {
-        let item = match &mut self.list {
-            RelocationList::Crel(crel) => crel.next().and_then(|r| r.ok()),
-            RelocationList::Rela(rela) => rela
-                .get(self.index)
-                .map(|r| Crel::from_rela(r, LittleEndian, false)),
-        };
-        self.index += 1;
-        item
-    }
-}
-
-impl<'data> IntoIterator for RelocationList<'data> {
-    type Item = Crel;
-    type IntoIter = RelocationListIter<'data>;
-
-    fn into_iter(self) -> Self::IntoIter {
-        Self::IntoIter {
-            list: self,
-            index: 0,
-        }
-    }
-}
-
-pub(crate) trait RelocationSequence {
+/// A sequence of relocations that supports random access.
+pub(crate) enum DynamicRelocationSequence<'data> {
+    Rela(&'data [Rela]),
+    Crel(Vec<Crel>),
+}
+
+pub(crate) trait RelocationSequence<'data> {
     fn num_relocations(&self) -> usize;
     fn get_crel(&self, index: usize) -> Crel;
     fn crel_iter(&self) -> impl Iterator<Item = Crel>;
+    fn subsequence(&self, range: Range<usize>) -> DynamicRelocationSequence<'data>;
 }
 
-impl RelocationSequence for &[Rela] {
+impl<'data> RelocationSequence<'data> for &'data [Rela] {
     fn num_relocations(&self) -> usize {
         self.len()
     }
@@ -126,29 +103,29 @@
     }
 
     fn crel_iter(&self) -> impl Iterator<Item = Crel> {
-        RelocationList::Rela(self).into_iter()
+        self.iter().map(|r| Crel::from_rela(r, LittleEndian, false))
     }
-}
-
-pub(crate) struct CrelRelocationCache(Vec<Crel>);
-
-impl CrelRelocationCache {
-    pub(crate) fn new(relocations: Vec<Crel>) -> Self {
-        Self(relocations)
+
+    fn subsequence(&self, range: Range<usize>) -> DynamicRelocationSequence<'data> {
+        DynamicRelocationSequence::Rela(&self[range])
     }
 }
 
-impl RelocationSequence for CrelRelocationCache {
+impl RelocationSequence<'static> for Vec<Crel> {
     fn num_relocations(&self) -> usize {
-        self.0.len()
+        self.len()
     }
 
     fn get_crel(&self, index: usize) -> Crel {
-        self.0[index]
+        self[index]
     }
 
     fn crel_iter(&self) -> impl Iterator<Item = Crel> {
-        self.0.clone().into_iter()
+        self.clone().into_iter()
+    }
+
+    fn subsequence(&self, range: Range<usize>) -> DynamicRelocationSequence<'static> {
+        DynamicRelocationSequence::Crel(self[range].to_vec())
     }
 }
 
@@ -596,3 +573,9 @@
 pub(crate) fn is_hidden_symbol(symbol: &crate::elf::Symbol) -> bool {
     symbol.st_visibility() == object::elf::STV_HIDDEN
 }
+
+impl Default for DynamicRelocationSequence<'_> {
+    fn default() -> Self {
+        Self::Rela(&[])
+    }
+}
diff --git a/libwild/src/elf_writer.rs b/libwild/src/elf_writer.rs
index 21bc983559..7c01c3b355 100644
--- a/libwild/src/elf_writer.rs
+++ b/libwild/src/elf_writer.rs
@@ -11,7 +11,6 @@
 use crate::bail;
 use crate::debug_assert_bail;
 use crate::elf;
-use crate::elf::CrelRelocationCache;
 use crate::elf::DynamicEntry;
 use crate::elf::EhFrameHdr;
 use crate::elf::EhFrameHdrEntry;
@@ -1193,7 +1192,7 @@
             object,
             out,
             section,
-            &CrelRelocationCache::new(crel_iter.into_iter().collect::<Result<Vec<_>, _>>()?),
+            &crel_iter.into_iter().collect::<Result<Vec<_>, _>>()?,
             layout,
             table_writer,
             trace,
@@ -1228,7 +1227,7 @@
             object,
             out,
             section,
-            &CrelRelocationCache::new(crel_iter.into_iter().collect::<Result<Vec<_>, _>>()?),
+            &crel_iter.into_iter().collect::<Result<Vec<_>, _>>()?,
             layout,
         ),
     };
@@ -1349,11 +1348,11 @@
     Ok(())
 }
 
-fn apply_relocations<A: Arch>(
+fn apply_relocations<'data, A: Arch>(
     object: &ObjectLayout,
     out: &mut [u8],
     section: &Section,
-    relocation_sequence: &impl RelocationSequence,
+    relocation_sequence: &impl RelocationSequence<'data>,
     layout: &Layout,
     table_writer: &mut TableWriter,
     trace: &TraceOutput,
@@ -1402,11 +1401,11 @@
     Ok(())
 }
 
-fn apply_debug_relocations<A: Arch>(
+fn apply_debug_relocations<'data, A: Arch>(
     object: &ObjectLayout,
     out: &mut [u8],
     section: &Section,
-    relocation_sequence: &impl RelocationSequence,
+    relocation_sequence: &impl RelocationSequence<'data>,
     layout: &Layout,
 ) -> Result {
     let object_section = object.object.section(section.index)?;
@@ -1460,12 +1459,39 @@
     trace: &TraceOutput,
 ) -> Result {
     let eh_frame_section = object.object.section(eh_frame_section_index)?;
+    match object.relocations(eh_frame_section_index)? {
+        elf::RelocationList::Rela(relocations) => write_eh_frame_relocations::<A>(
+            object,
+            layout,
+            table_writer,
+            trace,
+            eh_frame_section,
+            relocations.crel_iter(),
+        ),
+        elf::RelocationList::Crel(relocations) => write_eh_frame_relocations::<A>(
+            object,
+            layout,
+            table_writer,
+            trace,
+            eh_frame_section,
+            relocations.flat_map(|r| r.ok()),
+        ),
+    }
+}
+
+fn write_eh_frame_relocations<A: Arch>(
+    object: &ObjectLayout<'_>,
+    layout: &Layout<'_>,
+    table_writer: &mut TableWriter<'_, '_>,
+    trace: &TraceOutput,
+    eh_frame_section: &object::elf::SectionHeader64<LittleEndian>,
+    relocations: impl Iterator<Item = Crel>,
+) -> std::result::Result<(), error::Error> {
     let data = object.object.raw_section_data(eh_frame_section)?;
     const PREFIX_LEN: usize = size_of::<elf::EhFrameEntryPrefix>();
     let e = LittleEndian;
     let section_flags = SectionFlags::from_header(eh_frame_section);
-    let relocations = object.relocations(eh_frame_section_index)?;
-    let mut relocations = relocations.into_iter().peekable();
+    let mut relocations = relocations.peekable();
     let mut input_pos = 0;
     let mut output_pos = 0;
     let frame_info_ptr_base = table_writer.eh_frame_start_address;
@@ -1822,7 +1848,7 @@
 /// Handling For Thread-Local Storage" for details about some of the TLS-related relocations and
 /// transformations that are applied.
 #[inline(always)]
-fn apply_relocation<A: Arch>(
+fn apply_relocation<'data, A: Arch>(
     object_layout: &ObjectLayout,
     mut offset_in_section: u64,
     rel: &Crel,
@@ -1831,7 +1857,7 @@
     out: &mut [u8],
     table_writer: &mut TableWriter,
     trace: &TraceOutput,
-    relocation_sequence: &impl RelocationSequence,
+    relocation_sequence: &impl RelocationSequence<'data>,
     relocation_index: usize,
 ) -> Result<RelocationModifier> {
     let section_address = section_info.section_address;
@@ -2204,14 +2230,14 @@
     Ok(next_modifier)
 }
 
-fn apply_debug_relocation<A: Arch>(
+fn apply_debug_relocation<'data, A: Arch>(
     object_layout: &ObjectLayout,
     offset_in_section: u64,
     rel: &Crel,
     layout: &Layout,
     section_tombstone_value: u64,
     out: &mut [u8],
-    relocation_sequence: &impl RelocationSequence,
+    relocation_sequence: &impl RelocationSequence<'data>,
     relocation_index: usize,
 ) -> Result<()> {
     let symbol_index = rel.symbol().context("Unsupported absolute relocation")?;
diff --git a/libwild/src/layout.rs b/libwild/src/layout.rs
index 9cae4235a3..0544e55f5e 100644
--- a/libwild/src/layout.rs
+++ b/libwild/src/layout.rs
@@ -18,10 +18,12 @@
 use crate::debug_assert_bail;
 use crate::diagnostics::SymbolInfoPrinter;
 use crate::elf;
+use crate::elf::DynamicRelocationSequence;
 use crate::elf::EhFrameHdrEntry;
 use crate::elf::File;
 use crate::elf::FileHeader;
 use crate::elf::RelocationList;
+use crate::elf::RelocationSequence;
 use crate::elf::Versym;
 use crate::elf_writer;
 use crate::ensure;
@@ -1220,9 +1222,6 @@
     /// is stored is non-deterministic and is whichever object first requested export of that
     /// symbol. That's OK though because the epilogue will sort all dynamic symbols.
     dynamic_symbol_definitions: Vec<DynamicSymbolDefinition<'data>>,
-
-    /// Indexed by `FrameIndex`.
-    exception_frames: Vec<ExceptionFrame>,
 }
 
 impl CommonGroupState<'_> {
@@ -1231,7 +1230,6 @@
             mem_sizes: output_sections.new_part_map(),
             section_attributes: output_sections.new_section_map(),
             dynamic_symbol_definitions: Default::default(),
-            exception_frames: Default::default(),
         }
     }
 
@@ -1336,12 +1334,15 @@
 
     gnu_property_notes: Vec<GnuProperty>,
     riscv_attributes: Vec<RiscVAttribute>,
+
+    /// Indexed by `FrameIndex`.
+    exception_frames: Vec<ExceptionFrame<'data>>,
 }
 
 #[derive(Default)]
-struct ExceptionFrame {
+struct ExceptionFrame<'data> {
     /// The relocations that need to be processed if we load this frame.
-    relocations: Vec<Crel>,
+    relocations: DynamicRelocationSequence<'data>,
 
     /// Number of bytes required to store this frame.
     frame_size: u32,
@@ -2948,7 +2949,7 @@
 
 #[inline(always)]
 fn process_relocation<A: Arch>(
-    object: &mut ObjectLayoutState,
+    object: &ObjectLayoutState,
     common: &mut CommonGroupState,
     rel: &Crel,
     section: &object::elf::SectionHeader64<LittleEndian>,
@@ -3973,6 +3974,7 @@
             cies: Default::default(),
             gnu_property_notes: Default::default(),
             riscv_attributes: Default::default(),
+            exception_frames: Default::default(),
         })
     } else {
         FileLayoutState::Dynamic(DynamicLayoutState {
@@ -4136,28 +4138,26 @@
         let part_id = unloaded.part_id;
         let header = self.object.section(section_index)?;
         let section = Section::create(header, self, section_index, part_id)?;
-        let mut modifier = RelocationModifier::Normal;
 
-        for rel in self.relocations(section.index)?.into_iter() {
-            if modifier == RelocationModifier::SkipNextRelocation {
-                modifier = RelocationModifier::Normal;
-                continue;
-            }
-            modifier = process_relocation::<A>(
-                self,
-                common,
-                &rel,
-                self.object.section(section.index)?,
-                resources,
-                queue,
-                false,
-            )
-            .with_context(|| {
-                format!(
-                    "Failed to copy section {} from file {self}",
-                    section_debug(self.object, section.index)
-                )
-            })?;
+        match self.relocations(section.index)? {
+            RelocationList::Rela(relocations) => {
+                self.load_section_relocations::<A>(
+                    common,
+                    queue,
+                    resources,
+                    section,
+                    relocations.crel_iter(),
+                )?;
+            }
+            RelocationList::Crel(relocations) => {
+                self.load_section_relocations::<A>(
+                    common,
+                    queue,
+                    resources,
+                    section,
+                    relocations.flat_map(|r| r.ok()),
+                )?;
+            }
         }
 
         tracing::debug!(loaded_section = %self.object.section_display_name(section_index),);
@@ -4183,6 +4183,40 @@
         Ok(())
     }
 
+    fn load_section_relocations<A: Arch>(
+        &self,
+        common: &mut CommonGroupState<'data>,
+        queue: &mut LocalWorkQueue,
+        resources: &GraphResources<'data, '_>,
+        section: Section,
+        relocations: impl Iterator<Item = Crel>,
+    ) -> Result {
+        let mut modifier = RelocationModifier::Normal;
+        for rel in relocations {
+            if modifier == RelocationModifier::SkipNextRelocation {
+                modifier = RelocationModifier::Normal;
+                continue;
+            }
+            modifier = process_relocation::<A>(
+                self,
+                common,
+                &rel,
+                self.object.section(section.index)?,
+                resources,
+                queue,
+                false,
+            )
+            .with_context(|| {
+                format!(
+                    "Failed to copy section {} from file {self}",
+                    section_debug(self.object, section.index)
+                )
+            })?;
+        }
+
+        Ok(())
+    }
+
     /// Processes the exception frames for a section that we're loading.
     fn process_section_exception_frames<A: Arch>(
         &mut self,
@@ -4194,28 +4228,43 @@
         let mut num_frames = 0;
         let mut next_frame_index = frame_index;
         while let Some(frame_index) = next_frame_index {
-            let frame_data = &common.exception_frames[frame_index.as_usize()];
+            let frame_data = &self.exception_frames[frame_index.as_usize()];
             next_frame_index = frame_data.previous_frame_for_section;
 
             self.eh_frame_size += u64::from(frame_data.frame_size);
 
             num_frames += 1;
 
-            let frame_data_relocations = frame_data.relocations.iter().copied().collect_vec();
-
             // Request loading of any sections/symbols referenced by the FDEs for our
             // section.
             if let Some(eh_frame_section) = self.eh_frame_section {
-                for rel in frame_data_relocations {
-                    process_relocation::<A>(
-                        self,
-                        common,
-                        &rel,
-                        eh_frame_section,
-                        resources,
-                        queue,
-                        false,
-                    )?;
+                match &frame_data.relocations {
+                    DynamicRelocationSequence::Rela(frame_data_relocations) => {
+                        for rel in *frame_data_relocations {
+                            process_relocation::<A>(
+                                self,
+                                common,
+                                &Crel::from_rela(rel, LittleEndian, false),
+                                eh_frame_section,
+                                resources,
+                                queue,
+                                false,
+                            )?;
+                        }
+                    }
+                    DynamicRelocationSequence::Crel(frame_data_relocations) => {
+                        for rel in frame_data_relocations.crel_iter() {
+                            process_relocation::<A>(
+                                self,
+                                common,
+                                &rel,
+                                eh_frame_section,
+                                resources,
+                                queue,
+                                false,
+                            )?;
+                        }
+                    }
                 }
             }
         }
@@ -4241,28 +4290,22 @@
     ) -> Result {
         let header = self.object.section(section_index)?;
         let section = Section::create(header, self, section_index, part_id)?;
-        let relocations = self.relocations(section.index)?.into_iter();
         if A::local_symbols_in_debug_info() {
-            for rel in relocations {
-                let modifier = process_relocation::<A>(
-                    self,
-                    common,
-                    &rel,
-                    self.object.section(section.index)?,
-                    resources,
-                    queue,
-                    true,
-                )
-                .with_context(|| {
-                    format!(
-                        "Failed to copy section {} from file {self}",
-                        section_debug(self.object, section.index)
-                    )
-                })?;
-                ensure!(
-                    modifier == RelocationModifier::Normal,
-                    "All debug relocations must be processed"
-                );
+            match self.relocations(section.index)? {
+                RelocationList::Rela(relocations) => self.load_debug_relocations::<A>(
+                    common,
+                    queue,
+                    resources,
+                    section,
+                    relocations.crel_iter(),
+                )?,
+                RelocationList::Crel(relocations) => self.load_debug_relocations::<A>(
+                    common,
+                    queue,
+                    resources,
+                    section,
+                    relocations.flat_map(|r| r.ok()),
+                )?,
             }
         }
 
@@ -4273,6 +4316,39 @@
         Ok(())
     }
 
+    fn load_debug_relocations<A: Arch>(
+        &self,
+        common: &mut CommonGroupState<'data>,
+        queue: &mut LocalWorkQueue,
+        resources: &GraphResources<'data, '_>,
+        section: Section,
+        relocations: impl Iterator<Item = Crel>,
+    ) -> Result<(), Error> {
+        for rel in relocations {
+            let modifier = process_relocation::<A>(
+                self,
+                common,
+                &rel,
+                self.object.section(section.index)?,
+                resources,
+                queue,
+                true,
+            )
+            .with_context(|| {
+                format!(
+                    "Failed to copy section {} from file {self}",
+                    section_debug(self.object, section.index)
+                )
+            })?;
+            ensure!(
+                modifier == RelocationModifier::Normal,
+                "All debug relocations must be processed"
+            );
+        }
+
+        Ok(())
+    }
+
     fn finalise_sizes(
         &mut self,
         common: &mut CommonGroupState,
@@ -4688,12 +4764,43 @@
 ) -> Result {
     let eh_frame_section = object.object.section(eh_frame_section_index)?;
     let data = object.object.raw_section_data(eh_frame_section)?;
+    match object.relocations(eh_frame_section_index)? {
+        RelocationList::Rela(relocations) => process_eh_frame_relocations::<A>(
+            object,
+            common,
+            file_symbol_id_range,
+            resources,
+            queue,
+            eh_frame_section,
+            data,
+            &relocations,
+        ),
+        RelocationList::Crel(crel_iterator) => process_eh_frame_relocations::<A>(
+            object,
+            common,
+            file_symbol_id_range,
+            resources,
+            queue,
+            eh_frame_section,
+            data,
+            &crel_iterator.collect::<Result<Vec<Crel>, _>>()?,
+        ),
+    }
+}
+
+fn process_eh_frame_relocations<'data, 'rel: 'data, A: Arch>(
+    object: &mut ObjectLayoutState<'data>,
+    common: &mut CommonGroupState<'data>,
+    file_symbol_id_range: SymbolIdRange,
+    resources: &GraphResources<'_, '_>,
+    queue: &mut LocalWorkQueue,
+    eh_frame_section: &'data object::elf::SectionHeader64<LittleEndian>,
+    data: &'data [u8],
+    relocations: &impl RelocationSequence<'rel>,
+) -> Result {
     const PREFIX_LEN: usize = size_of::<elf::EhFrameEntryPrefix>();
-    let relocations = object
-        .relocations(eh_frame_section_index)?
-        .into_iter()
-        .collect_vec();
-    let mut rel_iter = relocations.iter().enumerate().peekable();
+
+    let mut rel_iter = relocations.crel_iter().enumerate().peekable();
     let mut offset = 0;
 
     while offset + PREFIX_LEN <= data.len() {
@@ -4779,14 +4886,14 @@
             if let Some(section_index) = section_index
                 && let Some(unloaded) = object.sections[section_index.0].unloaded_mut()
             {
-                let frame_index = FrameIndex::from_usize(common.exception_frames.len());
+                let frame_index = FrameIndex::from_usize(object.exception_frames.len());
 
                 // Update our unloaded section to point to our new frame. Our frame will then in
                 // turn point to whatever the section pointed to before.
                 let previous_frame_for_section = unloaded.last_frame_index.replace(frame_index);
 
-                common.exception_frames.push(ExceptionFrame {
-                    relocations: relocations[rel_start_index..rel_end_index].to_vec(),
+                object.exception_frames.push(ExceptionFrame {
+                    relocations: relocations.subsequence(rel_start_index..rel_end_index),
                     frame_size: size as u32,
                     previous_frame_for_section,
                 });

@marxin marxin marked this pull request as ready for review August 13, 2025 04:36
@marxin
Copy link
Copy Markdown
Collaborator Author

marxin commented Aug 13, 2025

Thank you very much @davidlattimore, awesome work and I can confirm the following numbers for linking of the rustc:

Benchmark 2 (28 runs): ./run-with ~/Programming/wild/ld --no-fork
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           180ms ± 6.00ms     170ms …  197ms          1 ( 4%)          +  0.2% ±  1.7%
  peak_rss            604MB ± 9.01MB     594MB …  641MB          1 ( 4%)          +  0.2% ±  0.6%
  cpu_cycles         5.14G  ±  133M     4.88G  … 5.42G           0 ( 0%)        💩+  4.1% ±  1.5%
  instructions       2.93G  ± 9.11M     2.91G  … 2.94G           0 ( 0%)          +  0.3% ±  0.2%
  cache_references   91.4M  ±  680K     90.0M  … 92.7M           0 ( 0%)        💩+  2.3% ±  0.4%
  cache_misses       17.1M  ±  140K     16.7M  … 17.3M           1 ( 4%)        💩+  1.8% ±  0.4%
  branch_misses      7.58M  ± 34.3K     7.52M  … 7.67M           0 ( 0%)          +  0.8% ±  0.2%

Copy link
Copy Markdown
Member

@davidlattimore davidlattimore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this!

@marxin marxin merged commit 73e4578 into wild-linker:main Aug 13, 2025
23 checks passed
@marxin marxin deleted the crel-integration branch August 13, 2025 06:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants