Background Work
Setup
- Chipyard Version: 1.10.0
- Chipyard Commit Hash: 00853c
- Gemmini Commit Hash: f13847e
- OS: Ubuntu 22.04.5 LTS (Linux 6.8.0-79-generic, x86_64)
- Toolchain: Default Chipyard setup as per documentation
Issue Description
Running standard Gemmini baremetal tests works (e.g., tiled_matmul_ws-baremetal), but workloads involving more complex operations (e.g., softmax, layernorm) consistently fail in simulation with an assertion in ReservationStation.scala.
This assertion indicates an invalid entry is being accessed in the reservation station:
assert(entries_st(issue_id).valid)
It seems the reservation station is attempting to issue an entry that is not valid, possibly due to scheduling/queueing logic when handling these micro-ops.
Steps to Reproduce
Successful Run
make CONFIG=GemminiRocketConfig run-binary \
BINARY=../../generators/gemmini/software/gemmini-rocc-tests/build/bareMetalC/tiled_matmul_ws-baremetal
Failing Run
make CONFIG=GemminiRocketConfig run-binary \
BINARY=../../generators/gemmini/software/gemmini-rocc-tests/build/bareMetalC/tiled_matmul_ws_softmax-baremetal
Error Log (Failing Case)
"/home/mingzhenjia/Desktop/chipyard/sims/vcs/generated-src/chipyard.harness.TestHarness.GemminiRocketConfig/gen-collateral/ReservationStation.sv", 9827:
TestDriver.testHarness.chiptop0.system.tile_prci_domain.tile_reset_domain_tile.gemmini.reservation_station: at time 2877727000 ps
Assertion failed at ReservationStation.scala:479
assert(entries_st(issue_id).valid)
Fatal: ".../ReservationStation.sv", 9829:
$finish called at time 2877727000 ps
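To make it easier to line the failure up with a waveform window, the assertion site and its timestamp can be pulled out of the simulator log automatically. The following is only a minimal sketch: the function name `find_assertion_failures` and the two regular expressions are my own assumptions based on the log format shown above, not part of Chipyard or Gemmini.

```python
import re

# Patterns assumed from the log excerpt above (hypothetical, not a Chipyard API).
TIME_RE = re.compile(r"at time (\d+) ps")
ASSERT_RE = re.compile(r"Assertion failed at (\S+):(\d+)")


def find_assertion_failures(log_text):
    """Return (scala_file, line_number, time_ps) for each assertion failure.

    time_ps is the most recent "at time N ps" value seen before the
    assertion line, or None if no timestamp preceded it.
    """
    failures = []
    last_time_ps = None
    for line in log_text.splitlines():
        time_match = TIME_RE.search(line)
        if time_match:
            last_time_ps = int(time_match.group(1))
        assert_match = ASSERT_RE.search(line)
        if assert_match:
            failures.append(
                (assert_match.group(1), int(assert_match.group(2)), last_time_ps)
            )
    return failures


# Sample text copied from the failing log in this report.
sample = """TestDriver.testHarness.chiptop0.system.tile_prci_domain.tile_reset_domain_tile.gemmini.reservation_station: at time 2877727000 ps
Assertion failed at ReservationStation.scala:479
assert(entries_st(issue_id).valid)
$finish called at time 2877727000 ps"""

print(find_assertion_failures(sample))
```

For the log above this reports `ReservationStation.scala`, line 479, at 2877727000 ps, which is the point I would start tracing backwards from.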
Log (Successful Case)
Starting gemmini matmul
Cycles taken: 2392
Starting slow CPU matmul
Cycles taken: 3227174
Fatal: ".../TestDriver.v", 147:
$finish called at time 10000000500 ps
Expected Behavior
The simulation should run to completion and print the cycle counts, as it does for the tiled_matmul_ws-baremetal test.
Request for Help
- What conditions might cause ReservationStation.scala to issue an invalid entry for workloads like softmax/layernorm?
- Any suggestions for signals to trace or configuration/debugging strategies to narrow down the issue?