In the LoopMatmulLdB module, line 185 reads "val rows = block_size.U - Mux(max_row_iterator === max_row_iterator-1.U, row_pad, 0.U)". The Mux condition compares max_row_iterator with max_row_iterator-1.U, so it does not depend on row_iterator and is never true. As a result the Mux always selects 0.U, and the B matrix is always loaded in block_size rows, even on the last row iteration where row_pad should be subtracted. I'm pretty sure this line should be "val rows = block_size.U - Mux(row_iterator === max_row_iterator-1.U, row_pad, 0.U)".
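To make the difference concrete, here is a small plain-Scala model of the two expressions. It is only a sketch: it uses ordinary Int arithmetic rather than Chisel hardware types, the object and function names (RowsModel, rowsAsWritten, rowsProposed) are made up for illustration, and the values block_size = 16, row_pad = 4, max_row_iterator = 3 are assumed example inputs, not taken from any real configuration.

```scala
// Plain-Scala software model of the row-count expression.
// This is NOT the Chisel source; names and values are illustrative assumptions.
object RowsModel {
  // Mirrors the line as currently written: the condition never looks at rowIterator.
  def rowsAsWritten(blockSize: Int, rowPad: Int, rowIterator: Int, maxRowIterator: Int): Int =
    blockSize - (if (maxRowIterator == maxRowIterator - 1) rowPad else 0)

  // Mirrors the proposed fix: padding is subtracted only on the last row iteration.
  def rowsProposed(blockSize: Int, rowPad: Int, rowIterator: Int, maxRowIterator: Int): Int =
    blockSize - (if (rowIterator == maxRowIterator - 1) rowPad else 0)

  def main(args: Array[String]): Unit = {
    // Assumed example values, chosen only to show the behaviour.
    val blockSize = 16
    val rowPad = 4
    val maxRowIterator = 3
    for (i <- 0 until maxRowIterator) {
      val written = rowsAsWritten(blockSize, rowPad, i, maxRowIterator)
      val proposed = rowsProposed(blockSize, rowPad, i, maxRowIterator)
      println(s"row_iterator=$i as_written=$written proposed=$proposed")
    }
  }
}
```

With these assumed values, the as-written expression yields 16 rows on every iteration, while the proposed one yields 16 rows for iterations 0 and 1 and 12 rows on the last iteration.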