Check useUnsafe before usePrealloc in TapeMachine exec #579

Open
eric wants to merge 1 commit into gorgonia:master from fancybits:fix-unsafe-before-prealloc

eric commented Mar 15, 2026

The non-CUDA TapeMachine exec switch checked usePrealloc (a runtime check for a non-nil destination register) before instr.useUnsafe (a compile-time flag for shared input/output registers). When registers are shared, the destination register always holds a value (the input), so usePrealloc was always true and the useUnsafe case was never reached.

For ops that implement UnsafeDo but not UsePreallocDo (e.g. elemUnaryOp), this caused plain Do() to run instead of UnsafeDo(), allocating new memory rather than modifying the input in place. This defeated the register-sharing optimization.

The CUDA execution path (vm_tape_cuda.go) has no usePrealloc case and already checks useUnsafe directly after preAllocated. This change aligns the non-CUDA path with the CUDA path's intended design.
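
To illustrate the shadowing, here is a minimal, self-contained sketch of the case ordering. It is a simplified model and not the actual TapeMachine code: the `instr` and `op` types, the `dispatchBefore`/`dispatchAfter` functions, the capability flags, and the returned labels are all hypothetical, and the handling of the `preAllocated` case is an assumption.

```go
package main

import "fmt"

// instr is a hypothetical stand-in for a TapeMachine instruction.
type instr struct {
	preAllocated bool // assumed: instruction carries its own preallocated result
	useUnsafe    bool // compile-time flag: input and output share a register
}

// op is a hypothetical stand-in that records which optional methods an op has.
type op struct {
	hasUnsafeDo      bool
	hasUsePreallocDo bool
}

// dispatchBefore models the old ordering: the runtime usePrealloc check
// comes before the compile-time useUnsafe flag.
func dispatchBefore(in instr, o op, destNonNil bool) string {
	usePrealloc := destNonNil
	switch {
	case in.preAllocated:
		return "preallocated path"
	case usePrealloc:
		if o.hasUsePreallocDo {
			return "UsePreallocDo"
		}
		return "Do" // falls back to an allocating call
	case in.useUnsafe:
		if o.hasUnsafeDo {
			return "UnsafeDo"
		}
		return "Do"
	default:
		return "Do"
	}
}

// dispatchAfter models the proposed ordering: useUnsafe is checked first.
func dispatchAfter(in instr, o op, destNonNil bool) string {
	usePrealloc := destNonNil
	switch {
	case in.preAllocated:
		return "preallocated path"
	case in.useUnsafe:
		if o.hasUnsafeDo {
			return "UnsafeDo"
		}
		return "Do"
	case usePrealloc:
		if o.hasUsePreallocDo {
			return "UsePreallocDo"
		}
		return "Do"
	default:
		return "Do"
	}
}

func main() {
	// Shared registers: the destination already holds the input, so it is non-nil.
	shared := instr{useUnsafe: true}
	unaryLike := op{hasUnsafeDo: true} // has UnsafeDo, lacks UsePreallocDo

	fmt.Println("before:", dispatchBefore(shared, unaryLike, true)) // before: Do
	fmt.Println("after: ", dispatchAfter(shared, unaryLike, true))  // after:  UnsafeDo
}
```

In this model, an instruction with shared registers never reaches the `useUnsafe` case under the old ordering, because a non-nil destination makes the `usePrealloc` case match first.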
