Compatibility with CUDA 12.6 Upgrade: Regarding issue #297 "CUDA Results are Inconsistent" #573
Open
guiperry wants to merge 2 commits into gorgonia:master
This PR upgrades Gorgonia to work with CUDA 12.6, which was installed alongside an existing CUDA 13.0 toolkit. After a separate CUDA 13 compatibility analysis, the decision was made to target the CUDA 12.6 toolkit. The upgrade enables support for modern GPU architectures (Volta, Turing, Ampere) while maintaining backward compatibility with older GPUs.
Here's a summary of the findings regarding issue #297, "CUDA Results are Inconsistent," after a Gorgonia compatibility upgrade to CUDA 12.6:
"Fails to function correctly" aspect: The original Go code provided in issue #297 ("CUDA Results are Inconsistent"), which uses random initialization (GlorotU and GlorotN), was executed 10 times. All runs completed without any panics or fatal errors and produced numerical outputs. This strongly suggests that the stability issues, where the program would "occasionally just fail to function correctly," have been resolved with the CUDA 12.6 upgrade.
"Inconsistent results" aspect: This part is harder to definitively confirm without a specific, deterministic test case or a precise definition of what constitutes an "inconsistent result" in a non-deterministic scenario. However, the absence of crashes and the successful completion of all runs indicate improved reliability.
Deterministic Test: I created a new deterministic test (issue_297_deterministic.go) using fixed input values. This test successfully calculated the expected matrix multiplication result, confirming that the underlying arithmetic operations are working correctly.
Environment Setup: We successfully worked around the Go 1.23 garbage collector compatibility panic by setting the ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH=go1.23 environment variable.
Automation: A shell script (run_issue_tests.sh) and a Windows batch file (run_issue_tests.bat) have been created to run both the non-deterministic test (10 times) and the deterministic test.
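For readers who prefer a cross-platform runner, the same idea (set the environment variable, then repeat a command and count failures) can be sketched in Go. This is not what the scripts in this PR contain; `withEnv` and `runRepeated` are hypothetical helpers, and the command to repeat is taken from the command line rather than hard-coded.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withEnv returns a copy of base with key set to value, dropping any
// existing entry for key so the new value always wins.
func withEnv(base []string, key, value string) []string {
	prefix := key + "="
	out := make([]string, 0, len(base)+1)
	for _, kv := range base {
		if !strings.HasPrefix(kv, prefix) {
			out = append(out, kv)
		}
	}
	return append(out, prefix+value)
}

// runRepeated runs the given command `runs` times with the supplied
// environment and returns how many runs failed (non-zero exit, panic,
// or a command that could not be started).
func runRepeated(runs int, env []string, name string, args ...string) int {
	failures := 0
	for i := 1; i <= runs; i++ {
		cmd := exec.Command(name, args...)
		cmd.Env = env
		out, err := cmd.CombinedOutput()
		if err != nil {
			failures++
			fmt.Printf("run %d failed: %v\n%s\n", i, err, out)
		}
	}
	return failures
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: runner <command> [args...]")
		return
	}
	// Environment variable and value as described in this PR.
	env := withEnv(os.Environ(), "ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH", "go1.23")

	const runs = 10 // matches the 10 non-deterministic runs described above
	failures := runRepeated(runs, env, os.Args[1], os.Args[2:]...)
	fmt.Printf("%d of %d runs failed\n", failures, runs)
	if failures > 0 {
		os.Exit(1)
	}
}
```

Assuming the sketch is saved as runner.go (a hypothetical name), it could be invoked as `go run runner.go go run issue_297_deterministic.go`, adding whatever build tags the CUDA build needs.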
Conclusion: Based on the provided reproduction code and the new deterministic test, it appears highly likely that the CUDA 12.6 upgrade has fixed the core stability and correctness issues reported in issue #297 ("CUDA Results are Inconsistent"). While the "inconsistent results" aspect cannot be 100% verified without a specific deterministic test for that particular type of inconsistency, the overall behavior is much more robust.
On my fork, you can upgrade Gorgonia and your CUDA version automatically by running the ./scripts/gorgonia_cuda_upgrade.sh script from the project root.
You can also run these tests at any time by executing ./scripts/run_issue_tests.sh on Linux or ./scripts/run_issue_tests.bat on Windows from the project root.
Please check the docs/CUDA_12_Upgrade_Summary.md file for a full summary of what was implemented in this upgrade.