
Compatibility with CUDA 12.6 Upgrade: Regarding issue #297 "CUDA Results are Inconsistent" #573

Open
guiperry wants to merge 2 commits into gorgonia:master from guiperry:master

Conversation

@guiperry

This PR upgrades Gorgonia to work with CUDA 12.6, which was installed alongside an existing CUDA 13.0 toolkit. Following a separate CUDA 13 compatibility analysis, I decided to target the CUDA 12.6 toolkit. The upgrade enables support for modern GPU architectures (Volta, Turing, Ampere) while maintaining backward compatibility with older GPUs.

Here's a summary of the findings regarding issue #297, "CUDA Results are Inconsistent," after a Gorgonia compatibility upgrade to CUDA 12.6:

  • "Fails to function correctly" aspect: The original Go code provided in issue #297 ("CUDA Results are Inconsistent"), which uses random initialization (GlorotU and GlorotN), was executed 10 times. All runs completed without any panics or fatal errors and produced numerical outputs. This strongly suggests that the stability issues, where the program would "occasionally just fail to function correctly," have been resolved with the CUDA 12.6 upgrade.

  • "Inconsistent results" aspect: This part is harder to definitively confirm without a specific, deterministic test case or a precise definition of what constitutes an "inconsistent result" in a non-deterministic scenario. However, the absence of crashes and the successful completion of all runs indicate improved reliability.

  • Deterministic Test: I created a new deterministic test (issue_297_deterministic.go) using fixed input values. This test successfully calculated the expected matrix multiplication result, confirming that the underlying arithmetic operations are working correctly.

  • Environment Setup: I worked around the Go 1.23 garbage collector compatibility panic by setting the environment variable ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH=go1.23.

  • Automation: Two scripts (run_issue_tests.sh for Linux and run_issue_tests.bat for Windows) have been created to easily run both the non-deterministic test (10 times) and the deterministic test.

  • Conclusion: Based on the provided reproduction code and the new deterministic test, it appears highly likely that the CUDA 12.6 upgrade has fixed the core stability and correctness issues reported in issue #297 ("CUDA Results are Inconsistent"). While the "inconsistent results" aspect cannot be fully verified without a deterministic test for that particular type of inconsistency, the overall behavior is much more robust.

On my fork, you can upgrade Gorgonia and your CUDA version automatically by running the ./scripts/gorgonia_cuda_upgrade.sh script from the project root.

You can also run these tests at any time by executing ./scripts/run_issue_tests.sh on Linux or ./scripts/run_issue_tests.bat on Windows from the project root.

Please check the docs/CUDA_12_Upgrade_Summary.md file for a full summary of what was implemented in this upgrade.

