…on to scalar in Normal distribution
- Replace manual normal sampling and logprob calculation with utility function
- Fix tensor operations using proper broadcasting and element-wise methods
- Add ToElement import for tensor casting operations
- Improve numerical stability in probability calculations
- Generalize forward pass to work with any tensor dimension
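The sampling and log-probability utility mentioned in this commit can be sketched in plain Rust. This is a minimal illustration on `f64` slices, not the PR's tensor code: the names `normal_log_prob`, `sample_with_log_prob`, and `MIN_STD` are hypothetical, the noise `eps` is assumed to be drawn from a standard normal by the caller, and the floor on the standard deviation is only one possible way to keep the logarithm finite.

```rust
/// Hypothetical lower bound on the standard deviation, so that `ln(std)`
/// and the division by `std` stay finite (an assumption, not from the PR).
const MIN_STD: f64 = 1e-6;

/// Log-density of a Normal(mean, std^2) distribution evaluated at `x`.
pub fn normal_log_prob(x: f64, mean: f64, std: f64) -> f64 {
    let std = std.max(MIN_STD);
    let z = (x - mean) / std;
    -0.5 * z * z - std.ln() - 0.5 * (2.0 * std::f64::consts::PI).ln()
}

/// Reparameterized sampling: action_i = mean_i + std_i * eps_i.
/// Returns the actions and the summed log-probability over all dimensions.
pub fn sample_with_log_prob(means: &[f64], stds: &[f64], eps: &[f64]) -> (Vec<f64>, f64) {
    let mut actions = Vec::with_capacity(means.len());
    let mut log_prob = 0.0;
    for ((&mean, &std), &e) in means.iter().zip(stds).zip(eps) {
        let action = mean + std.max(MIN_STD) * e;
        log_prob += normal_log_prob(action, mean, std);
        actions.push(action);
    }
    (actions, log_prob)
}
```

Keeping both steps in one function means the log-probability is always computed for the action that was actually sampled, which is the usual reason to replace hand-written sampling code with a shared helper.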
- Remove AutodiffBackend constraint from QNet to support inference
- Extract temporal difference calculation into separate function
- Add data import for batch processing
- Clone networks for target initialization instead of loading records
- Begin implementing QNet training loop with TD targets
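As a rough illustration of the extracted temporal difference calculation, the sketch below computes one-step TD targets over a batch. It assumes the common form `r + gamma * (1 - done) * next_value`; the function name `td_targets`, its parameters, and the use of plain slices instead of tensors are hypothetical and may differ from the PR.

```rust
/// One-step TD targets for a batch of transitions:
/// target_i = reward_i + gamma * next_value_i, with bootstrapping
/// switched off for terminal transitions (done_i == true).
pub fn td_targets(rewards: &[f64], dones: &[bool], next_values: &[f64], gamma: f64) -> Vec<f64> {
    rewards
        .iter()
        .zip(dones)
        .zip(next_values)
        .map(|((&reward, &done), &next_value)| {
            // A terminal state has no future return to bootstrap from.
            let bootstrap = if done { 0.0 } else { gamma * next_value };
            reward + bootstrap
        })
        .collect()
}
```

The `next_values` here would typically come from the target networks, which is why cloning the Q-networks for target initialization appears in the same commit.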
- Modify `train_net` to accept `Tensor<B, 2>` and `DataBatch<B>`
- Update both Q-networks with temporal difference target during training
- Add policy network entropy calculation to training loop
- Replace transition struct fields with batch-compatible states and actions
- Add Q-network forward passes for both critics
- Compute minimum Q-value between two critics
- Calculate policy loss using min Q-value and entropy
- Remove placeholder comment for optimizer (TODO remains)
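The policy loss described in this commit can be sketched as follows, assuming the standard soft actor-critic form: the batch mean of `alpha * log_prob - min(q1, q2)`. The temperature `alpha`, the function name `policy_loss`, and the slice-based signature are assumptions for illustration; the PR itself operates on tensors and may weight the entropy term differently.

```rust
/// Policy loss over a batch, assuming the soft actor-critic form:
/// mean_i( alpha * log_prob_i - min(q1_i, q2_i) ).
/// `alpha` is a hypothetical entropy temperature; returns 0.0 for an empty batch.
pub fn policy_loss(q1: &[f64], q2: &[f64], log_probs: &[f64], alpha: f64) -> f64 {
    let n = q1.len().min(q2.len()).min(log_probs.len());
    if n == 0 {
        return 0.0;
    }
    let total: f64 = q1
        .iter()
        .zip(q2)
        .zip(log_probs)
        .map(|((&a, &b), &log_prob)| {
            // Taking the smaller critic estimate counters Q-value overestimation.
            let min_q = a.min(b);
            alpha * log_prob - min_q
        })
        .sum();
    total / n as f64
}
```

Minimizing this quantity pushes the policy toward actions with a high pessimistic Q-value while the `alpha * log_prob` term rewards higher entropy.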
No description provided.