Thanks for your great work and for releasing the code.
I noticed that Section 5.3 of the paper states:
“Here we report average success rates and average performance drop over three random seeds.”
I also found in the provided evaluation code that the models are evaluated under three different seeds.
I would like to confirm the actual evaluation protocol:
- Is each method trained separately with three random seeds, and is each of the three resulting models then evaluated under the same three seeds, with all results averaged?
or
- Is each method trained with only a single seed, and are the resulting weights then evaluated under three seeds and averaged?
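To make sure I'm describing the two options precisely, here is a rough sketch of how I understand them. `train` and `evaluate` are hypothetical placeholders passed in as arguments, not functions from your repository, and the seed values are only illustrative:

```python
from statistics import mean
from typing import Any, Callable, Sequence


def protocol_a(train: Callable[[int], Any],
               evaluate: Callable[[Any, int], float],
               seeds: Sequence[int]) -> float:
    """Option 1: train one model per seed, evaluate each model under every seed."""
    scores = []
    for train_seed in seeds:
        model = train(train_seed)            # len(seeds) separate training runs
        for eval_seed in seeds:
            scores.append(evaluate(model, eval_seed))
    return mean(scores)                      # average over len(seeds) ** 2 evaluations


def protocol_b(train: Callable[[int], Any],
               evaluate: Callable[[Any, int], float],
               seeds: Sequence[int],
               train_seed: int = 0) -> float:
    """Option 2: train a single model, evaluate it under every seed."""
    model = train(train_seed)                # only one training run
    return mean(evaluate(model, s) for s in seeds)
```

The practical difference is that option 1 also averages out variance from training, while option 2 only captures variance from evaluation.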
Thanks in advance for your clarification!