Problem
The brain declares work complete without actually verifying the artifact works. Concrete example from user feedback:
"if i ask it to make a workflow, it should test it etc before claiming its done."
She asked for a workflow. Loom produced a .ga file locally and said it was done -- without uploading it to Galaxy, invoking it, or checking that it ran. The user had to notice, then upload and test it manually.
Expected behavior
For any task with a checkable result, the brain should close the loop before claiming completion. Specifically:
- Workflow authored → upload to Galaxy, invoke on a test input, verify success
- Tool run → check the output dataset exists and looks reasonable
- File / config created → confirm state matches what the user asked for
- Plan executed → verify phase outputs, don't just assert them
The brain already has the tools (galaxy-mcp for upload/invoke/inspect, file tools for local artifacts). This is a behavior fix, not a tooling one -- the discipline is "evidence before assertion, always."
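The discipline can be sketched as a completion gate: a "done" claim is only allowed once every verification step has actually run and passed. This is a minimal illustrative sketch, not Loom's or galaxy-mcp's real API -- the names (`claim_done`, `Evidence`, `UnverifiedCompletion`) and the stand-in lambdas are hypothetical.

```python
# Sketch of "evidence before assertion": a completion claim must carry
# passing verification evidence, or it is rejected. All names here are
# illustrative, not a real Loom or galaxy-mcp interface.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evidence:
    check: str    # what was verified, e.g. "test invocation reached state 'ok'"
    passed: bool


class UnverifiedCompletion(Exception):
    """Raised when 'done' is claimed without a passing verification step."""


def claim_done(task: str, verifiers: list[Callable[[], Evidence]]) -> list[Evidence]:
    """Run every verifier before permitting a completion claim.

    A task with no verifiers, or with any failing check, blocks the claim.
    """
    if not verifiers:
        raise UnverifiedCompletion(f"{task}: no verification step was executed")
    evidence = [verify() for verify in verifiers]
    failed = [e.check for e in evidence if not e.passed]
    if failed:
        raise UnverifiedCompletion(f"{task}: failed checks: {failed}")
    return evidence


# Example: a workflow task is only "done" once upload and invocation checks
# pass. The lambdas stand in for real calls (upload to Galaxy, invoke on a
# test input, poll the invocation state).
evidence = claim_done(
    "author workflow",
    [
        lambda: Evidence("workflow uploaded to Galaxy", True),
        lambda: Evidence("test invocation reached state 'ok'", True),
    ],
)
```

The point of the gate is structural: the evidence list is produced by executing checks, not by asserting them, so "done" without a verification step is impossible by construction.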
Scope
Applies broadly -- not just workflows. Any "done" claim should be backed by a verification step the brain actually executed.
Motivation
Reproducible agentic science depends on the agent closing its own loop. Handing the user a file and saying done is exactly the pattern we want to eliminate -- it's the gap between "agent made a thing" and "agent produced a verified result in Galaxy."