Autoresearch Testnet E2E
A single end-to-end guide for testing the shared autoresearch testnet through either:
- a direct independent agent path, or
- the GUI with a GUI-managed external agent.
This guide also covers the step older docs glossed over: how submissions become accepted. It documents the recipient coldkey split as well, so miners can tell local wallet state apart from published claim state.
Scope
The shared testnet flow spans three systems:
- coordinator: tasks, claims, submissions, verification state, best results
- SN94 GUI or agent miner: claim, run, submit, and declare recipient coldkey
- Pool claim service: recipient mapping, epoch publication, and Merkle claim packages
Important:
- autoresearch-bittensor itself does not handle Merkle claims or chain publishing.
- For standard and centerless tasks, creating a submission is not enough. A validator must verify the submission.
- For peer_evaluation tasks, miners judge each other's submissions through peer consensus instead of /verify.
- Claim packages are discovered by miner hotkey. The published recipient_coldkey is where the Pool claim path pays.
- A locally declared recipient coldkey is not the same thing as a published claim recipient unless Pool has stored that mapping and republished it into the epoch.
Live Endpoints
Current shared testnet endpoints:
{
"pool_endpoint": "https://3fhi3ukpyw.eu-central-1.awsapprunner.com",
"research_coordinator_endpoint": "https://chvp2wytst.eu-central-1.awsapprunner.com",
"merkle_claim_endpoint": "https://3fhi3ukpyw.eu-central-1.awsapprunner.com/claims",
"onchain_ws_url": "wss://test.finney.opentensor.ai:443",
"onchain_contract": "5G1fuA6RPVCUu7K5ep7SWJLzQaqzdJAwchQHppkfVKzEVv49",
"onchain_metadata_path": "/home/mekaneeky/repos/Pool/new_merkle/app/assets/merklepool.json"
}
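Before pointing a GUI or agent at these values, a quick local sanity check can catch a missing key or a wrong URL scheme. The following is a minimal sketch and is not part of any shipped tool. The name check_endpoint_config is hypothetical. The rule that merkle_claim_endpoint equals <pool_endpoint>/claims is an assumption drawn from the current values, not a documented contract.

```python
from urllib.parse import urlparse

# Keys copied from the shared testnet endpoint JSON above.
REQUIRED_KEYS = {
    "pool_endpoint",
    "research_coordinator_endpoint",
    "merkle_claim_endpoint",
    "onchain_ws_url",
    "onchain_contract",
    "onchain_metadata_path",
}
HTTPS_KEYS = ("pool_endpoint", "research_coordinator_endpoint", "merkle_claim_endpoint")


def check_endpoint_config(config: dict) -> list:
    """Hypothetical helper: return a list of problems (empty when the config looks sane)."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(config))]
    for key in HTTPS_KEYS:
        if key in config and urlparse(config[key]).scheme != "https":
            problems.append(f"{key} should be an https URL")
    if "onchain_ws_url" in config and urlparse(config["onchain_ws_url"]).scheme != "wss":
        problems.append("onchain_ws_url should be a wss URL")
    # Assumption: the claim service lives at <pool_endpoint>/claims, as in the current values.
    pool = config.get("pool_endpoint", "").rstrip("/")
    if pool and "merkle_claim_endpoint" in config:
        if config["merkle_claim_endpoint"].rstrip("/") != f"{pool}/claims":
            problems.append("merkle_claim_endpoint is not <pool_endpoint>/claims")
    return problems
```

Load the JSON with json.load and print whatever the function returns. An empty list means no obvious problems.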
Operational notes:
- Use the task slug, not a hardcoded task ID.
- Claim publication is windowed, so Merkle packages appear after the next pool rollover, not immediately.
- Reward success is not measured by free balance on the miner hotkey.
- For testnet claims, think in this order: GUI declaration -> Pool storage -> consensus publication -> claim package -> claim_single.
- Only the latest published epoch is claimable; the published amount_units are cumulative for that hotkey/recipient pair.
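Task IDs change across reseeds, so resolve the ID from the slug at runtime. This is a minimal sketch. It assumes the /api/v1/tasks response has already been parsed into a list of task objects, each carrying "slug" and "id" fields. Those field names are an assumption, so check them against the live response (for example with the curl ... | jq check in the Reward And Claim Step section). The name resolve_task_id is hypothetical.

```python
def resolve_task_id(tasks: list, slug: str) -> str:
    """Return the current task ID for a slug.

    Assumes each task object has "slug" and "id" fields (verify against the live API).
    Raises LookupError unless exactly one task matches, so a stale slug fails loudly.
    """
    matches = [task for task in tasks if task.get("slug") == slug]
    if len(matches) != 1:
        raise LookupError(f"expected one task with slug {slug!r}, found {len(matches)}")
    return str(matches[0]["id"])


# Example with a made-up response; the IDs are illustrative only.
example = [
    {"id": 7, "slug": "qwen3-06b-binary-frontier"},
    {"id": 8, "slug": "qwen3-06b-ternary-kernel"},
]
print(resolve_task_id(example, "qwen3-06b-ternary-kernel"))  # prints 8
```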
Current Coordinator Catalog
On the current testing branch of autoresearch-bittensor, the built-in seeded task catalog maps each task slug to a validation mode:
- qwen3-06b-binary-frontier -> standard
- qwen3-06b-ternary-frontier -> centerless
- qwen3-06b-binary-kernel -> standard
- qwen3-06b-ternary-kernel -> centerless
All four tasks use heldout_quality as the primary metric and Pareto ranking. Each also has a secondary metric, either compression_ratio or speedup.
Validation Modes
There are two validation paths in the current implementation.
standard and centerless
These require validator-backed acceptance.
The path is:
- miner claims a task or work item
- miner submits a patch plus submission.json
- the submission is stored as pending_verification
- validator replay records accepted, rejected, or error through a signed validator job result
- an accepted submission updates the task best result
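The status flow above can be modeled as a small state transition. This is an illustrative sketch, not coordinator code. The submission is a plain dict standing in for the coordinator record, and record_validator_result is a hypothetical name. The sketch also assumes a single validator result closes the pending state, whereas the live backend may aggregate several validators.

```python
VALIDATOR_OUTCOMES = ("accepted", "rejected", "error")


def record_validator_result(submission: dict, outcome: str) -> bool:
    """Move a pending submission to a validator outcome.

    Returns True when the outcome should update the task best result.
    Assumption: one validator result closes pending_verification.
    """
    if submission.get("status") != "pending_verification":
        raise ValueError(f"submission is {submission.get('status')!r}, not pending_verification")
    if outcome not in VALIDATOR_OUTCOMES:
        raise ValueError(f"unknown validator outcome {outcome!r}")
    submission["status"] = outcome
    # Only accepted submissions feed the task best result.
    return outcome == "accepted"
```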
Implemented validation paths:
- public SN94 validator runner: signed POST /api/v1/validator/submissions/scan, local replay for every returned replay_spec, then signed POST /api/v1/validator/jobs/{job_id}/result
- legacy public path: signed POST /api/v1/submissions/{id}/verify from an allowlisted validator hotkey, using the SN94-BitSota signing helpers
- backend-owned background validator worker via autoresearch-validate
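The public runner's scan, replay, and report cycle can be sketched with the network and replay steps injected as callables. This sketch covers the control flow only, and run_validator_pass is a hypothetical name. The job shape (job_id plus replay_spec) and the status field in the result body are assumptions. Request signing must still go through the SN94 helpers, not this code. For real runs, use python -m validator.research_validator_runner.

```python
from typing import Callable, Iterable


def run_validator_pass(
    scan: Callable[[], Iterable[dict]],
    replay: Callable[[dict], dict],
    post_result: Callable[[str, dict], None],
) -> int:
    """Run one scan -> replay -> report pass and return how many jobs were reported.

    scan:        wraps the signed POST /api/v1/validator/submissions/scan
                 (assumed job shape: {"job_id": ..., "replay_spec": {...}})
    replay:      replays one replay_spec in a clean local workspace, returns a result body
    post_result: wraps the signed POST /api/v1/validator/jobs/{job_id}/result
    """
    reported = 0
    for job in scan():
        try:
            result = replay(job["replay_spec"])
        except Exception as exc:
            # A crashed replay is still reported, as an error rather than an acceptance.
            result = {"status": "error", "notes": f"replay failed: {exc}"}
        post_result(job["job_id"], result)
        reported += 1
    return reported
```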
peer_evaluation
These do not use /verify.
The path is:
- miner submits work
- other miners evaluate the pending submission
- consensus is tracked through POST /api/v1/submissions/{id}/peer-evaluate
- once the threshold is met, peer consensus marks the submission accepted
Relevant CLI support already exists:
- bitsota-research-agent peer-evaluate-once
- bitsota-research-agent loop --allow-peer-evaluation
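Threshold-based peer consensus comes down to counting distinct approving evaluators. This minimal sketch rests on several assumptions, because this guide does not specify how the coordinator counts votes. The evaluation fields (evaluator_hotkey, verdict), the exclusion of the submitter's own evaluation, and the threshold value are all hypothetical, as is the function name.

```python
def peer_consensus_reached(evaluations: list, submitter_hotkey: str, threshold: int) -> bool:
    """True once enough distinct peers (other than the submitter) approve.

    Assumed evaluation shape: {"evaluator_hotkey": str, "verdict": "accept" | "reject"}.
    Repeat votes from the same hotkey count once.
    """
    approving = {
        ev["evaluator_hotkey"]
        for ev in evaluations
        if ev["verdict"] == "accept" and ev["evaluator_hotkey"] != submitter_hotkey
    }
    return len(approving) >= threshold
```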
Prereqs
- a funded test wallet or generated test mnemonic
- coordinator, claim service, and websocket reachable
- Python environment with bitsota-research-agent installed from this repo
- GUI environment if testing the GUI path
- codex, claude, or hermes installed if using an external agent
- Merkle metadata file available at the configured path
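You can confirm the local prerequisites before starting. This sketch checks only what is visible from the local environment: executables on PATH and the Merkle metadata file. It does not test wallet funding or endpoint reachability. check_prereqs is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def check_prereqs(metadata_path: str, agents=("codex", "claude", "hermes")) -> dict:
    """Report which local prerequisites are present (hypothetical helper)."""
    return {
        # Console entrypoint installed from the SN94 repo.
        "bitsota_research_agent_on_path": shutil.which("bitsota-research-agent") is not None,
        # Only needed when an external agent drives mining.
        "external_agents_found": [name for name in agents if shutil.which(name)],
        # The file configured as onchain_metadata_path.
        "merkle_metadata_present": Path(metadata_path).is_file(),
    }


if __name__ == "__main__":
    print(check_prereqs("/home/mekaneeky/repos/Pool/new_merkle/app/assets/merklepool.json"))
```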
If the console entrypoint is not on PATH, the fallback is:
Codex-Only Testnet Path
If you want Codex to work directly against a task repo, use the Codex-only public guide instead of the retired wrapper path.
Use the direct testnet prompt in autoresearch-testnet-direct-prompt.md or the production prompt in autoresearch-agent-master-prompt.md. Point Codex at the SN94 checkout only:
/home/mekaneeky/repos/SN94-BitSota
The public helpers available for manual signed calls are:
- /home/mekaneeky/repos/SN94-BitSota/scripts/research_signed_request.py
- /home/mekaneeky/repos/SN94-BitSota/scripts/claim_merkle_rewards.py
- bitsota-research-agent submit-workspace
Direct Codex launch shape:
cat >/tmp/direct-autoresearch-prompt.txt <<'EOF'
<paste the contents of docs/guides/autoresearch-agent-master-prompt.md here>
EOF
codex exec --dangerously-bypass-approvals-and-sandbox \
-C /home/mekaneeky/repos \
--add-dir /home/mekaneeky/repos/SN94-BitSota \
- < /tmp/direct-autoresearch-prompt.txt
Historical note:
- A direct Distil run reached pending_verification on 2026-04-08, but that Distil slug is no longer in the default seeded catalog on testing.
- For the current codebase, validate against one of the live Qwen task slugs returned by the coordinator.
What this path does not replace:
- validator replay for standard and centerless
- peer consensus for peer_evaluation
- Merkle publication and claim timing
Retired Wrapper Path
The local wrapper launcher is no longer part of public miner onboarding. Public docs should direct miners to Mining Without an Agent or Codex-Only Mining.
GUI E2E
Set the GUI config for the shared testnet plus the manual or Codex workflow you selected.
If you are using the current guided setup flow, configure the same values through Research Setup and wallet setup instead of hand-editing JSON.
The recipient coldkey expectations are:
- wallet setup stores the miner's declared recipient coldkey locally
- Pool, not the coordinator, owns the published claim recipient mapping
- the claim table's Recipient value comes from the published claim package for that epoch
- if the local declaration and the claim row differ, do not assume the GUI is wrong; check Pool publication
- the GUI should make the local split visible by showing both the connected hotkey and the declared recipient coldkey near the claims view
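The split between the local declaration and the published recipient can be expressed as a three-way check. This is a minimal sketch, and recipient_status is a hypothetical helper. It assumes the claim package has already been fetched and parsed, or is None when Pool has not yet published one for the hotkey.

```python
from typing import Optional


def recipient_status(declared_coldkey: str, claim_package: Optional[dict]) -> str:
    """Compare the GUI's locally declared coldkey with the published claim recipient."""
    if claim_package is None:
        # Nothing published yet: wait for the next rollover before drawing conclusions.
        return "unpublished"
    if claim_package.get("recipient_coldkey") == declared_coldkey:
        return "match"
    # The published package is what claim_single pays; investigate Pool storage/publication.
    return "mismatch"
```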
Example JSON for a manual source/dev run:
{
"pool_endpoint": "https://3fhi3ukpyw.eu-central-1.awsapprunner.com",
"merkle_claim_endpoint": "https://3fhi3ukpyw.eu-central-1.awsapprunner.com/claims",
"onchain_ws_url": "wss://test.finney.opentensor.ai:443",
"onchain_contract": "5G1fuA6RPVCUu7K5ep7SWJLzQaqzdJAwchQHppkfVKzEVv49",
"onchain_metadata_path": "/home/mekaneeky/repos/Pool/new_merkle/app/assets/merklepool.json",
"research_coordinator_endpoint": "https://chvp2wytst.eu-central-1.awsapprunner.com",
"research_agent_mode": "gui_managed",
"research_agent_command": "bash -lc 'cat {intro_path_quoted} /home/mekaneeky/repos/SN94-BitSota/docs/guides/autoresearch-agent-master-prompt.md | codex exec --skip-git-repo-check --full-auto -C {repo_dir_quoted} --add-dir {workspace_dir_quoted} -o {submission_result_path_quoted} -'"
}
Claude Code example (Claude Code's non-interactive mode is claude -p, which reads the piped prompt and uses the current directory as the workspace):
{
"research_agent_mode": "gui_managed",
"research_agent_command": "bash -lc 'cd {repo_dir_quoted} && cat {intro_path_quoted} /home/mekaneeky/repos/SN94-BitSota/docs/guides/autoresearch-agent-master-prompt.md | claude -p --dangerously-skip-permissions > {submission_result_path_quoted}'"
}
Hermes example:
{
"research_agent_mode": "gui_managed",
"research_agent_command": "bash -lc 'cat {intro_path_quoted} /home/mekaneeky/repos/SN94-BitSota/docs/guides/autoresearch-agent-master-prompt.md | hermes -C {repo_dir_quoted} > {submission_result_path_quoted}'"
}
You can also launch a prepared live testnet GUI setup with:
cd /home/mekaneeky/repos/SN94-BitSota
scripts/run_live_testnet_research_guis.sh <wallet_name>:<hotkey_name>
What to verify in the GUI path:
- the GUI loads real coordinator tasks, not fallback template cards
- pool mining starts against a live task
- the external agent writes submission.json, agent.stdout.txt, and agent.stderr.txt
- the coordinator records a submission
- validation accepts the submission
- the claim service later exposes a Merkle package
- the claim package includes the expected recipient_coldkey
- the claim client submits successfully against that published recipient
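The agent-output item in this checklist is easy to check mechanically once a run finishes. This is a minimal sketch that assumes you know the directory the GUI gave the agent for this run; its location is not specified here. agent_output_problems is a hypothetical helper that also confirms submission.json parses as JSON.

```python
import json
from pathlib import Path

AGENT_OUTPUTS = ("submission.json", "agent.stdout.txt", "agent.stderr.txt")


def agent_output_problems(run_dir: str) -> list:
    """List problems with the files the external agent is expected to write."""
    base = Path(run_dir)
    problems = [f"missing {name}" for name in AGENT_OUTPUTS if not (base / name).is_file()]
    submission = base / "submission.json"
    if submission.is_file():
        try:
            json.loads(submission.read_text())
        except json.JSONDecodeError:
            problems.append("submission.json is not valid JSON")
    return problems
```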
Planner-Driven Pool Path
If you want shared task assignment instead of each miner choosing work directly, run the planner and mine in pool mode.
Planner-driven flow:
- the planner creates work_items
- pool miners claim those work_items
- miners run and submit against the claimed work item
- a validator accepts or rejects the submission for standard or centerless tasks
- accepted results flow through reward publication and Merkle claim
Recommended pairings:
- standard + direct claims + validator
- centerless + planner work items + validator
- peer_evaluation + direct or planner work items + no validator
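The recommended pairings can be encoded as a lookup table so a test harness can flag unusual combinations. This sketch encodes only the three pairings listed above. pairing_warnings and the assignment labels "direct" and "planner" are hypothetical names. A warning means "not the recommended pairing", not "invalid".

```python
# The three recommended pairings from the list above.
RECOMMENDED_PAIRINGS = {
    "standard": {"assignment": {"direct"}, "validator": True},
    "centerless": {"assignment": {"planner"}, "validator": True},
    "peer_evaluation": {"assignment": {"direct", "planner"}, "validator": False},
}


def pairing_warnings(mode: str, assignment: str, uses_validator: bool) -> list:
    """Return warnings for combinations that differ from the recommended pairing."""
    recommended = RECOMMENDED_PAIRINGS[mode]
    warnings = []
    if assignment not in recommended["assignment"]:
        options = " or ".join(sorted(recommended["assignment"]))
        warnings.append(f"{mode} is usually paired with {options} assignment")
    if uses_validator != recommended["validator"]:
        expectation = "expects validator replay" if recommended["validator"] else "does not use validator replay"
        warnings.append(f"{mode} {expectation}")
    return warnings
```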
Run the planner
Deterministic planner:
cd /home/mekaneeky/repos/autoresearch-bittensor
python -m venv .venv
source .venv/bin/activate
pip install -e .[test]
autoresearch-plan --once
Looping deterministic planner:
LLM-based agentic planner:
cd /home/mekaneeky/repos/autoresearch-bittensor
export PLANNER_LLM_BASE_URL=http://127.0.0.1:11434/v1
export PLANNER_LLM_MODEL=planner-model
autoresearch-plan-agentic --once
Looping agentic planner:
You can also trigger one planner pass through the coordinator admin API:
Deterministic:
curl -X POST https://chvp2wytst.eu-central-1.awsapprunner.com/api/v1/planner/run \
-H 'X-Admin-Token: <admin-token>'
Agentic:
curl -X POST https://chvp2wytst.eu-central-1.awsapprunner.com/api/v1/planner/run-agentic \
-H 'X-Admin-Token: <admin-token>'
Planner-created work items
Planner-created work items are an operator/testnet surface, not a current public miner onboarding path. Public miners should start from the live competition catalog and use manual mining or Codex-only mining.
What to verify in planner mode:
- GET /api/v1/work-items returns open work items for the task
- miner claims a work item instead of a direct task claim
- submission is linked to the claimed work item
- work item moves to completed on successful submission
- follow-up work items appear if the planner creates them
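Two of the checks above can be automated against parsed API responses: that open work items exist for the task, and that a submission's work item reaches completed. This is a minimal sketch. It assumes work items carry id, task_id, and status fields, and that submissions record the claimed work item under work_item_id. These field names are assumptions, so confirm them against the live GET /api/v1/work-items output. Both function names are hypothetical.

```python
def open_work_items(items: list, task_id) -> list:
    """Work items for task_id that are still open for claiming."""
    return [
        item
        for item in items
        if str(item.get("task_id")) == str(task_id) and item.get("status") == "open"
    ]


def work_item_completed_by(item: dict, submission: dict) -> bool:
    """True when the submission is linked to the item and the item moved to completed."""
    return submission.get("work_item_id") == item.get("id") and item.get("status") == "completed"
```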
Validator Step For LLM-Based Autoresearch
This is the part older docs under-described.
For standard and centerless tasks, you still need validator replay after the agent produces a patch and submission.json.
Public validator path
Public validators should use the SN94-BitSota checkout and talk to the live
coordinator over signed HTTP requests. They do not need autoresearch-bittensor
DB access, AWS credentials, DATABASE_URL, or ADMIN_TOKEN.
The validator hotkey must still be allowlisted by the live backend. The backend
checks X-Hotkey, X-Timestamp, and X-Signature; use the SN94 helper to build
those headers instead of hand-rolling them.
git clone https://github.com/AlveusLabs/SN94-BitSota.git
cd SN94-BitSota
git checkout testnet-net-gui-pool-agents
python -m venv .venv
source .venv/bin/activate
pip install -e .
Inspect pending submissions through the public API:
bitsota-research-agent signed-request \
--coordinator-url https://chvp2wytst.eu-central-1.awsapprunner.com \
--method GET \
--path /api/v1/submissions \
--params-json '{"status":"pending_verification"}' \
--wallet-name <validator_wallet> \
--wallet-hotkey <validator_hotkey>
Fetch the submission detail and task onboarding for the candidate you will replay:
bitsota-research-agent signed-request \
--coordinator-url https://chvp2wytst.eu-central-1.awsapprunner.com \
--method GET \
--path /api/v1/submissions/<submission_id>/detail \
--wallet-name <validator_wallet> \
--wallet-hotkey <validator_hotkey>
bitsota-research-agent signed-request \
--coordinator-url https://chvp2wytst.eu-central-1.awsapprunner.com \
--method GET \
--path /api/v1/tasks/<task_id>/onboard.md \
--wallet-name <validator_wallet> \
--wallet-hotkey <validator_hotkey>
Preferred unattended public validator path:
cp research_validator_config.yaml.example research_validator_config.yaml
# Edit coordinator_url, wallet_name, and wallet_hotkey.
python -m validator.research_validator_runner --config research_validator_config.yaml --once
For manual fallback, replay the submission in a clean local workspace using the
task repository, base ref, allowed patch paths, and benchmark instructions from
the submission detail and task onboarding output. Then record the observed
result through the legacy /verify route:
cat >/tmp/verify-submission.json <<'JSON'
{
"status": "accepted",
"observed_metrics": {
"heldout_quality": 0.0
},
"notes": "validator replay completed; replace this with task, commit, and benchmark summary",
"replay_log": "replace this with bounded replay output or failure summary"
}
JSON
bitsota-research-agent signed-request \
--coordinator-url https://chvp2wytst.eu-central-1.awsapprunner.com \
--method POST \
--path /api/v1/submissions/<submission_id>/verify \
--body-file /tmp/verify-submission.json \
--wallet-name <validator_wallet> \
--wallet-hotkey <validator_hotkey>
Use rejected or error instead of accepted when replay fails. Do not send
accepted on the strength of the miner's claimed metrics; the canonical score is
the metric the validator observes during replay.
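Building the /verify body in code helps keep the status and metrics tied to the replay rather than to the miner's report. This is a sketch, not an SN94 helper, and write_verify_body is hypothetical. Requiring heldout_quality for an accepted result reflects the primary metric of the current catalog. The 20,000-character log bound is an arbitrary choice. The written file can be passed to --body-file in the signed-request call above.

```python
import json
from pathlib import Path

VERIFY_STATUSES = ("accepted", "rejected", "error")


def write_verify_body(
    path: str,
    status: str,
    observed_metrics: dict,
    notes: str,
    replay_log: str,
    max_log_chars: int = 20000,
) -> dict:
    """Write a /verify request body built from the validator's own replay."""
    if status not in VERIFY_STATUSES:
        raise ValueError(f"status must be one of {VERIFY_STATUSES}, got {status!r}")
    if status == "accepted" and "heldout_quality" not in observed_metrics:
        # Acceptance should carry the metric the validator observed, not the miner's claim.
        raise ValueError("accepted results need the replay-observed heldout_quality")
    body = {
        "status": status,
        "observed_metrics": observed_metrics,
        "notes": notes,
        # Keep only the tail of the log so the body stays bounded.
        "replay_log": replay_log[-max_log_chars:] if max_log_chars > 0 else "",
    }
    Path(path).write_text(json.dumps(body, indent=2))
    return body
```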
Public runner notes:
- The SN94 public runner is the shared-testnet path for validators that should not hold backend database or AWS credentials.
- The legacy signed /verify path remains useful for manual replay and for testing an older backend.
Backend-owned validator worker
Only backend operators with the autoresearch-bittensor checkout and database
credentials should run the coordinator-local worker:
cd /home/mekaneeky/repos/autoresearch-bittensor
autoresearch-validate \
--wallet-name <validator_wallet> \
--wallet-hotkey <validator_hotkey> \
--workspace-root ./data/validator-workspaces
This worker is not the public validator runbook for shared testnet operators because it reads and writes the coordinator database directly.
Peer-evaluation exception
Do not use /verify for peer_evaluation tasks.
Use:
- bitsota-research-agent peer-evaluate-once ..., or
- bitsota-research-agent loop --allow-peer-evaluation ...
The coordinator will reject /verify for those tasks.
Peer-evaluation task onboarding should be documented per live competition before it is exposed as a public miner path.
Reward And Claim Step
After a submission is accepted:
- the coordinator best result should update
- the reward snapshot should include the accepted competition state
- Pool should resolve the miner hotkey to a stored recipient coldkey
- Pool should publish the next Merkle epoch on rollover with that recipient_coldkey in the claim_list
- the claim API should serve a package for the miner hotkey that includes the same recipient_coldkey
- the GUI or local claim client should submit claim_single against the published recipient
Useful checks:
curl -fsS https://chvp2wytst.eu-central-1.awsapprunner.com/healthz
curl -fsS https://chvp2wytst.eu-central-1.awsapprunner.com/api/v1/tasks | jq
curl -fsS https://chvp2wytst.eu-central-1.awsapprunner.com/api/v1/submissions | jq
curl -fsS https://chvp2wytst.eu-central-1.awsapprunner.com/api/v1/verifications | jq
curl -fsS https://3fhi3ukpyw.eu-central-1.awsapprunner.com/claims/health
curl -fsS https://3fhi3ukpyw.eu-central-1.awsapprunner.com/claims/epochs | jq
curl -fsS https://3fhi3ukpyw.eu-central-1.awsapprunner.com/claims/epoch/<epoch>/claim/<hotkey> | jq
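The same checks can be scripted so a test run records one status per endpoint. This sketch uses only the Python standard library, and its paths mirror the curl commands above. claim_check_urls and run_checks are hypothetical helper names. Unlike curl -f, run_checks records HTTP error codes and connection failures instead of stopping at the first failure.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def claim_check_urls(coordinator: str, claims: str, epoch=None, hotkey=None) -> list:
    """URLs for the checks above; the per-hotkey package URL needs both epoch and hotkey."""
    coordinator = coordinator.rstrip("/")
    claims = claims.rstrip("/")
    urls = [
        f"{coordinator}/healthz",
        f"{coordinator}/api/v1/tasks",
        f"{coordinator}/api/v1/submissions",
        f"{coordinator}/api/v1/verifications",
        f"{claims}/health",
        f"{claims}/epochs",
    ]
    if epoch is not None and hotkey:
        urls.append(f"{claims}/epoch/{epoch}/claim/{quote(hotkey, safe='')}")
    return urls


def run_checks(urls: list, timeout: float = 10.0) -> dict:
    """Map each URL to an HTTP status code, or to an error string if the request failed."""
    results = {}
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[url] = resp.status
        except urllib.error.HTTPError as exc:
            # Non-2xx responses (for example 503) are recorded by code.
            results[url] = exc.code
        except OSError as exc:
            # URLError and timeouts are both OSError subclasses.
            results[url] = str(exc)
    return results
```

For example, print(run_checks(claim_check_urls("https://chvp2wytst.eu-central-1.awsapprunner.com", "https://3fhi3ukpyw.eu-central-1.awsapprunner.com/claims"))) prints one entry per endpoint.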
When inspecting a claim package, check these fields explicitly:
- hotkey: miner identity used for lookup and reward attribution
- recipient_coldkey: payout recipient published into the Merkle leaf
- proof / amount_units / index: proof material for claim_single
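These fields can be checked before calling claim_single, together with the rule that only the latest published epoch is claimable. This is a minimal sketch, and latest_epoch and claim_package_problems are hypothetical. It assumes the epochs listing has been reduced to a list of integer epoch numbers and the package parsed into a dict. The recipient comparison runs only when you pass the coldkey you intend to be paid.

```python
from typing import Optional

REQUIRED_FIELDS = ("hotkey", "recipient_coldkey", "proof", "amount_units", "index")


def latest_epoch(epochs: list) -> int:
    """Only the latest published epoch is claimable."""
    if not epochs:
        raise LookupError("no epochs published yet; wait for the next rollover")
    return max(epochs)


def claim_package_problems(package: dict, hotkey: str, expected_recipient: Optional[str] = None) -> list:
    """List reasons not to submit claim_single with this package (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in package]
    if "hotkey" in package and package["hotkey"] != hotkey:
        problems.append("package hotkey does not match the miner hotkey")
    if (
        expected_recipient is not None
        and "recipient_coldkey" in package
        and package["recipient_coldkey"] != expected_recipient
    ):
        problems.append("published recipient_coldkey differs from the intended payout coldkey")
    return problems
```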
Success Criteria
The flow is only truly end-to-end when all of these happen:
- a live task is claimed
- a valid submission is created
- the submission becomes accepted through validator replay or peer consensus
- the task best result updates
- the next claim window publishes an epoch
- a claim package appears for the miner hotkey
- the claim package recipient matches the intended declared payout path
- the claim is submitted successfully
- no remaining claim packages exist for that hotkey
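The criteria are roughly sequential, so a test run can be summarized by the first criterion not yet observed. This is a minimal sketch. The step keys are hypothetical labels for the list above. How each one is observed (API responses, GUI state, chain events) is left to the caller.

```python
from typing import Optional

# Hypothetical labels for the success criteria, in the order listed above.
E2E_STEPS = (
    "task_claimed",
    "submission_created",
    "submission_accepted",
    "best_result_updated",
    "epoch_published",
    "claim_package_available",
    "recipient_matches",
    "claim_submitted",
    "no_claims_remaining",
)


def first_unmet_step(observed: dict) -> Optional[str]:
    """Return the first criterion not yet observed, or None when the run is end-to-end."""
    for step in E2E_STEPS:
        if not observed.get(step, False):
            return step
    return None
```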
Common Gotchas
- Do not treat submission creation as end-to-end success.
- Do not hardcode task IDs across reseeds.
- Do not use fallback template cards as proof that the live coordinator path works.
- Do not check miner hotkey free balance as the reward success signal.
- Do not assume the GUI's locally declared coldkey automatically controls claim payout. Pool publication is the source of truth for each epoch.
- Do not assume POST /coldkey_address/update on the legacy relay path controls autoresearch Merkle claims.
- If /api/v1/validator/submissions/scan or /api/v1/submissions/{id}/verify returns 503, validator allowlisting or validator deployment is misconfigured.
- If accepted submissions never show up in /claims/epochs, the reward publication side is broken even if coordinator validation is healthy.