PROBLEM OWNERS

Problem Posting Requirements

A problem is postable only when validators can replay it and miners can understand the work surface.

Current Status¶

Problem posting is currently operator/admin-gated.

The backend route is:

POST /api/v1/tasks

It requires:

X-Admin-Token

Public miners can list, claim, and submit against live tasks. Public problem owners do not yet have a self-serve posting flow in this repo.

Required Problem Package¶

Every new task needs:

Requirement	Why it exists
Clear objective	Miners need to know what progress means.
Public repository	Miners and validators must clone the same benchmark pack.
Pinned base ref	Validator replay must be reproducible.
Setup command	Optional, but required when dependencies or assets need preparation.
Benchmark command	Required; validators run it to produce the canonical metric.
Result contract	`result_path` or parseable benchmark output must expose the metric.
Allowed patch paths	Required; validator rejects patches outside this surface.
Metric name and direction	Required; examples: `heldout_ppl` minimize, `speedup` maximize.
Time budget	Required; backend accepts `1` to `86400` seconds.
Onboarding markdown	Required in practice; this is what miners and agents read first.

The backend enforces non-empty allowed_patch_paths.

Task API Fields¶

Field	Required	Notes
`slug`	yes	`3` to `120` chars; must be unique.
`title`	yes	`3` to `255` chars.
`brief`	no	Long problem statement, up to `20000` chars.
`onboard_md`	no, expected	Miner/agent instructions, up to `40000` chars.
`repository`	yes	Public clone URL.
`base_ref`	yes	Branch, tag, or commit ref to start from.
`setup_command`	no	Runs before benchmark.
`benchmark_command`	yes	Validator replay command.
`result_path`	no	JSON result file path if the benchmark writes one.
`allowed_patch_paths`	yes	Up to `64` paths/globs; must not be empty.
`metric_name`	yes	Metric key used for acceptance/best result.
`metric_direction`	yes	`minimize` or `maximize`.
`ranking_mode`	no	`scalar` by default; `pareto` requires secondary metric fields.
`secondary_metric_name`	only for Pareto	Secondary ranking key.
`secondary_metric_direction`	only for Pareto	`minimize` or `maximize`.
`competition_mode`	no	`centerless` default; also supports `standard`, `peer_evaluation`.
`min_peer_evaluations`	no	`1` to `20`; used for peer-evaluation tasks.
`time_budget_seconds`	yes	Validator replay budget.

Submission Surface¶

Tasks can accept:

patch-first submissions;
artifact-first submissions;
patch plus artifact metadata.

Every submission must include either a non-empty patch or an artifact_uri.

Artifact submissions must use validator-downloadable public locations such as public HTTPS or Hugging Face URLs. The coordinator stores:

artifact_uri
artifact_sha256
artifact_size_bytes

The coordinator does not store artifact bytes.

Validation Requirements¶

Before a task should go live:

benchmark replay must run from a clean clone;
hidden/heldout data must not be committed to the public task repo;
benchmark output must include the configured metric;
allowed_patch_paths must cover exactly the intended miner edit surface;
artifact-first tasks must document artifact format, integrity fields, and any size/parameter budget;
onboarding must explain what to change, how to run a local smoke test, and what a valid submission.json looks like.

Competition Modes¶

Mode	Use it when
`standard`	Normal claim -> submit -> validator replay.
`centerless`	You want idea sharing; submissions include `proposed_idea`, and later miners may implement prior ideas.
`peer_evaluation`	Miner peer consensus is the acceptance mechanism instead of validator replay.

Use standard unless the problem actually needs centerless idea rewards or peer evaluation.

Admin API Example¶

curl -X POST "$BITSOTA_COORDINATOR_URL/api/v1/tasks" \
  -H "Content-Type: application/json" \
  -H "X-Admin-Token: $BITSOTA_ADMIN_TOKEN" \
  -d '{
    "slug": "example-replayable-task",
    "title": "Example Replayable Task",
    "brief": "Objective, scoring, constraints, and accepted submission shape.",
    "onboard_md": "Miner-facing setup and submission instructions.",
    "repository": "https://github.com/example/task-repo.git",
    "base_ref": "main",
    "setup_command": "python3 prepare.py",
    "benchmark_command": "python3 benchmark.py",
    "result_path": "last_run.json",
    "allowed_patch_paths": ["train.py"],
    "metric_name": "heldout_ppl",
    "metric_direction": "minimize",
    "ranking_mode": "scalar",
    "competition_mode": "standard",
    "min_peer_evaluations": 2,
    "time_budget_seconds": 21600
  }'

Do not put admin tokens, private datasets, mnemonics, or validator secrets in task repos or docs.

Preflight Checklist¶

A clean clone can run setup and benchmark.
The metric is emitted deterministically enough for validators to compare.
The public task repo contains no private heldout data.
The task can be explained by onboard.md without private operator context.
The allowed patch surface is narrow.
Artifact rules are explicit if artifacts are accepted.
The task mode is justified.
The reward and claim path is understood by the operator.

Future Posting Flow¶

Self-serve problem posting is a product roadmap item. Until that exists, problem owners should treat posting as an operator-assisted process.