Frontier AI labs, directly or through contracting platforms, typically pay $80–$140/hr for contractors who can evaluate model-generated solutions and write reference solutions for advanced math problems. The role is officially called "math reasoning specialist" and unofficially "math tutor for AI." Here's the real picture for 2026.
What the work involves
Three task types make up most of the role:
- Reference solution writing. You're given a math problem and write a clear, step-by-step solution that becomes a training target for the model. 30–90 minutes per problem.
- Solution evaluation. The model generates a solution; you verify each step, flag errors, and score on a rubric. 10–25 minutes per task.
- Adversarial probing. You write problems designed to test specific failure modes (multi-step reasoning, problems with red-herring information, problems requiring care with units). 20–60 minutes per problem.
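To see how these time estimates translate into pay per task, here is a minimal Python sketch. It assumes pay scales linearly with time at a flat hourly rate and uses $90/hr purely as an illustrative figure; `TASK_MINUTES`, `task_pay`, and `pay_range` are hypothetical names, not any platform's tooling.

```python
# Per-task pay at a flat hourly rate, using the time ranges quoted above.
# Assumption: pay scales linearly with minutes spent; $90/hr is illustrative only.

TASK_MINUTES = {
    "reference solution": (30, 90),
    "solution evaluation": (10, 25),
    "adversarial probing": (20, 60),
}


def task_pay(rate_per_hour: float, minutes: float) -> float:
    """Dollars earned for one task taking `minutes` at `rate_per_hour`."""
    return rate_per_hour * minutes / 60


def pay_range(rate_per_hour: float, task: str) -> tuple[float, float]:
    """Low and high pay for a task type, from its quoted minute range."""
    low, high = TASK_MINUTES[task]
    return task_pay(rate_per_hour, low), task_pay(rate_per_hour, high)


if __name__ == "__main__":
    for task in TASK_MINUTES:
        lo, hi = pay_range(90, task)
        print(f"{task}: ${lo:.2f} to ${hi:.2f} per task")
```

At $90/hr this works out to roughly $45 to $135 per reference solution, $15 to $37.50 per evaluation, and $30 to $90 per adversarial problem.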
The work spans high-school algebra to graduate-level real analysis depending on the project. Most ongoing work is at the late-undergraduate to graduate level — enough depth that solving requires thought, not enough that it requires lifelong specialization.
Who pays for math reasoning
- Mercor: The main hub for math specialty work. Pays $80–$130/hr depending on tier.
- Outlier (math specialty pools): $70–$110/hr; access requires reaching senior tier first.
- Surge AI: $75–$110/hr for math specialty work.
- Direct lab engagements: $130–$200/hr for top contributors. Anthropic, DeepMind, and OpenAI all run math reasoning programs.
What qualifies you
The acceptable backgrounds are wider than people assume:
- PhD in math or related field (physics, theoretical CS, statistics).
- Top finishes in math competitions (an IMO medal, a Putnam top-100 result).
- Active university math instructor (especially at competitive institutions).
- Published research in mathematical reasoning, optimization, or related areas.
- Strong applied math credentials (quant finance, cryptography, theoretical ML) with verifiable evidence of that work.
Many strong undergraduates from competitive programs (MIT, Cambridge, Tsinghua, IIT) qualify even without a PhD if they have visible problem-solving credentials (research, olympiad, published work).
The application
Mercor and Outlier both run dedicated math intake. The application typically includes:
- Background verification. Education, published work, contest results.
- Math sample. 60–120 minute take-home with 3–5 problems at varying difficulty. You write reference solutions; reviewers check correctness, clarity, and pedagogical structure.
- Calibration interview. Walk through 2–3 problems live with a senior reviewer, including handling problems where the "obvious" answer is wrong.
The bar is meaningfully higher than for general coding tracks. Acceptance rates for the math reasoning specialty run around 25–35%, and that is among applicants who already clear the credential screen.
The work day-to-day
Common patterns from contractors in math specialty:
- Hours are bursty. 15–25 hrs/week during active campaigns, 0–8 hrs between.
- Tasks are slow but high-paying. $90/hr × 60 min/problem = $90 per problem. Volume matters less than reliability.
- Quality scoring is strict. A single mistake in a reference solution can drop your tier. Most successful contractors verify every step before submitting.
- The work isn't really "tutoring." No live students. You're producing reference materials and evaluating outputs, not teaching.
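Because hours are bursty, weekly income swings widely. The Python sketch below turns the hour ranges above into weekly pay at an assumed $90/hr. The split between campaign and quiet weeks in `blended_weekly` is a hypothetical input for illustration, not platform data, and the function names are made up for this example.

```python
# Weekly earnings under the bursty schedule described above.
# Assumptions: a flat hourly rate (e.g. $90/hr) and a hypothetical mix of
# campaign weeks and quiet weeks chosen by the reader.

ACTIVE_HOURS = (15, 25)  # hrs/week during active campaigns
QUIET_HOURS = (0, 8)     # hrs/week between campaigns


def weekly_range(rate: float, hours: tuple[int, int]) -> tuple[float, float]:
    """Low and high weekly pay for a given hours-per-week range."""
    return rate * hours[0], rate * hours[1]


def blended_weekly(rate: float, active_weeks: int, total_weeks: int,
                   active_hours: float, quiet_hours: float) -> float:
    """Average weekly pay when `active_weeks` of `total_weeks` are campaign weeks."""
    quiet_weeks = total_weeks - active_weeks
    total = rate * (active_weeks * active_hours + quiet_weeks * quiet_hours)
    return total / total_weeks


if __name__ == "__main__":
    print("active week:", weekly_range(90, ACTIVE_HOURS))
    print("quiet week:", weekly_range(90, QUIET_HOURS))
    print("blended:", blended_weekly(90, 2, 4, 20, 4))
```

For example, two campaign weeks at 20 hrs plus two quiet weeks at 4 hrs averages $1,080 per week at $90/hr, well below a single active week's $1,350 to $2,250.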
Math + coding combo
The single most lucrative combination on AI training platforms in 2026 is math reasoning + production coding. Contractors who can both write rigorous math solutions and evaluate code routinely pull $130+/hr in mixed engagements. If you have both, lead your profile with the combination — Mercor specifically has a "quantitative engineer" track that targets this background.
Bottom line
Math reasoning specialist is among the highest-paid roles available to contractors without medical/legal credentials. The bar is real but not unreachable for math-strong applicants.