Calibration tasks are the unpaid practice tasks every AI training platform requires before assigning real work. Most contractors treat them as a checkbox. They shouldn't — calibration scores determine your starting tier, your initial hour allocation, and how the system weights your future quality scores.
What calibration tasks actually do
Three things happen during calibration:
- The platform measures your alignment with their rubric. Your scores against reference answers establish your baseline.
- The platform identifies your blind spots. Patterns of disagreement get flagged and may route you away from certain task pools.
- Your calibration score sets your starting tier. Strong calibration → mid-tier task pool from day one. Weak calibration → entry-tier with extended onboarding.
What platforms specifically test
Rubric literalism vs spirit
Tasks frequently include cases that test whether you follow the rubric strictly or interpret its spirit. The right answer depends on the platform — Outlier rewards literalism, Mercor and Surge reward spirit-of-rubric reasoning. Read the calibration instructions carefully for the cue.
Edge-case sensitivity
Calibration tasks systematically include edge cases the system knows raters miss: empty inputs, off-by-ones, null/undefined leakage, special characters. Catching these matters disproportionately.
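For a concrete picture, here is a small, hypothetical snippet of the kind of buggy helper a calibration task might ask you to review. The function and its bugs are invented for illustration, but they cover three of the patterns above: empty inputs, off-by-ones, and None leaking into output.

```python
def summarize_scores(scores, top_n):
    """Hypothetical helper with the kinds of bugs calibration tasks plant."""
    # Empty input: max() raises ValueError when scores is an empty list.
    best = max(scores)
    # Off-by-one: the slice stops one element short of top_n.
    top = sorted(scores, reverse=True)[:top_n - 1]
    # None leakage: a missing label ends up rendered as the string "None".
    label = None
    return f"{label}: best={best}, top={top}"

print(summarize_scores([3, 9, 7], top_n=2))
# Prints "None: best=9, top=[9]" -- only one item despite top_n=2.
# summarize_scores([], top_n=2) raises ValueError on the empty list.
```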
Justification depth
Calibration justifications are graded more strictly than those on regular tasks. The system uses calibration to set your justification baseline, so write tighter, more specific reasoning than you would on regular work.
The unpaid trap
Calibration is unpaid. Most contractors rush through it to get to paid work. This is the wrong move. The math (a rough sketch of the arithmetic follows the list):
- Time invested in careful calibration: ~3–5 hours.
- Income difference between starting at entry tier vs. mid tier over the first 90 days: ~$5,000.
- Effective hourly value of calibration time: $1,000+/hour.
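The arithmetic behind that figure, with the hours and income estimates above treated as assumptions rather than platform data:

```python
# Back-of-the-envelope value of an hour of careful calibration.
# Both inputs are the rough estimates above, not measured platform figures.
calibration_hours = 4          # assumed: 3-5 hours of careful calibration
income_gap_90_days = 5_000     # assumed: entry- vs mid-tier gap over 90 days

print(f"~${income_gap_90_days / calibration_hours:,.0f}/hour")
# ~$1,250/hour at 4 hours; ~$1,000/hour at 5 hours.
```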
Treat calibration as the highest-paid work you'll do on the platform. The "unpaid" framing is misleading: hour for hour, nothing else you do there will move your earnings as much.
What scores well
- Read each task twice before answering. Calibration tasks are intentionally constructed to penalize skimming.
- Write 80–150 word justifications. Even when not strictly required, the system grades them.
- Flag ambiguity explicitly. "This question has two reasonable interpretations because X. I'm answering under interpretation Y." This single sentence earns higher scores than picking either interpretation silently.
- Cite the source for code-related tasks. Reference the line numbers, the documentation, or the relevant standard library section.
- Don't disagree with the obvious answer to seem smart. The system tests whether you know when consensus is correct. Disagreeing on calibration tasks where consensus is obviously right tanks your score.
How long calibration takes
- Outlier: 3–5 unpaid practice tasks, ~2 hours total.
- Mercor: 5–8 calibration tasks, ~3–4 hours total.
- Surge AI: 5–10 tasks across 1–2 weeks, ~4 hours total.
- Turing: No formal calibration; the first paid project effectively serves that role.
Block out a quiet 4-hour window. Don't try to fit calibration into 30-minute slots — context-switching reduces your scores measurably.
What happens after calibration
Your calibration score directly maps to three things (sketched in code after this list):
- Starting tier: 0.88+ starts you at mid-tier. 0.78–0.87 starts at entry. Below 0.78 triggers extended onboarding or rejection.
- Initial hour allocation: Higher calibration → priority access to task drops in your first weeks.
- Specialty pool eligibility: Specialty calibration scores must hit 0.85+ to unlock specialty rates.
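To picture how those thresholds fit together, here is a hypothetical sketch of the mapping. The function name and return values are invented, and platforms don't publish their routing logic; only the threshold numbers come from the list above.

```python
def placement_from_calibration(score, specialty_score=None):
    """Hypothetical mapping from calibration score to placement (thresholds from above)."""
    if score >= 0.88:
        tier = "mid-tier pool, priority access to early task drops"
    elif score >= 0.78:
        tier = "entry-tier pool, standard onboarding"
    else:
        tier = "extended onboarding or rejection"

    # Specialty pools are gated separately on their own calibration score.
    specialty_unlocked = specialty_score is not None and specialty_score >= 0.85
    return tier, specialty_unlocked

print(placement_from_calibration(0.91, specialty_score=0.86))
# ('mid-tier pool, priority access to early task drops', True)
```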
Bottom line
Calibration is the most leveraged unpaid work in AI training contracting. Spend the time to do it carefully — write thorough justifications, flag ambiguity, cite sources, don't fake disagreement. A strong calibration score adds thousands of dollars to your first 90 days of income.