Every major AI training platform runs on quality scores. They determine your tier, your pay, your access to specialty pools, and ultimately whether you stay on the platform at all. Most contractors have only a vague sense of what affects those scores. Here are the practical mechanics.
How quality scores are calculated
Each platform has variations, but the underlying math is similar:
- Automated test scores. For coding tasks, your code is run against held-out tests. Pass rate becomes a quality input.
- Human reviewer scores. A senior contractor or platform reviewer rates a sample of your work on a rubric.
- Inter-annotator agreement. For RLHF, your scores are compared against consensus from other raters. High agreement = high quality.
- Time-on-task. Unusually fast or slow completion can flag a task for review.
The composite is a number between 0.0 and 1.0. Most platforms want 0.85+ for senior tier, 0.92+ for specialty pools.
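No platform publishes its exact formula, so treat the sketch below as a mental model only: a weighted blend of the inputs above, with time-on-task acting as a review trigger rather than a direct score component. The weights, thresholds, and function name are all illustrative assumptions, not any platform's real implementation.

```python
# Illustrative composite only -- every weight and threshold here is an
# assumption; no platform discloses its actual formula.

def composite_score(test_pass_rate: float,
                    reviewer_rating: float,
                    agreement: float,
                    minutes_on_task: float,
                    expected_minutes: float) -> tuple[float, bool]:
    """Blend quality inputs into a 0.0-1.0 composite, plus a review flag."""
    score = (0.40 * test_pass_rate     # held-out test pass rate (coding tasks)
             + 0.35 * reviewer_rating  # human rubric rating, normalized to 0-1
             + 0.25 * agreement)       # inter-annotator agreement (RLHF)

    # Unusually fast or slow completion doesn't change the number directly;
    # it flags the task for extra human review.
    speed_ratio = minutes_on_task / expected_minutes
    flagged = speed_ratio < 0.25 or speed_ratio > 4.0

    return round(score, 3), flagged

score, flagged = composite_score(0.90, 0.88, 0.85, 12, 20)
print(score, flagged)   # ~0.881, False
```

On these example inputs the composite lands around 0.881: above the 0.85 senior bar, short of the 0.92 specialty bar.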
What actually moves your score up
- Reading prompts carefully. The single biggest variable. Most score drops trace to a missed constraint.
- Using the rubric exactly. Not your judgment — the platform's rubric.
- Justifying disagreements. When you score against consensus, write why. The system weights thoughtful justifications heavily.
- Saying no to tasks you're uncertain on. Skipping a task you'd score weakly on protects your average more than completing it would help your volume (the sketch after this list shows the arithmetic).
- Picking long-form tasks over short ones. Long-form tasks are weighted more heavily, and with care they're easier to score well on.
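The arithmetic behind the skip advice is easy to check. A minimal sketch, assuming the score is a plain mean over recent task scores (an assumption; real platforms use rolling windows and weights, per the notes below):

```python
# Toy arithmetic behind the skip advice. Assumes the score is a plain
# mean over recent task scores; real platforms use rolling windows
# and weights, so treat the exact numbers as illustrative.

recent = [0.93] * 19                                  # 19 solid recent tasks

skip_avg = sum(recent) / len(recent)                  # skipped task costs nothing
weak_avg = sum(recent + [0.60]) / (len(recent) + 1)   # one weak submission

print(f"skip the shaky task:  {skip_avg:.3f}")   # ~0.930
print(f"submit weak work:     {weak_avg:.3f}")   # ~0.914
```

One 0.60 submission pulls a 0.930 average down to about 0.914, under the 0.92 specialty threshold; the single extra unit of volume doesn't come close to compensating.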
What tanks your score
- Working tired. Quality scores drop measurably after the 3rd hour of focused work.
- Multi-tasking with other work. Reading carefully fails when you're context-switching.
- Bulk-completing tasks. Speed-running 30 tasks in 90 minutes almost always produces lower-quality outputs.
- Arguing the rubric. Score against the rubric, even when you disagree.
- Late-night or pre-meal sessions. Hunger and fatigue both erode reading attention.
The recovery strategy
Score drops happen. The right response:
- Stop work immediately. Don't try to push through a fatigue-driven drop.
- Take 24–48 hours off. Reset your attention.
- Return at half-rate. 5–10 tasks the first day back, all carefully done.
- Prioritize short, easy tasks at first. Build up positive score samples before re-attempting harder work.
Trying to "make it back" by working more usually compounds the problem. The contractors whose scores recovered tended to slow down, not speed up.
Platform-specific notes
Outlier
- Quality scores are published in your dashboard.
- Updated daily with a 14-day rolling average.
- Tier movement happens on weekly reviews.
Mercor
- Quality scores are not published directly; you see "rating bands" instead.
- Updates happen at undisclosed intervals (likely weekly).
- Tier reviews require explicit request after 60+ days at current tier.
Surge AI
- Quality is shown as a percentile within your role's peer group (the standard calculation is sketched after this list).
- Updated weekly.
- Specialty access requires sustained 80th+ percentile for 60 days.
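The percentile mechanic is the standard percentile rank: the share of peers scoring at or below you. You can't compute your own (Surge doesn't expose peer scores), but the calculation itself, with invented peer data, looks like this:

```python
# Standard percentile-rank calculation. The peer scores are invented;
# Surge shows you only the resulting percentile, not the peer data.

def percentile_rank(your_score: float, peer_scores: list[float]) -> float:
    at_or_below = sum(s <= your_score for s in peer_scores)
    return 100.0 * at_or_below / len(peer_scores)

peers = [0.78, 0.82, 0.85, 0.88, 0.90, 0.91, 0.93, 0.94, 0.96, 0.97]
print(percentile_rank(0.93, peers))   # 70.0
```

Note what this implies: a 0.93 absolute score, specialty-grade on a platform like Outlier, can still sit at the 70th percentile in a strong peer group and miss the 80th-percentile bar. Percentile is relative, so specialty access depends on your peers, not just on you.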
The 0.95+ pattern
Contractors who consistently maintain >0.95 quality scores share these patterns:
- Cap deep work at 4 hours a day.
- Take 5-minute breaks every 30–45 minutes.
- Re-read prompts twice before writing.
- Skip the 5–10% of available tasks they're least confident on.
- Specialize in 1–2 task types rather than spreading across all.
- Don't work hungry, tired, or distracted.
Bottom line
Quality scores are the master variable on AI training platforms. Reading carefully, sticking to the rubric, and saying no to tasks you're not confident on protect your score better than any other tactic. The fastest way to senior tier is the slowest path: careful, sustained, quality-first work. See burnout prevention for keeping that level of quality sustainable long-term.