Forty terms that come up in AI training contracting, defined in plain language. Bookmark for reference.
Foundational
RLHF (Reinforcement Learning from Human Feedback): Technique where models learn from human preferences over their outputs. The contractor's role is to provide those preferences.
Evaluation (eval): Scoring an AI model's output against criteria. Different from training — eval measures, training updates the model.
Rubric: The structured criteria evaluators use to score outputs. Good rubrics produce consistent scores across raters.
Calibration: The process of aligning your judgment with platform standards via practice tasks scored against reference answers.
Tier: Your performance level on a platform (entry, mid, senior). Determines your pay rate and task pool access.
Tasks and roles
Coding evaluator: Reviews code for correctness, edge cases, style. Most common technical role.
RLHF annotator: Provides preferences over model outputs, typically pairwise (choosing whether A or B is better).
Domain expert: Credentialed evaluator (MD, JD, PhD, CFA) for specialty work.
Agent task evaluator: Scores AI agents that take multi-step actions (browse, code, call APIs).
Long-context evaluator: Evaluates how well a model uses long documents (50+ pages) when answering questions.
Red teamer: Tries to make models fail or produce harmful outputs.
Reference solution writer: Writes ideal answers used as training ground truth.
Quality and scoring
Quality score: Numerical rating (typically 0–1) of your evaluation accuracy on a task.
Inter-rater agreement: How often two independent raters reach similar conclusions on the same task. Higher = better rubric.
Justification: Written explanation of your scoring rationale. A major scoring dimension on most platforms.
Rolling weighted average: Your quality score computed across your most recent N tasks (typically 80–120). Determines tier eligibility.
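Two of the metrics above can be sketched in a few lines of Python. Platforms don't publish their exact formulas, so both functions are illustrative assumptions: agreement is shown as simple percent agreement (real platforms may use Cohen's kappa or a similar chance-corrected statistic), and the rolling average assumes linear recency weights over the last N tasks.

```python
def agreement_rate(rater_a, rater_b):
    """Fraction of tasks where two independent raters reached the same
    verdict. A stand-in for a platform's (unpublished) agreement metric."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def rolling_weighted_average(scores, window=100):
    """Recency-weighted average of the last `window` quality scores.
    Assumes linear weights: the most recent task counts the most."""
    recent = scores[-window:]
    weights = range(1, len(recent) + 1)  # 1 = oldest, len(recent) = newest
    return sum(w * s for w, s in zip(weights, recent)) / sum(weights)

# Two raters agreeing on 2 of 3 pairwise preferences:
print(agreement_rate(["A", "B", "A"], ["A", "B", "B"]))

# A contractor whose recent scores trend upward: the weighted average
# sits closer to the newer 0.95 scores than a plain mean would.
print(round(rolling_weighted_average([0.80] * 50 + [0.95] * 50), 3))
```

The recency weighting is why a run of strong recent tasks can lift your tier eligibility faster than the same scores spread evenly across your history.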
Platform mechanics
Specialty pool: Gated task pool for specific work types (framework, niche language, domain). Pays premium rates.
Task drop: When a batch of tasks becomes available on the platform.
Pay cycle: Frequency of contractor payouts (Outlier weekly, Mercor bi-weekly).
Onboarding: Initial period combining application + calibration + first paid tasks.
Routing priority: How quickly you see new task drops; affected by tier and Outlier+ status.
Application process
Coding sample: Take-home programming test during application.
Work sample: A combined written and coding test used by Mercor and Surge; longer than a coding sample alone.
AI screener: Mercor's AI-conducted video interview.
Vetting interview: Turing's human-conducted technical interview.
Model concepts
Hallucination: Confident model output unsupported by source/knowledge/reasoning.
Pattern completion: Failure mode where a model treats similar-looking inputs as identical, completing a familiar pattern instead of reasoning about the case at hand.
Constitutional AI: Approach where models are trained against an explicit set of principles. A specialty work category on some platforms.
Multi-step reasoning: Tasks where models must chain multiple inferences.
Long-context: Model context windows of 1M+ tokens, roughly a small library of text.
Agent: AI model wrapped in a runtime that can take actions (browse, code, call APIs).
Compensation
Effective hourly rate: Gross earnings ÷ actual hours worked, accounting for unpaid wait time.
Specialty premium: Higher rate for specific specialty work (typically 10–40% above base).
Bounty: One-time payment for specific findings (red team programs).
Outlier+: Outlier's $39/month premium tier for priority routing.
Tax and payment
1099: US tax form for self-employed income.
Schedule C: US tax form for self-employment business income.
SEP-IRA / Solo 401(k): Self-employed retirement accounts with high contribution caps.
FIRC: Foreign Inward Remittance Certificate, the document Indian banks issue as proof of foreign income received.
Wise / Payoneer: Cross-border payment platforms commonly used by AI training contractors.
Career
Tier-up: Promotion to a higher tier (entry → mid → senior).
Specialty calibration: Calibration tasks specific to a specialty pool, gating that pool's access.
Quality coach: Senior contractor who provides feedback in some specialty pools.
Program manager: Platform employee who runs specific contractor programs; can negotiate rates.
Bottom line
The vocabulary of AI training contracting is specific and worth learning. New contractors who use these terms accurately in applications and interviews signal experience. Senior contractors should be able to define each of these without thinking.