The single biggest pay variable on AI training platforms in 2026 isn't your years of experience or your school. It's whether you have verifiable domain expertise in a specialty that frontier labs are training models on. The gap between generalist and specialist has widened to roughly 3–4x in the past two years.
The numbers, by domain
Average specialist hourly rates we've seen in 2026 (across Outlier, Mercor, Surge AI, Scale, Turing):
- Generalist coding evaluator: $40–$75/hr
- Generalist RLHF annotator: $25–$50/hr
- Math reasoning specialist (PhD or competition-grade): $80–$140/hr
- Medical RLHF (verified MD/RN/NP): $90–$160/hr
- Legal reasoning (verified JD): $80–$130/hr
- Quantitative finance: $100–$200/hr
- Niche programming (Rust, OCaml, Verilog, etc.): $85–$140/hr
- Native multilingual (low-resource languages): $45–$95/hr
Quant finance specialists out-earn generalist coding evaluators by roughly 2.5x at the midpoint of those ranges, and closer to 3x at the senior end. Even RLHF (typically the lowest-paying category) flips to one of the highest-paying when paired with verified medical credentials.
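As a rough sanity check on that multiplier, here is the arithmetic on the hourly ranges listed above (midpoint and top-of-range comparisons only; these are the quoted ranges, not platform data):

```python
# Specialist-vs-generalist gap, using the hourly ranges quoted above.
generalist_coding = (40 + 75) / 2   # midpoint of $40-$75/hr -> $57.50
quant_finance = (100 + 200) / 2     # midpoint of $100-$200/hr -> $150.00

ratio = quant_finance / generalist_coding
top_ratio = 200 / 75  # comparing the top of each range ("senior level")

print(f"Midpoint ratio: {ratio:.1f}x")   # prints "Midpoint ratio: 2.6x"
print(f"Top-end ratio: {top_ratio:.1f}x")  # prints "Top-end ratio: 2.7x"
```

The midpoint gap is about 2.6x, and about 2.7x at the top of each range, which is where the "roughly 3x at the senior level" figure comes from.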
Why the gap exists
Three reasons:
- The labels are higher-stakes. A wrong RLHF rating on a creative-writing task affects model style. A wrong medical RLHF rating could affect what an LLM tells a patient. Frontier labs invest more in getting medical labels right, so they pay more for medical raters.
- The supply pool is tiny. There are millions of programmers. There are roughly 1.4 million practicing US physicians, of whom maybe 1% are interested in side gigs. The rate adjusts to reflect scarcity.
- Verification costs. Confirming a JD or MD takes meaningful effort (license-board checks, sometimes a letter from a supervisor). Platforms eat that cost upfront and recoup it over time through higher rates.
What "domain expert" actually means on these platforms
Domain expert isn't a self-declaration. It typically requires:
- Verifiable credential. MD, JD, PhD, CFA, CPA, etc. Some platforms accept industry-equivalent experience (5+ years at a quant fund counts for the quant-finance specialist track).
- Active practice or recent practice. A radiologist who hasn't practiced in 12 years gets paid less than one currently reading scans daily. Currency matters.
- Subspecialty depth. "Doctor" pays well; "pediatric oncologist" pays better; "pediatric oncologist with 15 years of clinical trial experience" pays best.
The fastest-growing specialty tracks in 2026
Where the new money is flowing:
- Long-context reasoning evaluation. Models can now process 1M+ token contexts; evaluating whether they actually use that context correctly requires raters who can hold a long argument in their head. Pay: $75–$120/hr regardless of domain.
- Agent task evaluation. Multi-step agents that browse, code, and execute. Eval requires understanding goal completion, side-effect avoidance, and recovery from failure. Pay: $80–$130/hr.
- Constitutional / safety RLHF. Edge cases — tricky requests where the model has to refuse politely without being preachy. Pay: $70–$110/hr; legal/policy background helps.
- Multilingual frontier evaluation. Native speakers of languages where current models are weakest (Tamil, Hausa, Marathi, Vietnamese, etc.). Pay: $45–$95/hr; depends heavily on language.
Positioning your specialty
If you have a domain — even one you don't think of as a "domain" — here's how to frame it:
For platforms with profile-based matching (Mercor, Surge):
- Lead your profile with the credential. Not buried in education; first line.
- List subspecialties as separate tags. "Medical → Cardiology → Interventional cardiology" matches more listings than just "Medical."
- Quantify recent practice. "Currently practicing — 30+ patient encounters/week" is stronger than just "practicing physician."
For platforms with sample-based matching (Outlier):
- Apply with your specialty as your primary skill, not a secondary one.
- If your specialty has a coding angle (e.g., quant finance + Python), apply to coding tracks first — those have larger pools and faster onboarding — then transfer to specialty tracks once you're inside.
What if you don't have a "domain"?
Most contractors do — they just don't think of it that way. A few things that count as domain depth:
- 5+ years in a specific framework or stack. "Senior Rust contractor" beats "general programmer" by 30–50% in rate.
- Native fluency in a non-English language. Even one not on the "low-resource" list can earn 10–20% premium for multilingual work.
- Hobby-level mastery in something the model needs. Math Olympiad or ACM ICPC competition experience, MTG strategy, chess engines — anything where you've trained yourself to reason carefully within a structured domain.
- Industry experience that intersects tech. Worked in pharma + can code? You're a candidate for medical-coding tracks.
How specialty work compares to generalist work day-to-day
- Hours-per-week: Lower. Specialty pools are smaller. Expect 8–14 hrs/wk.
- Time per task: Higher. Specialty tasks are usually 30 minutes to 2 hours, vs. 5–15 minutes for generalist.
- Quality bar: Higher. The platform is paying a premium and expects fewer mistakes.
- Onboarding: Longer. Credential verification often takes 2–4 weeks beyond standard onboarding.
Bottom line
Generalist AI training contracting is a solid side income at $40–$75/hr. Specialty AI training contracting is closer to a primary income at $80–$160/hr. The specialty bar isn't as high as people assume — credentials you already have probably qualify you for a specialty track if you position them correctly.
If you're stuck at generalist rates and have any domain in your background, spend an evening rewriting your profile around that specialty and reapply. The math on the rate jump usually pays for itself within the first month.
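To make that payback concrete, here is the arithmetic with illustrative numbers (a $50/hr generalist rate, a $100/hr specialty rate, and 10 available hours per week — all assumptions, not figures from any platform):

```python
# Hypothetical rate-jump payback: generalist -> specialty track.
old_rate = 50          # $/hr, assumed generalist rate
new_rate = 100         # $/hr, assumed specialty rate
hours_per_week = 10    # assumed available specialty hours

extra_per_week = (new_rate - old_rate) * hours_per_week
extra_first_month = extra_per_week * 4

print(extra_per_week)    # prints 500
print(extra_first_month) # prints 2000
```

Even at fewer specialty hours per week, one evening spent rewriting a profile returns its time cost many times over in the first month.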