Know what you're applying for.
Every AI training contract type, defined in plain English. What you do, what you need to know, what platforms hire for it, and what it pays in 2026.
RLHF Annotator
Reinforcement Learning from Human Feedback
What it is. RLHF Annotators rank, score, and write feedback on responses generated by large language models. Your work directly shapes how the next versions of GPT, Claude, Gemini, and other models reason.
What you actually do. Open a task. Read 2–4 model responses to a prompt. Pick the best one and explain why. Sometimes write the "ideal" response yourself. Repeat 30–80 times per session. Sessions are 30–90 minutes; you control the schedule.
What you need to know. Strong written English (or target language). Ability to spot subtle reasoning errors, factual mistakes, and unhelpful framings. No coding required for general RLHF; coding-specific RLHF (see Coding Evaluator) pays more.
How to get hired. Mercor and Scale AI are the most accessible entry points: quick application, sample task, decision in 1–3 days. Outlier requires more screening but pays slightly more for senior-tier work.
Find RLHF jobs on joblet.ai →
Coding Evaluator
Code review for AI training data
What it is. Coding Evaluators review code generated by AI models, rate correctness and quality, fix mistakes, and produce reference solutions. The work trains the next generation of code-completion and code-generation systems.
What you actually do. Read a coding prompt and 1–4 candidate solutions. Run them mentally (or in a sandbox), find bugs, evaluate code quality, write a corrected reference solution if all candidates fail. Submit feedback. Move on.
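As a rough illustration of that loop, here is a minimal sketch in Python. The prompt, both candidate solutions, the test cases, and the `evaluate` helper are all hypothetical, made up for this example; real platforms supply their own tasks and tooling.

```python
from typing import Callable, List, Tuple

# Hypothetical prompt: "Return the median of a non-empty list of numbers."

def candidate_a(values: List[float]) -> float:
    # Bug: never sorts the input and ignores the even-length case.
    return values[len(values) // 2]

def candidate_b(values: List[float]) -> float:
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even length: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical test cases an evaluator might write: (input, expected median).
CASES: List[Tuple[List[float], float]] = [
    ([3, 1, 2], 2),
    ([4, 1, 3, 2], 2.5),
    ([7], 7),
]

def evaluate(candidate: Callable[[List[float]], float]) -> List[str]:
    """Return human-readable failure notes; an empty list means every case passed."""
    failures = []
    for values, expected in CASES:
        try:
            actual = candidate(list(values))  # pass a copy so the case is never mutated
        except Exception as exc:
            failures.append(f"input={values}: raised {type(exc).__name__}")
            continue
        if actual != expected:
            failures.append(f"input={values}: expected {expected}, got {actual}")
    return failures

if __name__ == "__main__":
    for name, fn in [("A", candidate_a), ("B", candidate_b)]:
        notes = evaluate(fn)
        print(f"Candidate {name}:", "all cases pass" if not notes else notes)
```

The failure notes are the raw material for written feedback: they say which inputs break a candidate and how, which is usually what the task asks you to explain. If every candidate failed, a function like `candidate_b` is the kind of corrected reference solution you would submit.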
What you need to know. Professional working fluency in at least one language. Python is most in demand, followed by TypeScript/JavaScript, then Rust, C++, Go, and Java. Reading comprehension matters more than typing speed; you spend more time evaluating than writing.
How to get hired. Outlier is the volume leader — fastest application, most consistent work. Scale AI pays slightly more at the senior tier. Mercor is best for niche languages (Rust, C++) where rates climb 30–50%.
Find coding eval jobs on joblet.ai →
Math Reasoning Expert
PhD-level mathematics for AI training
What it is. Write rigorous step-by-step solutions to advanced math problems. Evaluate AI-generated proofs and rank by mathematical correctness. Help models learn what airtight mathematical reasoning looks like.
What you actually do. Open a problem (calculus, linear algebra, real analysis, abstract algebra, number theory, topology). Either write a clean proof yourself, or evaluate 2–4 model attempts and explain exactly where each goes wrong. Rate by correctness, completeness, and clarity.
What you need to know. PhD or in-progress PhD in mathematics, applied math, statistics, or theoretical CS. Comfort with proofs at undergraduate-textbook-and-above level. Some platforms accept MS + research experience.
How to get hired. Scale AI's math expert program is the highest-paying. Outlier has higher volume. Turing is best for long-form proof writing where you can spend an hour on a single problem.
Find math reasoning jobs on joblet.ai →
Domain Expert
Medical, Legal, Finance, Engineering specialists
What it is. Domain Experts evaluate AI responses on subject-matter accuracy in their specialty — medical, legal, finance, engineering, or scientific research. The bar is your expertise; the work is verifying that AI doesn't hallucinate when stakes are real.
What you actually do. Read a model response to a domain-specific question. Mark factual errors. Flag malpractice (medical), misinformation (legal), or material inaccuracies (finance). Either rewrite the response correctly or explain exactly what's wrong and why.
What you need to know. Active credentials in your field. Medical: MD/DO + active license + board cert pays $120–200/hr. Legal: JD + 5+ yrs practice. Finance: CPA, CFA, or equivalent. Engineering: PE license adds 30%.
How to get hired. Surge AI pays the highest for medical (premium for active practice). Mercor is best for legal — they have the largest law-domain inventory. Micro1 for finance. Most domain expert applications include credential verification.
Find domain expert jobs on joblet.ai →
Multilingual Annotator
Translation, evaluation, and cultural QA
What it is. Native or near-native speakers of a language other than English evaluate AI responses for accuracy, fluency, and cultural appropriateness. Translation, idiom verification, and cultural-context flagging.
What you actually do. Read AI-generated text in your language. Score for grammatical correctness, natural fluency, cultural appropriateness, and factual accuracy. Sometimes translate or transcribe; sometimes write reference responses.
What you need to know. Native or near-native fluency in the target language plus working English. Some platforms require formal translation training; most don't. Languages with a smaller pool of available annotators (Yoruba, Tamil, Vietnamese, Thai) command higher rates due to scarcity.
How to get hired. Scale AI has the largest volume across most languages. Surge AI pays a premium for CJK (Chinese, Japanese, Korean). Outlier pays the most for low-resource languages. Toloka is the most accessible entry point.
Find multilingual jobs on joblet.ai →
Senior Software Engineer · Contract
Full FTE-equivalent contract engineering
What it is. This isn't AI training work. These are full-fledged contract engineering roles, often hands-on work building production systems for AI-first companies, training infrastructure teams, or research labs that need senior engineers without committing to a full-time hire.
What you actually do. Real engineering. Ship features, build infrastructure, debug production issues. Engagements are typically 3–6 months at 20–40 hrs/week. You're embedded with a team via Slack/Linear, attend standups, do code review.
What you need to know. 5–8+ years of professional engineering experience. Modern stack proficiency. Strong system design. The bar is closer to a Staff Engineer interview at a Series B than a typical contractor screen.
How to get hired. Turing has the most volume and the most rigorous screening (think 3-stage interview). Mercor pays a premium and has shorter engagements. Micro1 specializes in ML engineering contracts where pay tops $200/hr for hands-on training infra work.
Find senior engineer contracts on joblet.ai →
AI Research Contractor
PhD-level research contributions
What it is. Contribute to AI research projects on a contract basis — paper reviews, dataset curation, evaluation methodology, ablation studies, sometimes co-authorship. Adjacent to traditional research-engineer roles but contract-shaped.
What you actually do. Varies wildly by project. Could be: review 20 papers and synthesize a survey. Design evaluation criteria for a new benchmark. Run ablation experiments on a published model. Help draft a paper. Engagements are typically 4–12 weeks.
What you need to know. PhD (in progress or completed) in ML, CS, statistics, or related field. Publication history at top-tier venues (NeurIPS, ICML, ACL, CVPR, ICLR) makes you 2–3x more attractive. Strong written communication for paper-shaped work.
How to get hired. Mercor and Turing are the two most active. Mercor screens for publications; Turing screens for hands-on experimentation experience. Both pay similarly at the senior tier; Turing tends to have a higher volume of work.
Find AI research contracts on joblet.ai →
Creative Writing Trainer
Fiction, marketing, long-form narrative
What it is. Train language models to write better creative content. Evaluate AI-generated stories, marketing copy, scripts, or long-form essays. Sometimes write reference pieces yourself. The output trains models like ChatGPT and Claude on what good prose actually looks like.
What you actually do. Read AI-generated creative work. Score for craft elements: voice, pacing, character, plot consistency, originality, tone. Either fix the work in place or write a corrected reference. Long-form work (10K+ words) pays a premium.
What you need to know. Demonstrable writing portfolio. Published fiction or an MFA helps. Marketing-copy work cares about portfolio over credentials. Long-form work requires actual stamina to sustain a voice and structure across many thousands of words.
How to get hired. Outlier has the largest volume. Surge AI pays best for long-form (and screens hardest). Scale AI is most accessible for marketing-focused creative work. Portfolio review is universal — be ready to share 2–3 polished pieces.
Find creative writing jobs on joblet.ai →
Ready to find your role?
All open roles across every platform live on joblet.ai.
Find your job →