Disclosure: I'm at ADP, and our HRIS (Workforce Now) has AI-driven candidate scoring as a feature. So I have a vendor bias. I'll try to be straight about what the evidence actually shows.
The bias concern is real and well-documented. The most-cited research:
- Amazon's resume-screening AI (reported in 2018) learned to penalize resumes containing the word 'women's' (e.g., 'women's chess club') and was scrapped before deployment. That's the famous case, but it isn't an isolated one.
- NIST's 2023 AI Risk Management Framework specifically flags hiring AI as high-risk for disparate impact.
- The EEOC's 2023 guidance on AI in hiring confirms that disparate-impact analysis applies to AI-driven screens, meaning Title VII liability sits with the employer who deploys the model, NOT with the vendor who built it.
The 'reduced bias' marketing claim usually rests on the fact that the AI doesn't see name/photo/gender directly. That's a real but narrow protection. The model can still learn protected-class proxies (zip code, college, employment-gap patterns) and produce disparate impact.
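To make the proxy point concrete, here's a toy illustration with synthetic data (not any vendor's actual model): gender never enters the feature set, but a correlated zip-code proxy carries enough signal that the 'blinded' model reproduces the gap anyway.

```python
# Toy illustration, synthetic data only: gender is hidden from the classifier,
# but a correlated proxy (zip code) lets the blinded model reproduce a
# gendered selection-rate gap learned from biased historical hiring labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

gender = rng.integers(0, 2, n)                        # protected attribute, never given to the model
high_income_zip = (rng.random(n) < np.where(gender == 0, 0.7, 0.3)).astype(int)  # proxy feature
skill = rng.normal(0, 1, n)                           # legitimate signal
# Historical hiring decisions that favored group 0 -- the bias the model will learn.
hired = (skill + 1.5 * (gender == 0) + rng.normal(0, 1, n) > 1.0).astype(int)

X = np.column_stack([skill, high_income_zip])         # note: no gender column
blinded_model = LogisticRegression().fit(X, hired)
screened_in = blinded_model.predict(X)

# Selection rates by (hidden) gender still diverge: blinding alone didn't fix it.
for g in (0, 1):
    print(f"group {g}: AI selection rate {screened_in[gender == g].mean():.1%}")
```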
On the time-savings claim: the 40% number is roughly consistent with what I've seen in production deployments at companies in the 200-1,000 employee range. Net of the legal/compliance overhead, the real number is more like 20-30%. Worth it for high-volume recruiting (1000+ apps/month), often not worth it for selective hiring (under 100 apps/month).
What I've seen work in production:
1. Audit the model against your own data before going live. Pull 6-12 months of past applicants and hires. Run their resumes through the AI screen. Confirm the AI would have surfaced the same people you actually hired (or better candidates you might have missed); a rough back-test sketch follows this list. If the AI's recommendations diverge significantly from your historical hires AND skew on a protected dimension, the model is not safe to deploy.
2. Disparate-impact testing on every job req. Most vendors won't do this for you. You need to either pay a consulting firm (~$15-30K for a baseline audit) or have an internal data analyst run the four-fifths rule against the AI's output by gender, race, and age band: if any group's selection rate falls below 80% of the highest group's rate, you have a problem. (A minimal four-fifths check is sketched after this list.)
3. Human-in-the-loop on every adverse action. The AI surfaces a ranked list; a human makes the final 'do not advance' call. This is both a legal protection AND an accuracy improvement — humans catch context the AI misses (career changers, non-traditional paths, etc.).
4. Annual model retraining + retesting. Job markets shift. A model trained in 2024 will produce different (potentially biased) results on 2026 applicant pools. Build the retest cadence into your annual compliance calendar.
5. Document everything. When the EEOC or a plaintiff's attorney asks how you made the screening decision, 'the AI flagged it' is not a defense. 'Here's the model audit, here's the disparate-impact test from June, here's the human reviewer who advanced this candidate, here's why this candidate was not advanced' is.
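For #1, a rough sketch of the back-test, assuming you can export past applicants with a hired flag and wire some scoring call up to the vendor's API. `score_resume()` and the CSV columns here are placeholders, not anyone's real interface.

```python
# Back-test sketch for #1: score 6-12 months of past applicants and check whether
# the AI would have surfaced the people you actually hired.
import csv

def score_resume(resume_text: str) -> float:
    """Placeholder -- replace with the vendor's real scoring call."""
    raise NotImplementedError

def audit_against_history(path: str, advance_top_n: int = 50) -> None:
    # Assumed CSV columns: id, resume_text, hired (hired == "1" for actual hires).
    with open(path, newline="", encoding="utf-8") as f:
        applicants = list(csv.DictReader(f))

    # Score everyone the way the AI screen would have.
    for a in applicants:
        a["ai_score"] = score_resume(a["resume_text"])

    ranked = sorted(applicants, key=lambda a: a["ai_score"], reverse=True)
    would_advance = {a["id"] for a in ranked[:advance_top_n]}
    actually_hired = {a["id"] for a in applicants if a["hired"] == "1"}

    overlap = would_advance & actually_hired
    print(f"Past hires the AI would also have surfaced: {len(overlap)}/{len(actually_hired)}")
    missed = actually_hired - would_advance
    if missed:
        print(f"Past hires the AI would have screened OUT: {sorted(missed)}")
```

If that screened-out list skews toward a protected group, that's the diverge-and-skew failure mode from #1.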
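And for #2, a minimal four-fifths check your analyst can run per job req. Column names are assumptions; swap in whatever your ATS export actually uses.

```python
# Four-fifths check for #2: compare each group's selection rate to the highest
# group's rate; an impact ratio below 0.8 is the classic adverse-impact flag.
from collections import defaultdict

def four_fifths_check(rows, group_key="gender", selected_key="advanced_by_ai"):
    """rows: one dict per applicant on a single job req."""
    totals, selected = defaultdict(int), defaultdict(int)
    for r in rows:
        g = r[group_key]
        totals[g] += 1
        if r[selected_key]:
            selected[g] += 1

    rates = {g: selected[g] / totals[g] for g in totals}
    best = max(rates.values())
    for g, rate in sorted(rates.items()):
        ratio = rate / best if best else 0.0
        flag = "  <-- below four-fifths, investigate" if ratio < 0.8 else ""
        print(f"{g}: selection rate {rate:.1%}, impact ratio {ratio:.2f}{flag}")

# Example run; repeat per job req for gender, race, and age band.
four_fifths_check([
    {"gender": "F", "advanced_by_ai": True},
    {"gender": "F", "advanced_by_ai": False},
    {"gender": "F", "advanced_by_ai": False},
    {"gender": "M", "advanced_by_ai": True},
    {"gender": "M", "advanced_by_ai": True},
    {"gender": "M", "advanced_by_ai": False},
])
```

Small applicant pools make the impact ratio noisy, so treat a single-req result as a flag to investigate, not a verdict.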
Where AI screening genuinely shines: structured high-volume entry-level hiring (customer service, retail, warehouse). Pattern-matching on credentials, certifications, prior similar roles. The risk surface is narrower and the accuracy bar is lower.
Where I'd be skeptical: any role where culture-fit, judgment, or non-linear career paths matter. Engineering, sales, leadership. The AI penalizes the patterns you want to find — the unconventional candidates who turn out to be your best hires.
— AJ