In a landmark 2011 study of Israeli parole hearings, Shai Danziger and colleagues found that judges ruled favourably in approximately 65% of cases heard at the start of a session, but that the rate declined to nearly zero by the end of the session. After each of the day's two food breaks it jumped back to roughly 65%, then declined again. The pattern was not explained by the cases being harder. The authors' interpretation was that making one decision after another is cognitively expensive, and the resource depletes.

Hiring managers conducting back-to-back phone screens are running the same experiment on your candidate pipeline, every week.

What Decision Fatigue Looks Like in a Screening Context

The Danziger parole study (published in the Proceedings of the National Academy of Sciences) has become the reference point for decision fatigue research, but it is far from alone. Ariely and colleagues have documented systematic degradation in decision quality in medical, financial, and legal contexts as the number of consecutive decisions increases.

In a hiring context, the mechanics play out differently from a binary parole decision, but the underlying cognitive load is the same. A recruiter conducting five unstructured phone screens in a morning is making dozens of micro-judgements per call: Is this person communicating clearly? Do their answers track the question? How does their experience compare to candidate three? Are they genuinely interested, or are they distracted?

By the fourth or fifth call, these judgements are not being made fresh. They are being made by someone whose prefrontal cortex (the part of the brain associated with analytical reasoning and impulse control) is running on a depleted resource. Research in cognitive psychology suggests that people in this state shift toward simpler heuristics: who seems most confident, who had the nicest voice, who reminded them of someone they liked.

This is not a failure of character. It is a predictable consequence of asking humans to make complex, high-stakes evaluations under volume and time pressure.

The Three Bias Patterns That Emerge Under Fatigue

1. The Halo Effect in Early Interviews

The halo effect, where a strong first impression inflates subsequent ratings across all competencies, is well documented in selection research. What is less often noted is that in a sequence of interviews it combines with a contrast effect: a standout early candidate becomes the benchmark against which everyone who follows is judged.

When a recruiter does four screens in a row and the first candidate was highly polished, candidates two, three, and four are being evaluated against a mental comparison that is not visible in any scorecard. Candidate two's adequate-but-not-exceptional answer gets rated lower than it would have if the sequence started differently. The scorecard doesn't capture this. The decision is distorted anyway.

2. Recency Bias Favours the Last Person Interviewed

The mirror image of the halo effect is recency bias: when a hiring team discusses candidates after a batch of interviews, the most recently interviewed candidate is most available in memory. The details of candidates two and three have faded. Candidate five, who went last, gets a disproportionate share of mental bandwidth in the debrief, for reasons that have nothing to do with their actual suitability for the role.

Research by Carlson and colleagues on interview panel dynamics documented this pattern specifically in employment contexts: later-position candidates in interview sequences receive higher ratings on average, even when randomisation controls for actual candidate quality.

3. Cognitive Shortcutting Toward "Gut Feel"

When deliberate, evidence-based reasoning is cognitively expensive and the resource is depleted, people revert to System 1 thinking: the fast, associative, pattern-matching mode described by Daniel Kahneman in Thinking, Fast and Slow. In hiring, this shows up as "I just had a good feeling about them" or "Something didn't sit right."

These impressions are not worthless — experienced recruiters develop genuine pattern recognition over time. But they are highly vulnerable to bias when they substitute for evidence rather than supplement it. A "gut feeling" based on 30 minutes of unstructured conversation after five previous calls is not the same as a calibrated assessment. It is a tired brain reaching for a shortcut.

The Cost of Fatigue-Driven Screening Decisions

Three things happen when fatigue-driven decisions dominate the screening process:

1. Good candidates in "middle" slots are systematically underrated. If your screening day runs 9am to 5pm with six slots, candidates at 11am and 2pm are at the highest risk of being evaluated unfairly. This is not random noise — it is systematic exclusion of a subset of your candidate pool based purely on scheduling.

2. The decision criteria shift invisibly. If you ask recruiters what they are evaluating, they will tell you the competencies in the job spec. But what they are actually weighting after interview four is often something else: energy level, ease of conversation, confidence. These proxies correlate imperfectly with job performance and strongly with irrelevant factors including gender presentation, accent, and socioeconomic confidence markers.

3. Debrief discussions are dominated by the freshest and most vivid impressions, not the best-evidenced ones. The candidate who made one memorable joke at the end of their call may win the internal conversation, not because they were the strongest candidate, but because they left a distinctive mental fingerprint.
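
One way to check whether scheduling is shaping outcomes in your own pipeline is to tabulate how often candidates advance from each slot position. The following is a minimal sketch in Python, assuming you can export a log of (slot position, advanced or not) pairs. The function name and the sample data are hypothetical and purely illustrative. They are not drawn from any study or product.

```python
from collections import defaultdict


def pass_rate_by_slot(screens):
    """Return {slot_position: fraction advanced} from (slot_position, advanced) pairs."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for slot, advanced in screens:
        totals[slot] += 1
        if advanced:
            passes[slot] += 1
    return {slot: passes[slot] / totals[slot] for slot in sorted(totals)}


# Made-up example log: slot 1 is the first call of the day, slot 6 the last.
screens = [
    (1, True), (1, True), (1, False), (1, True),
    (3, False), (3, True), (3, False), (3, False),
    (6, True), (6, False), (6, True), (6, True),
]

rates = pass_rate_by_slot(screens)
for slot, rate in rates.items():
    print(f"slot {slot}: {rate:.0%} advanced")
```

A large gap between slots is a prompt to look closer, not proof of bias: with small samples, differences like these can easily arise by chance.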

How Moving Screening to AI-Assisted First Rounds Preserves Human Judgment

The purpose of AI-assisted first-round screening is not to replace human judgment. It is to deploy human judgment where it is most valuable, in high-signal conversations with a smaller, better-qualified pool, rather than dissipating it across the high-volume, low-signal first-round funnel.

When a recruiter reviews 15 AI-generated scorecards instead of conducting 15 phone screens, three things change:

The cognitive load is lower. Reviewing structured evidence — verbatim quotes, competency scores, rubric-anchored assessments — is faster and less draining than producing that evidence in real time during a live call. The recruiter is reading, not performing.

The evidence is consistent. Every candidate was asked the same questions against the same rubric. There is no halo effect from candidate one contaminating the ratings for candidate two, because the ratings were generated by a system that doesn't carry forward mood or memory from one call to the next.

Human judgment is concentrated at the right decision point. By the time a recruiter is on a video call or in-person interview with a shortlisted candidate, they have already read structured evidence about that candidate's communication, reasoning, and motivation. The live interaction can go deeper, probe inconsistencies, and assess the dimensions that require genuine human observation — rather than starting from zero with a candidate they've never evaluated.

A Practical Redesign for Hiring Teams

If your current process has recruiters conducting more than three or four live screening calls per day, the process itself is the problem, not the recruiters.

A more sustainable design:

Move first-round qualification to async AI-assisted interviews. Every applicant who meets basic eligibility criteria gets an identical, structured first-round experience. No scheduling required.
Set recruiter review time, not recruiter interview time. A recruiter reviewing 20 scorecards in a focused 90-minute block is doing high-quality work. The same recruiter conducting 20 phone screens across a day is doing degraded work by screen eight.
Reserve live interview slots for the shortlist. When a human conversation happens, it is with a candidate for whom structured evidence already exists. The conversation can focus on the dimensions that only humans can assess — cultural contribution, depth of reasoning, genuine motivation signals that require back-and-forth.
Structure the debrief. Competency-by-competency discussion from evidence, not "who did everyone like best." This applies whether or not AI-assisted screening is in place.
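
The structured debrief in the last step can be sketched as a simple agenda builder. This is an illustrative Python sketch only, assuming each scorecard is a mapping from competency name to a rubric score. The function name, the 1 to 5 scale, and the sample candidates are all hypothetical. The idea is to walk the room through one competency at a time and to reshuffle the candidate order for each competency, so that no one benefits simply from being discussed first or last.

```python
import random


def debrief_agenda(scorecards, seed=None):
    """Build a competency-by-competency agenda with candidate order reshuffled each time."""
    rng = random.Random(seed)
    # Collect every competency that appears on any scorecard.
    competencies = sorted({c for scores in scorecards.values() for c in scores})
    agenda = []
    for competency in competencies:
        names = list(scorecards)
        rng.shuffle(names)  # fresh order per competency
        entries = [(name, scorecards[name].get(competency)) for name in names]
        agenda.append((competency, entries))
    return agenda


# Made-up scorecards on an assumed 1-5 rubric scale.
scorecards = {
    "Candidate A": {"communication": 4, "reasoning": 3, "motivation": 5},
    "Candidate B": {"communication": 3, "reasoning": 5, "motivation": 4},
    "Candidate C": {"communication": 5, "reasoning": 4, "motivation": 3},
}

agenda = debrief_agenda(scorecards, seed=7)
for competency, entries in agenda:
    print(competency)
    for name, score in entries:
        print(f"  {name}: {score}")
```

The scores are only the starting point for each agenda item. The discussion itself should still draw on the underlying evidence for that competency.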

Key Takeaways

Decision fatigue is a documented cognitive phenomenon, not a perception problem. The quality of hiring judgments degrades predictably across a day of back-to-back screening calls.
Three specific bias patterns emerge under fatigue: the halo effect (candidates early in a sequence), recency bias (candidates late in a sequence), and cognitive shortcutting toward gut-feel proxies.
Moving first-round screening to AI-assisted interviews does not eliminate human judgment — it concentrates it where the signal is highest, in conversations with a filtered, better-qualified shortlist.
The most underrated benefit of AI-assisted first rounds is not speed. It is the consistent quality of human judgment applied at the stages where it actually matters.

If you're a hiring manager running more than 20 screens a week, the design of your process may be producing systematically worse decisions by Thursday than it produces on Monday. The Voxxhire demo shows how structured AI-assisted first rounds change that equation.

Interview Fatigue Is Real — And It's Costing You Good Hires