June 22, 2026 · 8 min read

Async Voice vs. Live Video Interviews: A Data-Backed Comparison for Screening

Tags: async interviews, video interviews, voice interviews, candidate experience, screening comparison

Not all first-round screening formats are equal, and the differences matter more than most hiring teams realise. Completion rates vary by up to 40 percentage points across format types. Bias risk differs substantially between structured and unstructured approaches. Cost and recruiter time diverge by an order of magnitude. Choosing the right screening format for the right stage of the funnel is one of the higher-leverage decisions a talent acquisition team makes.

This article compares four formats — traditional phone screen, one-way video, live video, and async voice — across six dimensions: completion rate, candidate satisfaction, bias risk, recruiter time, data quality, and cost.

The Four Formats

Traditional phone screen: A recruiter calls the candidate at a scheduled time, conducts an unstructured or semi-structured conversation, and takes notes manually.

One-way video (asynchronous video): The candidate records video responses to pre-set questions at a time of their choosing. No live recruiter. Examples: HireVue, Spark Hire.

Live video (synchronous): A recruiter and candidate meet via video call at a scheduled time. Examples: Zoom, Microsoft Teams, or Google Meet, used with a structured rubric.

Async voice (asynchronous voice): The candidate completes a voice-only interview at a time of their choosing, guided by an AI, against a structured competency rubric. No camera, no scheduling required.

The Comparison

Completion Rate

Traditional phone screen: Scheduling friction is the primary drop-off driver. Candidates who are currently employed cannot take calls during work hours; timezone coordination in GCC contexts can require multiple reschedule rounds. Published data from high-volume ATS platforms suggests that 15–25% of candidates who accept a phone screen invitation do not complete it, primarily due to no-shows and reschedules.

One-way video: Camera anxiety is the primary barrier. Multiple studies and platform-published reports put one-way video completion rates at 60–75% of invited candidates. The gap is explained by candidates who accept the invitation but abandon before submitting — often due to discomfort with on-camera self-presentation, technical issues with camera setup, or concerns about how they look. The abandonment rate is higher for candidates from cultural backgrounds where formal camera presentation is unfamiliar.

Live video: Scheduling friction is similar to phone screens, but no-show risk is lower (both parties have committed to a specific time and can see each other). Completion rate is high for scheduled sessions, approximately 85–90%, but the scheduling process itself filters out candidates who cannot fit a synchronous slot.

Async voice: No scheduling required, no camera. The format is accessible on any mobile device in any environment. Published completion rates for mobile-accessible async voice platforms range from 80% to 92% of invited candidates: 15–30 percentage points higher than one-way video, and 20–35 percentage points higher than phone screens when scheduling friction is included. For India and GCC markets, where candidates frequently apply from mobile devices in non-office environments, the voice-only format removes the single biggest technical barrier.
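One way to read these ranges is to ask how many invitations a team must send to end up with a fixed number of completed screens. The sketch below uses the completion-rate ranges quoted in this section (the phone-screen range of 75–85% is the complement of the 15–25% drop-off). The target of 20 completed screens and the function names are illustrative assumptions, not figures from any platform.

```python
# Completion-rate ranges in percent (low, high), as quoted in this section.
COMPLETION_PCT = {
    "phone screen": (75, 85),
    "one-way video": (60, 75),
    "live video": (85, 90),   # applies to sessions that were actually scheduled
    "async voice": (80, 92),
}


def invitations_needed(target_completed: int, completion_pct: int) -> int:
    """Invitations required to expect `target_completed` screens, rounded up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-(target_completed * 100) // completion_pct)


def invitation_ranges(target_completed: int = 20) -> dict:
    """Map each format to (best case, worst case) invitation counts."""
    return {
        fmt: (
            invitations_needed(target_completed, high),  # best case: high rate
            invitations_needed(target_completed, low),   # worst case: low rate
        )
        for fmt, (low, high) in COMPLETION_PCT.items()
    }


if __name__ == "__main__":
    for fmt, (best, worst) in invitation_ranges().items():
        print(f"{fmt}: {best} to {worst} invitations for 20 completed screens")
```

Under these assumptions, one-way video needs 27 to 34 invitations for 20 completed screens, against 22 to 25 for async voice. The live-video figure understates the real requirement, because it counts only candidates who managed to schedule a slot.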

Candidate Satisfaction

Traditional phone screen: Satisfaction is highly variable and recruiter-dependent. A well-prepared recruiter conducting a structured call produces a positive experience; an unprepared recruiter asking generic questions produces a neutral-to-negative one. Post-screen satisfaction surveys from HR tech platforms show high variance — standard deviations often exceed mean scores.

One-way video: Satisfaction research from the I/O psychology literature shows a consistent pattern: candidates rate one-way video interviews as less fair and more anxiety-inducing than all other formats, including traditional phone screens. The absence of a live interlocutor, combined with the pressure of on-camera self-presentation, produces a significant subset of candidates who complete the interview but feel it was not a fair representation of their ability.

Live video: Generally high satisfaction, similar to in-person interviews. Candidates appreciate the real-time dialogue. The primary satisfaction driver is whether the interviewer was prepared and engaged — the format itself is not the source of dissatisfaction.

Async voice: Satisfaction research on voice-only async formats is newer, but available data suggests a distinct pattern from video: candidates report lower anxiety than one-way video (no camera, no appearance concern) and more consistency than phone screens (the same structured questions, clear process). Candidates who value predictability and preparation find the structured async format positive. A minority of candidates who prefer conversational, spontaneous interaction find it impersonal — a fair criticism that informs how the format should be positioned.

Bias Risk

Traditional phone screen: High bias risk. Unstructured questions produce inconsistent data. Schmidt & Hunter's meta-analysis shows that unstructured interviews predict job performance with a validity coefficient of approximately r = 0.38, compared to r = 0.51 for structured interviews. Bias vectors in phone screens include accent and pronunciation (which correlate with national origin), perceived gender from voice, communication confidence (which correlates with socioeconomic background), and the recruiter's mood and level of decision fatigue.

One-way video: Adds appearance-based bias on top of the audio dimensions. Research by Uhlmann and Cohen found that evaluators who could see candidates on video were more influenced by physical appearance, race-associated appearance, and gender presentation than those evaluating audio-only content. Video adds a bias vector; it does not reduce one.

Live video: Similar bias profile to phone screens, with the addition of appearance effects. Scheduling at fixed times also introduces availability bias — candidates with rigid daytime commitments (certain caregiving situations, certain employment types) are systematically disadvantaged.

Async voice: Lowest bias risk of the four formats, when properly structured. Every candidate is asked identical questions against an identical rubric, evaluated by a system that applies the same criteria consistently. The absence of video removes appearance-based bias. The async format removes scheduling-based availability bias. The residual bias risk is in the scoring rubric itself — poorly designed rubrics can encode cultural assumptions about what "good communication" sounds like. This is a design problem, not a format problem.

Recruiter Time

Traditional phone screen: 40–50 minutes per candidate (scheduling, call, note-taking). At 20 screens per hire, this is 13 to 17 recruiter-hours per role before the shortlist exists.

One-way video: Near-zero recruiter time for the interview itself. Review of a 15-minute video takes approximately 12–18 minutes, depending on how much re-watching is needed. Total per candidate: 12–20 minutes.

Live video: 20–30 minutes of recruiter preparation plus the interview itself plus note-taking. Similar total to phone screen with better output if a structured rubric is used.

Async voice: Near-zero recruiter time for the interview. Reviewing a structured scorecard with evidence quotes: 3–7 minutes per candidate. This is the lowest recruiter time of any format that produces structured evidence.
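The recruiter-hours arithmetic from the phone-screen example can be repeated for the other formats. This is a minimal sketch using the per-screen minute ranges stated in this section and the same assumption of 20 screens per hire. Live video is left out because this section does not give a single per-screen total for it. The function name is hypothetical.

```python
SCREENS_PER_HIRE = 20  # same figure as the phone-screen example above

# Recruiter minutes per screen (low, high), as stated in this section.
MINUTES_PER_SCREEN = {
    "phone screen": (40, 50),
    "one-way video": (12, 20),
    "async voice": (3, 7),
}


def hours_per_hire(minutes_range, screens=SCREENS_PER_HIRE):
    """Convert a per-screen minute range into recruiter-hours per hire."""
    low, high = minutes_range
    return (low * screens / 60, high * screens / 60)


if __name__ == "__main__":
    for fmt, minutes in MINUTES_PER_SCREEN.items():
        low_h, high_h = hours_per_hire(minutes)
        print(f"{fmt}: {low_h:.1f} to {high_h:.1f} recruiter-hours per hire")
```

At 20 screens per hire this gives roughly 13.3 to 16.7 hours for phone screens, 4.0 to 6.7 hours for one-way video, and 1.0 to 2.3 hours for async voice.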

Data Quality

Traditional phone screen: Output is recruiter notes — subjective, variable quality, no verbatim record, impossible to compare across candidates or across recruiters.

One-way video: Output is the video recording. It has higher fidelity than notes, but reviewing it is time-intensive, and most platforms do not produce structured competency scores without an additional AI overlay.

Live video: Similar to phone screen output if no structured rubric is used. Significantly better if a structured scorecard is completed in real time — but this requires recruiter discipline under the cognitive load of conducting a live conversation.

Async voice: Output is a structured competency scorecard with verbatim evidence quotes, a full transcript, and audio recording. The data is standardised across all candidates, directly comparable, and auditable. This is the highest-quality structured data of the four formats.
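To make "standardised and directly comparable" concrete, here is a rough sketch of what such a scorecard record could look like. Every field name, the 1–5 scale, and the ranking helper are hypothetical illustrations. They are not the schema of any particular platform.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CompetencyScore:
    competency: str                      # rubric item, e.g. "communication"
    score: int                           # assumed 1-5 rubric scale
    evidence_quotes: list[str] = field(default_factory=list)  # verbatim excerpts


@dataclass
class Scorecard:
    candidate_id: str
    transcript: str                      # full interview transcript
    audio_uri: str                       # pointer to the audio recording
    scores: list[CompetencyScore] = field(default_factory=list)

    def score_for(self, competency: str) -> Optional[int]:
        """Return the score for one competency, or None if it was not assessed."""
        for item in self.scores:
            if item.competency == competency:
                return item.score
        return None


def rank_by_competency(cards: list[Scorecard], competency: str) -> list[str]:
    """Candidate IDs ordered from highest to lowest score on one competency."""
    scored = [(card.score_for(competency), card.candidate_id) for card in cards]
    scored = [(s, cid) for s, cid in scored if s is not None]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [cid for _, cid in scored]


if __name__ == "__main__":
    cards = [
        Scorecard("cand-a", "transcript text", "a.wav",
                  [CompetencyScore("communication", 3, ["example quote"])]),
        Scorecard("cand-b", "transcript text", "b.wav",
                  [CompetencyScore("communication", 5)]),
    ]
    print(rank_by_competency(cards, "communication"))
```

Because every candidate is scored on the same competencies, a cross-candidate comparison becomes a simple sort. Free-form recruiter notes do not support that.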

Cost

| Format | Cost per screen (recruiter time + overhead) |
|---|---|
| Traditional phone screen | AED 48–72 (GCC) / INR 350–600 (India) |
| One-way video | AED 8–15 platform cost per screen |
| Live video | AED 40–65 (recruiter time + platform) |
| Async voice | AED 6–12 platform cost per screen |

The Summary Comparison Table

| Dimension | Phone screen | One-way video | Live video | Async voice |
|---|---|---|---|---|
| Completion rate | 75–85% | 60–75% | 85–90% | 80–92% |
| Candidate satisfaction | Variable | Below average | High | Above average |
| Bias risk | High | High + appearance | High + appearance | Low (by design) |
| Recruiter time/screen | 40–50 min | 12–20 min | 25–40 min | 3–7 min |
| Data quality | Low | Medium | Medium | High |
| Cost per screen | High | Low | High | Low |

What This Means for Screening Design

No single format is right for every stage. The practical implication of this data is a sequenced screening architecture:

  • Stage 1 (eligibility filter): Pre-screening form with knockout questions. No interview format required.
  • Stage 2 (competency screen): Async voice. High completion, low bias, low cost, high data quality. Handles the volume.
  • Stage 3 (human assessment): Live video or in-person, structured rubric. High satisfaction, high signal. Applied to a filtered, better-qualified shortlist.

The mistake most hiring teams make is treating live video or phone screens as their first-round format for all applicants — applying the high-cost, high-recruiter-time format to the full volume, rather than to the qualified subset.
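The effect of sequencing can be sketched with a small funnel calculation. The per-screen minutes and the async completion rate below sit inside the ranges quoted in this article. The applicant count, knockout pass rate, and shortlist rate are purely illustrative assumptions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunnelAssumptions:
    applicants: int = 200            # illustrative assumption
    knockout_pass_pct: int = 50      # illustrative assumption (Stage 1)
    async_completion_pct: int = 85   # within the 80-92% range quoted above
    shortlist_pct: int = 25          # illustrative assumption (Stage 2 -> 3)
    async_review_min: int = 5        # within the 3-7 min range quoted above
    live_screen_min: int = 30        # within the 25-40 min range in the table


def sequenced_minutes(a: FunnelAssumptions) -> int:
    """Recruiter minutes when async voice handles volume before live video."""
    passed_knockout = a.applicants * a.knockout_pass_pct // 100
    completed_async = passed_knockout * a.async_completion_pct // 100
    shortlisted = completed_async * a.shortlist_pct // 100
    return completed_async * a.async_review_min + shortlisted * a.live_screen_min


def live_for_all_minutes(a: FunnelAssumptions) -> int:
    """Recruiter minutes when every applicant gets a live video screen."""
    return a.applicants * a.live_screen_min


if __name__ == "__main__":
    assumptions = FunnelAssumptions()
    print("sequenced:", sequenced_minutes(assumptions), "minutes")
    print("live for all:", live_for_all_minutes(assumptions), "minutes")
```

With these assumed rates, the sequenced funnel costs 1,055 recruiter minutes (about 17.6 hours) against 6,000 minutes (100 hours) for live screens across all 200 applicants. The ratio shifts with the assumed pass rates, so substitute your own funnel data before drawing conclusions.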

Key Takeaways

  • One-way video's completion rate is 15–30 percentage points lower than async voice due to camera anxiety, particularly for candidates in markets where on-camera self-presentation is culturally unfamiliar.
  • Async voice removes appearance-based bias that affects both one-way and live video formats, while providing higher data quality than phone screens.
  • Recruiter time per screen drops from 40–50 minutes (phone) to 3–7 minutes (async voice scorecard review) — the single biggest efficiency gain in the screening process.
  • The optimal screening architecture sequences formats: async voice for volume, live video for the qualified shortlist. Applying live formats to all applicants wastes both recruiter time and candidate goodwill.

If you want to see what an async voice scorecard looks like compared to typical phone screen notes, the Voxxhire demo walks through both formats side by side.