Learnings from using AI agents in oral assessments

March 18

1. AI agents make oral and hybrid assessments scalable.

2. Agents expose students to the productive discomfort of real-time thinking. In this environment, students develop cognitive-communication skills that are becoming increasingly important in an AI-first world.

3. Agents can conduct oral and hybrid assessments. In the latter, students submit written work and then orally defend it, which lets us change variables in the task, challenge students' views intellectually, and probe how much agency and ownership they have over the work they submit.

4. The strongest signal of learning often appears in moments of resistance — when a student revises a mental model, responds to a constraint, or defends their decision under pressure.

5. Agents enable entirely new learning experiences (simulations or role-plays). Students said that agents "stick to their roles" and "don't judge," which gave them greater confidence to participate and engage.

6. Agents help us mitigate some of the cognitive and social biases found in human-led oral exams: they add consistency to the interview conditions while allowing us to focus on interpreting the evidence (the transcript).

7. Agents do not replace human judgment. They improve data collection, consistency, and scale, while humans remain responsible for interpretation, standards, and final evaluation.

8. Agents can also deliver formative feedback conversationally; responding in dialogue, rather than passively reading comments, requires students to demonstrate transfer and metacognition.


But also …


9. There is hardly any research on agentic assessment. Among other things, we need to better understand how performance anxiety affects assessment outcomes and how over-indexing on verbal fluency might disadvantage students with ADHD.

10. Meaningful agentic assessments are the result of great (and relevant) playbooks; weak scenario design produces shallow or misleading evidence. (A minimal sketch of a playbook follows this list.)

11. There are legitimate accessibility concerns. Students with speech, hearing, or processing differences, or who are neurodivergent, may be disadvantaged unless we find ways to accommodate their needs and preferences carefully.

12. Agents are much harder to audit than they seem. We had to build an entire infrastructure to design, build, and audit them, and if the playbook, model behavior, and escalation rules are not documented, decisions are hard to defend. By comparison, data and privacy concerns are fairly easy to handle.
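
One way to read points 10 and 12 together is to treat the playbook as a versioned, documented artifact rather than a loose prompt. The sketch below is purely illustrative: the schema, the class names (Playbook, AuditRecord), the model identifier, and every field are assumptions about what such documentation could look like, not a description of our actual system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical playbook schema: every field that shapes the agent's
# behavior is explicit and versioned, so a disputed grade can be traced
# back to the exact scenario, probes, and escalation rules in force.
@dataclass(frozen=True)
class Playbook:
    version: str                 # pin the playbook, not just the model
    role: str                    # persona the agent must stay in
    scenario: str                # the task the student defends
    probes: list[str]            # planned follow-up questions
    escalation_rules: list[str]  # when to hand off to a human
    model_id: str                # pin the underlying model version

# One audit record per assessment: the playbook version, the full
# transcript, and any escalations, kept for the human evaluator.
@dataclass
class AuditRecord:
    student_id: str
    playbook_version: str
    transcript: list[tuple[str, str]] = field(default_factory=list)  # (speaker, utterance)
    escalations: list[str] = field(default_factory=list)
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

playbook = Playbook(
    version="2025-03-18.1",
    role="Skeptical reviewer of the student's submitted design document",
    scenario="Defend your design choices; one requirement changes mid-interview.",
    probes=[
        "Why did you choose this approach over the obvious alternative?",
        "The latency budget just halved. What do you change?",
    ],
    escalation_rules=["Student reports distress", "Agent drifts out of role"],
    model_id="example-model-v1",  # hypothetical identifier
)

record = AuditRecord(student_id="s-042", playbook_version=playbook.version)
record.transcript.append(("agent", playbook.probes[0]))
print(record.playbook_version, len(record.transcript))
```

Pinning the playbook version and model identifier in every record is what makes a decision defensible months later: the transcript alone shows what was said, but not under which rules.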