Abstract
Background: Despite significant advances in AI-driven medical diagnostics, the integration of large language models (LLMs) into psychiatric practice presents unique challenges. While LLMs demonstrate high accuracy in controlled settings, their performance in collaborative clinical environments remains unclear. This study examined whether LLMs exhibit conformity behavior under social pressure across different diagnostic certainty levels, with a particular focus on psychiatric assessment.

Methods: Using an adapted Asch paradigm, we conducted a controlled trial examining GPT-4o's performance across three domains representing increasing levels of diagnostic uncertainty: circle similarity judgments (high certainty), brain tumor identification (intermediate certainty), and psychiatric assessment using children's drawings (high uncertainty). The study employed a 3 × 3 factorial design with three pressure conditions: no pressure, full pressure (five consecutive incorrect peer responses), and partial pressure (mixed correct and incorrect peer responses). We conducted 10 trials per condition combination (90 observations in total), using standardized prompts and multiple-choice responses. Binomial tests and chi-square analyses assessed performance differences across conditions.

Results: Under no pressure, GPT-4o achieved 100% accuracy across all domains. Under full pressure, accuracy declined systematically with increasing diagnostic uncertainty: 50% in circle recognition, 40% in tumor identification, and 0% in psychiatric assessment. Partial pressure showed a similar pattern: accuracy was maintained in the basic tasks (80% in circle recognition, 100% in tumor identification) but failed completely in psychiatric assessment (0%). All differences between the no-pressure and pressure conditions were statistically significant (P < .05), with the most severe effects observed in psychiatric assessment (χ²₁ = 16.20, P < .001).
Conclusions: This study reveals that LLMs exhibit conformity patterns that intensify with diagnostic uncertainty, culminating in complete performance failure in psychiatric assessment under social pressure. These findings suggest that successful implementation of AI in psychiatry requires careful consideration of social dynamics and the inherent uncertainty in psychiatric diagnosis. Future research should validate these findings across different AI systems and diagnostic tools while developing strategies to maintain AI independence in clinical settings.

Trial registration: Not applicable.
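The headline chi-square statistic for psychiatric assessment can be reproduced from the counts reported in the abstract (10/10 correct under no pressure vs. 0/10 under full pressure). The following is a minimal sketch using SciPy, assuming the authors applied a Yates-corrected 2 × 2 contingency test (the correction is an assumption, but it is what yields the reported value of 16.20):

```python
from scipy.stats import chi2_contingency

# 2x2 contingency table for the psychiatric-assessment domain:
# rows = condition (no pressure, full pressure), columns = (correct, incorrect)
table = [[10, 0],   # no pressure: 10/10 correct
         [0, 10]]   # full pressure: 0/10 correct

# Yates' continuity correction is SciPy's default for 2x2 tables
chi2, p, dof, expected = chi2_contingency(table, correction=True)

print(f"chi2({dof}) = {chi2:.2f}, P = {p:.5f}")  # chi2(1) = 16.20, P < .001
```

Without the continuity correction the same table gives χ² = 20.0, so the reported 16.20 is consistent with a corrected test on 10 trials per condition.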
| Original language | American English |
|---|---|
| Article number | 478 |
| Journal | BMC Psychiatry |
| Volume | 25 |
| Issue number | 1 |
| DOIs | |
| State | Published - 12 May 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 3 Good Health and Well-being
Keywords
- Clinical conformity
- Diagnostic uncertainty
- Large language models
- Psychiatric assessment
- Social influence
All Science Journal Classification (ASJC) codes
- Psychiatry and Mental health
Fingerprint
Research topics of 'A controlled trial examining large language model conformity in psychiatric assessment using the Asch paradigm'.