The King’s Institute for AI discusses the implications of growing GenAI use in education and the risks of false information disguised by eloquent writing.
The rapid integration of generative artificial intelligence (GenAI) into education presents a double-edged sword. While its potential to personalise learning and enhance teaching is undeniable, concerns regarding ethical implications, accuracy, and student vulnerability to AI-generated misinformation are also rising.
This commentary delves into the current state of knowledge surrounding GenAI in education, highlighting a critical gap in our understanding of students’ ability to detect AI hallucinations, defined in our research as responses generated by GenAI that contain incorrect information.
One particular concern for educators is how to embrace GenAI while creating effective assessments and maintaining academic integrity.¹ A broader concern for society is GenAI’s tendency to generate false information that is often masked by coherent and eloquent writing. If undetected, unverified, and unrectified, such false information can be inadvertently used, or misused, with varying degrees of harm.
In this paper, we present the first experiment to study whether and how students in a top UK business school can detect false information created by GenAI, often referred to as AI hallucinations, in a high-stakes assessment context. While we confine our paper to the educational context, it is highly relevant to emerging research on the key traits and socioeconomic factors that shape news readers’ ability to recognise false information and fake news.
Our setting presents a situation in which readers (students) have abundant resources and training, as well as a vested interest, to investigate and evaluate the information (an AI-generated response to an assessment question). We aim to shed light on the extent to which educators of economics and business-related courses can evaluate students’ academic performance given recent developments in GenAI and in settings where proctored, in-person examinations are to be avoided.
Our evidence on students’ ability to detect incorrect information hidden within cohesive and well-written responses from GenAI contributes to the scholarship of teaching and learning on AI literacy in educational settings.
A spectrum of attitudes
Research reveals a spectrum of attitudes towards GenAI in education. Studies from the UK, Norway, and Hong Kong indicate a generally optimistic outlook, with some reservations regarding accuracy, privacy, and ethical considerations.² However, a more cautious approach is evident in African contexts, where concerns about academic integrity and misuse of tools like ChatGPT prevail.³
Interestingly, American studies suggest individual differences significantly influence GenAI perception, with students exhibiting higher confidence and motivation being more likely to trust and engage with these tools.⁴ These findings emphasise the need for a nuanced approach, balancing innovation with ethical considerations and robust oversight mechanisms.
The risks of AI hallucination
AI hallucinations encompass various misleading outputs generated by large language models (LLMs) and pose significant risks. These fabricated responses can be ambiguous, making interpretation difficult. Additionally, potential biases inherent in training data can be inadvertently reproduced by AI, potentially exacerbating existing societal inequalities. Furthermore, fragmented and inconsistent information generated by LLMs can adversely impact online safety and public trust.
Mitigating the risks: Intervention vs caution
Two main approaches exist for mitigating AI-related concerns in education: an intervention-based approach and a more cautious one. Intervention involves implementing policies and fostering open discussions about AI use, promoting transparency and accountability among stakeholders. Additionally, reviewing training data is crucial for ensuring the integrity and reliability of AI outputs. Conversely, the cautious approach advocates limiting or even refraining from using GenAI tools altogether. While intervention seeks to actively manage risks, the cautious approach prioritises complete avoidance, potentially hindering valuable practical educational applications.
The knowledge gap: Understanding student vulnerability
Existing research primarily focuses on the benefits and challenges of GenAI integration, neglecting to identify factors influencing students’ ability to detect factually inaccurate information. To address this gap, our research at King’s Business School employed a multi-pronged approach to assess students’ ability to identify AI hallucinations within a high-stakes economics assessment.
Assessment design
The assessment strategically incorporated AI-generated responses within a sub-question worth 15% of the assessment grade. To ensure focus on factual accuracy, explicit instructions directed students to evaluate the econometrics content of the AI response, excluding stylistic qualities.
Post-course survey
Following the assessment, a post-course survey delved deeper into student attitudes towards AI. This survey employed a four-point Likert scale (1 = Strongly Disagree to 4 = Strongly Agree) covering four key areas of AI hallucination and AI literacy.
Cohort exposure
We randomly divided the student cohort into two equal groups. One group received information about the overall detection rate of AI hallucinations in the course, while the other group did not. This manipulation allowed us to investigate how student exposure to such information might influence their confidence levels regarding AI detection abilities.
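As a purely illustrative sketch, and not the exact procedure used in the study, a random split of this kind could be implemented along the lines below; the student identifiers, group labels, and seed are hypothetical.

```python
import random

def assign_information_treatment(student_ids, seed=42):
    """Randomly split a cohort into two equal groups: an 'informed' group
    shown the cohort-level AI-hallucination detection rate and a 'control'
    group given no such information. All names here are illustrative."""
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(student_ids)   # copy so the input list is not mutated
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "informed": shuffled[:half],   # receives the peer detection-rate information
        "control": shuffled[half:],    # receives no such information
    }

# Hypothetical usage with anonymised identifiers
groups = assign_information_treatment([f"S{i:03d}" for i in range(1, 101)])
print(len(groups["informed"]), len(groups["control"]))  # 50 50
```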
Findings
Our preliminary findings reveal a fascinating interplay between academic skills, critical thinking, and student confidence in the context of AI detection.
- Knowledge versus critical thinking: While strong academic performance in economics is a positive predictor of GenAI hallucination detection, mere knowledge application skills appear less effective. This suggests that critical thinking plays a crucial role in discerning factually inaccurate information from AI.
- Cautious approach and critical thinking: Students exhibiting a cautious approach towards AI, coupled with strong critical thinking skills, demonstrated superior detection abilities. This highlights the importance of fostering a healthy scepticism when encountering AI-generated information.
- Gender disparity: An interesting finding is the observed gender disparity in detection ability, necessitating further investigation to understand the underlying causes.
Confidence and peer performance
Students randomly allocated to the group informed about their peers’ poor ability to identify AI hallucinations also became more cautious and negative about AI. Taken together, these results suggest the need to make students aware of cases in which AI gives incorrect subject-related information, so that they develop a critical view of AI.
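To illustrate the kind of group comparison this involves, a minimal sketch follows; the Likert scores are invented, the use of a Welch t-test is our assumption rather than the study’s stated method, and the example assumes SciPy is available.

```python
from statistics import mean
from scipy import stats  # assumed dependency for the illustrative test

# Invented 1-4 Likert responses to a confidence item such as
# "I am confident in my ability to detect AI hallucinations"
informed = [2, 1, 2, 3, 2, 2, 1, 3, 2, 2]  # told about peers' poor detection rate
control  = [3, 3, 2, 4, 3, 3, 2, 3, 4, 3]  # given no such information

# Welch t-test for a difference in mean confidence between the two groups
t_stat, p_value = stats.ttest_ind(informed, control, equal_var=False)
print(f"mean (informed) = {mean(informed):.2f}, mean (control) = {mean(control):.2f}")
print(f"Welch t = {t_stat:.2f}, p = {p_value:.3f}")
```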
The broader implications: A call to action
Students’ vulnerability to AI-generated misinformation has significant implications for learning and academic integrity in an AI-powered future. To address this critical issue, we propose a multi-pronged approach:
Targeted interventions
Based on our research, we can identify student groups most susceptible to AI misinformation. By understanding factors like digital literacy and socio-economic background, we can develop targeted interventions to enhance critical thinking skills and equip these students with the tools to discern fact from fiction.
Assessment design for the AI age
Educators need to develop assessments that strategically integrate GenAI to test students’ ability to critically evaluate information. This goes beyond simply identifying AI-generated content and delves into the deeper skill of analysing the factual accuracy of the presented information.
Promoting critical thinking skills
A robust educational foundation that emphasises critical thinking skills is essential. Equipping students with the ability to analyse sources, identify biases, and evaluate information credibility is paramount in the age of AI.
Equitable access to resources
It is crucial to ensure that all students have access to the resources and knowledge needed to thrive in an AI-powered future. This includes fostering digital literacy skills and providing opportunities to develop critical thinking through practical exercises that involve discerning AI-generated information.
Open discussions and transparency
Encouraging open discussions about AI’s limitations and potential pitfalls is vital. By fostering informed student perspectives and promoting transparency around AI use in education, we can empower students to become responsible users and discerning consumers of information.
By addressing these critical areas, we can ensure students are well-equipped to navigate the complexities of the AI-powered world. Our research serves as a springboard for further investigation into student vulnerability to AI misinformation. By working collaboratively, educators, researchers, and policymakers can develop effective strategies to prepare students for the challenges and opportunities that lie ahead.
Conclusion
We present evidence on the link between students’ traits and their ability to detect AI hallucinations in a high-stakes assessment setting. Pedagogically, we provide a practical design for embedding GenAI into assessment and putting students at the centre of the learning and assessment experience.
Instead of actively discouraging its use, we flip the assessment by examining how students evaluate an AI response to a carefully chosen question that clearly tests their grasp of the subject material. We then show a rather startling result on the danger and prevalence of AI hallucinations: only 20% of our student cohort could detect the incorrect information in the AI response to a coursework question.
Moreover, only those with significantly stronger academic and writing skills are capable of distinguishing facts from false information. The power of GenAI’s eloquent writing cannot be overstated.
Second, we speak to the literature on equity in AI access and use. We show that students with a stronger academic foundation and better writing skills, who are perhaps in a more privileged position, at least academically, are at an advantage when it comes to using AI: they can discern correct information from false.
Our results from linking exams and coursework suggest that this gap can be closed by providing active learning with timely feedback and correct information. We eventually find no difference in the final exam, a higher-stakes setting, between students initially able to detect AI errors and those who were not. Providing equitable materials seems to be key. Finally, we address the question of what drives students, as potential users of AI, in their attitudes towards it.
We emphasise the importance of information and of exposure to critical views of AI and its potential drawbacks: once randomly exposed to information about their peers’ poor performance in detecting AI errors, students became less confident about AI, about their understanding of AI hallucinations, and about their own ability to detect them. Both groups would prefer more training in these matters, highlighting the need for greater exposure to the dangers of AI hallucinations in practice and in education.
Our study is subject to several limitations. First, we do not claim to offer a universally accepted and well-structured tool for evaluating the quality of outputs from LLMs; developing one remains a challenge. As such, we remain open to replicating the results using other questions, in different disciplines, and with different groups of students.
Second, our AI response was generated by ChatGPT 3.5 in August 2023 and may soon become obsolete. Given the technology’s continually evolving nature, the quality of AI responses may improve, and outputs may become more capable of incorporating nuance. Our paper relies on the version that was freely accessible to the public at the time and therefore likely represents the tool students would most readily use. Third, our cohort of Economics and Management students may represent a non-random selection of the general population, or indeed of students in general, while our post-course survey relies entirely on voluntary participation.
Finally, while we believe the results can be replicated with outputs from other LLMs, the proliferation of such tools presents a research opportunity to replicate our study and compare the quality of different LLMs.
References
- Moorhouse et al., 2023
- Bright, J., Enock, F. E., Esnaashari, S., Francis, J., Hashem, Y., & Morgan, D. (2024). Generative AI is already widespread in the public sector. ArXiv Preprint ArXiv:2401.01291.
- Sevnarayan, K., & Potter, M.-A. (2024). Generative Artificial Intelligence in distance education: Transformations, challenges, and impact on academic integrity and student voice. Journal of Applied Learning and Teaching, 7(1).
- Amoozadeh, M., Daniels, D., Nam, D., Kumar, A., Chen, S., Hilton, M., Srinivasa Ragavan, S., & Alipour, M. A. (2024). Trust in Generative AI among students: An exploratory study. Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1, 67–73.
- Chan, C. K. Y., & Hu, W. (2023). Students’ voices on generative AI: Perceptions, benefits, and challenges in higher education. International Journal of Educational Technology in Higher Education, 20(1), 43.
- Rasmussen, D., & Karlsen, T. (2023). Adopt or Abort? Mapping students’ and professors’ attitudes towards the use of generative AI in higher education.
Please note, this article will also appear in the 19th edition of our quarterly publication.