How artificial intelligence has answered the UnGoogleable exam question

[Image: AI taking an online exam]

I started my UnGoogleable exam questions blogs (part 1 and part 2) by saying the robots have not won yet. A lot can change in a year, and artificial intelligence (AI) has already found a way to beat most UnGoogleable assessments. If the old game was stopping answers from being looked up, the new game is stopping them from being answered by AIs. Platforms such as ChatGPT enable students to answer online exam questions, write essays, complete some MCQs, summarise papers and produce reflections for skills portfolios. (It can even write poetry, if you need that in your life!)

Don’t believe me? Let’s ask the AI to tell you itself:

“AI systems can potentially be used to answer online exams, although this would depend on the specific nature of the exam and the capabilities of the AI system. Some exams may be easier for AI to answer than others, depending on the complexity of the content and the types of questions being asked. If the exam is open-book and allows the use of external resources, an AI system with access to a comprehensive knowledge base and advanced natural language processing capabilities might be able to answer some questions. However, if the exam is closed-book or requires higher-order thinking or problem-solving skills, it is less likely that an AI system would be able to provide accurate answers. Additionally, the use of AI to answer exams could potentially be detected by exam proctors or by the exam software itself and could lead to penalties or other consequences.”

If the AI were a student, what grade would it get? I tried a range of written assessment-style questions in ChatGPT and marked them against my assessment criteria. I then cross-checked the marks with other academics, effectively double-marking and moderating.

Traditional exam/essay-style questions ask students to draw on knowledge and discuss it against a title. They are a staple of both exams and essays, e.g.:

  • Discuss the evidence linking mitochondrial dysfunction to neurodegenerative diseases.
  • Compare data-dependent acquisition and data-independent acquisition for the analysis of proteomic data.
  • Write a description of gene editing by CRISPR that would be understandable to an A-level Biology student.

The AI’s answers contained a reasonable level of specific knowledge, the information was correct and multiple points were brought together. However, the discussion was vague and lacked depth of understanding. Given that this AI was trained on information up to 2021, current thinking was missing. Nevertheless, had its answers been presented in a time-limited online exam, I would happily have given them a low to mid 2:1. As a coursework essay, it would get a 2:2. WOW!

Short answer questions probe knowledge and understanding but don’t always draw on analysis skills. They are found in exams and workbooks, e.g.:

  • State how a fluorescently tagged protein can be introduced into a mammalian cell line.
  • Write a short 100-word summary of the paper “Engagement with video content in the blended classroom” by Smith and Francis (2022).

In these more direct recall or summary questions, the answers were fully correct – all the details were present and would have been graded at high 2:1 to 1st level. The AI does a really good job of reporting back existing knowledge; it can even answer most MCQs.

Problem-solving questions give students a situation in which to apply knowledge and develop a solution. They are designed so that the student needs to draw on what they know in response to a prompt. Here is an example from a recent two-part exam question where the student was asked to design a workflow and predict outcomes:

  A. Describe a series of experiments to show that the induction of stress granules correlates with the activation of the Integrated Stress Response (ISR).
  B. The drug ISRIB is an ISR-inhibiting molecule, as it binds and promotes eIF2B activity. Describe the effect ISRIB treatment would have on stress granule formation if cells were exposed to oxidative stress and ISRIB. Discuss in your answer what impact this drug would have on the experiments described in part A.

The AI could write a decent experimental plan against a prompt and develop a valid response for part B based on what it had written in part A. The answers were again unfocused in places, and some of the information was misapplied or not fully appropriate, but they would still easily have gained a 2:2 or low 2:1.

When a similar style of problem-solving question required a more interpretive element, such as using an image as a prompt or a rationale as to why the approach was appropriate, the AI fell over and was unable to answer. Without the text-based context, it had no means by which to work.

Reflection assessments simply take a reflective learning exercise and use it as a tool to assess the learner.

When the AI was asked to complete a reflective task with the prompt “write a reflection on lab work”, it drew on the generic personal development and employability skills one would gain in that environment. However, the answer failed to come up with any personal examples, next steps or future action planning, so it lacked creativity. Even so, it would still grade well.

Surely AI answers are easily detected?

The AI that I played with (ChatGPT) had a specific writing style that set it apart from the other student scripts I read. Spelling and grammatical errors were rare, so if that were a notable change from someone’s past writing, your suspicions would be raised. However, with anonymous marking (which is good inclusive practice), and the volume of scripts that are typically marked, you would not spot it. A key red flag, though, is that any references created by the AI were invented and not real papers.

[Image: Concerned-looking robot]

In order to check how effective plagiarism detectors such as Turnitin and Grammarly would be, I ran the same question through the AI ten times. Although you would expect multiple answers to the same question to generate matching text, or to match a pre-existing source, the answers created by the AI each time were worded differently. When I put those ten responses through Turnitin, only two showed text matching above 30%, my normal flag for having a deeper look. So, even our go-to tool for academic integrity did not detect anything amiss!
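To see why ten differently worded answers slip past string-based matching, it helps to compare them pairwise. The sketch below is a toy illustration only – Turnitin’s matching algorithm is proprietary and certainly more sophisticated, and the sample answers here are invented – using Python’s standard-library difflib:

```python
# Toy illustration (NOT Turnitin's actual algorithm, which is proprietary):
# measure pairwise surface similarity between reworded answers to one question.
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical AI outputs for the same prompt, each phrased differently.
answers = [
    "Stress granules form when the initiation of translation is stalled.",
    "When translation initiation stalls, stress granules assemble in the cytoplasm.",
    "Granule assembly follows a block in the initiation step of translation.",
]

def similarity(a: str, b: str) -> float:
    """Ratio of matching character runs: 0.0 = no overlap, 1.0 = identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Paraphrased answers share ideas but little literal text, so ratios stay low.
for i, j in combinations(range(len(answers)), 2):
    print(f"answer {i} vs answer {j}: {similarity(answers[i], answers[j]):.2f}")
```

Because each response shares concepts but not strings, surface-matching tools report low overlap – which is consistent with only two of the ten Turnitin runs exceeding the 30% threshold.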

What is the future for written assessments then?

Each of the assessment prompts used returned viable answers, but they were vague, lacked depth and were limited in creative aspects. However, all would achieve a good grade close to the class average. The AI that I worked with could not complete any assessment requiring human subjective judgement, such as ethical or moral assessments. It also could not complete assessments that require creativity or intuition, as these require a level of human cognition that its systems are not yet capable of.

You cannot stop the use of AI; the genie is out of the bottle and it’s only going to get smarter. While assessments that require physical actions or manipulation of objects (such as practical work), or where the individual is probed or questioned on their understanding (such as vivas or poster presentations), are all potential workarounds, they are not always possible or appropriate. We can try to fight the use of AI, but it is better to think more deeply about what the new purpose of assessment should be and what role AI plays. I feel another blog coming on…

Assessment guides us
Gauge the student progress made
For better learning

[An AI-generated haiku about the purpose of assessment, with a little bit of help]
