Top AI Detectors appear unreliable: Time to reimagine assignments

An analysis of the effectiveness of four AI detectors revealed that their accuracy varies sharply depending on how the text they are checking was produced.


TELC has undertaken another updated analysis of four well-known AI detectors, using essays of 1,000 to 2,000 words across six subject areas. Each essay in each subject area was produced in one of four ways:

  • Fully AI-generated (GPT-4)
  • Mixed paragraphs: a GPT-4 paragraph followed by a human-written paragraph, repeated
  • Mixed sentences: a GPT-4 sentence followed by a human-written sentence, repeated
  • Fully human-written
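The two mixed constructions above amount to simple interleaving of AI-generated and human-written units. As a minimal sketch (the `interleave` helper and the sample strings are hypothetical, not part of TELC's actual methodology), the mixed essays could be assembled like this:

```python
from itertools import chain, zip_longest

def interleave(ai_units, human_units):
    """Alternate AI-generated and human-written units (paragraphs or
    sentences), starting with an AI unit, as in the mixed test essays."""
    pairs = zip_longest(ai_units, human_units)
    return [unit for unit in chain.from_iterable(pairs) if unit is not None]

# Illustrative stand-ins for real essay text (hypothetical content).
ai_paras = ["AI paragraph 1", "AI paragraph 2"]
human_paras = ["Human paragraph 1", "Human paragraph 2"]

mixed_essay = "\n\n".join(interleave(ai_paras, human_paras))
```

The same helper covers the mixed-sentence condition by passing lists of sentences instead of paragraphs and joining with a space.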

Essay topics (see attachment):

Sample 1: Teaching and Learning in Clinical Practice
Sample 2: The Role of Financial Education in the Business World
Sample 3: Urban Design in Architecture
Sample 4: Why have populists become the reality of the 21st-century political arena?
Sample 5: Clinical Trials: Design, Endpoints and Interpretation of Outcomes
Sample 6: The Immune System: A Key Player in Evolutionary Success

Key Findings

  • Based on this sample, the Turnitin AI detector appears effective only at detecting fully human or fully AI text; mixing paragraphs or sentences renders it ineffective. We would stress that these are not examples of what Turnitin terms ‘false positives’; rather, they indicate that the detector is simply inaccurate on mixed text.
  • The other well-known AI detectors tested proved unreliable across all four conditions.

Furthermore, access to AI will soon be ubiquitous: Microsoft will be launching Copilot for Word and Google has announced Duet AI for Workspaces. TELC will therefore be providing regular updates with new ideas for reimagining and restructuring assignments, and will be sharing best faculty practice for incorporating AI into assignments.