Writing Biometrics and Academic Integrity

The person sitting in front of a computer today leaves a digital footprint that can be detected in more ways than one might expect. Beyond user authentication, this includes IP address, geolocation, internet service provider, browser cookies, ping time, and the ability to complete complex tasks. Perhaps you have seen one of these before:

CAPTCHA and reCAPTCHA software use a number of these strategies to determine whether you are an actual human or a bot pretending to be one. These systems have become increasingly sophisticated as bot developers learn to defeat CAPTCHA tasks, such as selecting traffic lights in a set of images, automatically and without the need for a human.

In the academic space, this has hardly been an issue. Homework has been seen as a difficult enough task that only a human could perform it. In mathematics, requiring that work be shown was sufficient evidence that the student performed the work, but this is no longer the case.

At the onset of the Artificial Intelligence age, generating a fake video of a person, a fake voice, or computer-generated text are all tasks that anyone can perform from home (or even on the go with only a smartphone). This explosion of content requires new levels of digital literacy and calls into question the very nature of authenticity and authorship. Creation isn’t the only AI trick either: computer vision lets apps and computers ‘see’ text and even math equations, and translate, solve, or copy them instantaneously[1].

Generative AI, the ability of computers to generate media, now lets anyone produce unique essays and text-based materials that are both high quality (free of grammatical errors) and on topic. In many circumstances, the combination of quality and originality makes it hard to verify that an individual created the work.

To combat this, companies like Cursive are building systems that detect false impersonations on the internet, using tools beyond those used in CAPTCHA technology. We expect this to be an ongoing battle, much like the one that has persisted for decades in the world of fraud: the more advanced the impersonations become, the more advanced the detection approaches must become to prevent them.

In the academic space, more unique information is available to verify a student, which means more variables can be considered. The growth of the internet has brought many threats to academic integrity: plagiarism of written works, assignments completed by someone other than the student (also known as “contract cheating”), and, most recently, text written by automated deep learning generation tools such as ChatGPT (though dozens of others are available to students). In January 2023, the New York City Department of Education blocked access to ChatGPT on its devices and networks [2]. A small number of schools now require written work to be submitted on paper, to keep students from relying on ideas that are not their own.

At Cursive, one of the technologies we employ to prevent cheating analyzes student behavior while typing. Keyloggers, a technology most associated with stealing passwords in the hacking community, have now come front and center as a way to prevent mischievous activity. In research, keystroke logging has proven highly accurate at identifying a unique individual, acting as a new biometric fingerprint.

Even the slightest variation in how a single key is pressed contributes to the unique identity of a person. Beyond this, we can also study the distributions of multi-key variations to get an even closer idea of a person’s style. The same technology can be used to detect numerous other characteristics of a person while they are typing. The accompanying figure shows the distribution of one person’s timing versus a population’s when pressing the lowercase ‘o’ key.
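To make the single-key comparison concrete, here is a minimal sketch of how one might measure how long a person holds the ‘o’ key and compare that to a population. The event format, the function names, and all of the numbers are hypothetical and chosen only for illustration; this is not a description of Cursive’s actual pipeline.

```python
from statistics import mean, stdev

# Assumed (hypothetical) event format: (timestamp_ms, key, action),
# where action is "down" or "up".
def dwell_times(events, key):
    """Return how long (ms) `key` was held for each press in the log."""
    pressed_at = None
    durations = []
    for timestamp, k, action in events:
        if k != key:
            continue
        if action == "down" and pressed_at is None:
            pressed_at = timestamp
        elif action == "up" and pressed_at is not None:
            durations.append(timestamp - pressed_at)
            pressed_at = None
    return durations

def z_score(sample, population):
    """Distance of the sample mean from the population mean,
    measured in population standard deviations."""
    return (mean(sample) - mean(population)) / stdev(population)

# Made-up data for illustration only.
events = [
    (0, "o", "down"), (95, "o", "up"),
    (300, "n", "down"), (380, "n", "up"),
    (500, "o", "down"), (605, "o", "up"),
    (900, "o", "down"), (1000, "o", "up"),
]
person = dwell_times(events, "o")
population = [110, 120, 130, 140, 150, 160, 170]
score = z_score(person, population)
print(person, round(score, 2))
```

A score far from zero suggests that this typist holds the key noticeably shorter or longer than the population does. A single key is weak evidence on its own, which is why many keys and key combinations would be considered together.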

Keystroke variations do more than imply who a person is; they can also describe what that person is doing. An independent study by Cursive with 15 students demonstrated the ability to separate original text from text copied from another location (another browser window, a paper document, or earlier in the same document), based on how a student typed. All 15 students typed both original and copied works, and we then evaluated whether a computer could tell the difference from the key logs. Indeed, such models were able to correctly classify copied versus original written works more than 75% of the time. In addition, the same model can determine where in a body of text any copying occurred.
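The study above does not spell out which features or model were used, so the following is only a sketch of the kind of timing features a copied-versus-original classifier might consume. The feature names, the one-second pause threshold, and the sample timestamps are all assumptions made for illustration.

```python
from statistics import median

def inter_key_gaps(press_times_ms):
    """Time between consecutive key presses, in milliseconds."""
    return [b - a for a, b in zip(press_times_ms, press_times_ms[1:])]

def typing_features(press_times_ms, pause_ms=1000):
    """Summarize a key log as a few timing features.

    `pause_ms` is an assumed threshold: any gap at least this long
    counts as a pause that separates two bursts of typing.
    """
    if len(press_times_ms) < 2:
        raise ValueError("need at least two key presses")
    gaps = inter_key_gaps(press_times_ms)
    bursts = []       # number of keys typed between pauses
    current = 1
    pauses = 0
    for gap in gaps:
        if gap >= pause_ms:
            pauses += 1
            bursts.append(current)
            current = 1
        else:
            current += 1
    bursts.append(current)
    return {
        "median_gap_ms": median(gaps),
        "pause_rate": pauses / len(gaps),
        "mean_burst_len": sum(bursts) / len(bursts),
    }

# Made-up press timestamps: three keys, a long pause, then four keys.
press_times = [0, 150, 300, 1800, 1950, 2100, 2250]
features = typing_features(press_times)
print(features)
```

Feature vectors like this one, computed for sessions labeled as original or copied, could then be used to train any standard classifier. Computing the same features over a sliding window of keystrokes is one plausible way to estimate where in a document the typing behavior changes.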

To take things further, the field of stylometry offers additional information to validate a user. By analyzing sequences of letters, word frequencies, and word combinations, we gain information about a person’s identity from their writing style. One famous example is a paper by Frederick Mosteller and David Wallace, published in 1963, which studied the Federalist Papers[3]. Using word frequencies alone, the authors found that papers whose authorship was disputed between Alexander Hamilton and James Madison were most likely written by Madison. Similar techniques have been applied to the writings of Shakespeare, to ask whether he was the sole author or a moniker for several authors. Such tactics, in combination with the convolutional neural networks that exist now, are far more potent than those available at the time of this paper.
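As a rough illustration of attribution by word frequency, the sketch below builds a profile of how often a handful of common function words appear per 1,000 words and assigns an unknown text to the closest known profile. This nearest-profile comparison is a much simpler stand-in for the statistical analysis in the Mosteller and Wallace paper; the word list, function names, and sample sentences are invented for illustration.

```python
import re
from collections import Counter

# Illustrative word list only; a real study would select its words carefully.
FUNCTION_WORDS = ("the", "of", "to", "and", "upon", "by", "on", "while", "whilst")

def profile(text):
    """Rate of each function word per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [1000 * counts[w] / total for w in FUNCTION_WORDS]

def distance(p, q):
    """Sum of absolute differences between two profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

def closest_author(unknown_text, known_texts):
    """Name of the known author whose profile is nearest the unknown text."""
    target = profile(unknown_text)
    return min(
        known_texts,
        key=lambda name: distance(target, profile(known_texts[name])),
    )

# Invented sample sentences, not quotations from any real author.
known = {
    "author_a": "upon reflection the plan rests upon the consent of the "
                "people while the states agree",
    "author_b": "whilst the plan rests on the consent of the people the "
                "states agree on it",
}
unknown = "the measure depends upon the will of the people while others wait"
guess = closest_author(unknown, known)
print(guess)
```

Function words are a common choice for this kind of comparison because they say little about the topic of a text and more about the habits of its writer. In practice far longer samples would be needed than these one-line examples.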

This type of approach, coupled with keylogging analysis, permits Cursive not only to detect instances of copied assignments but also, if the source is another classmate, to identify who that classmate is and make that information available to the teacher or administrator.
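One way to picture matching a typing sample against a class roster is to compare timing for pairs of consecutive keys, the multi-key variations mentioned earlier. The sketch below is a hypothetical nearest-profile match, not Cursive’s method; the data layout, function names, and timings are assumptions.

```python
from collections import defaultdict
from statistics import mean

def digraph_profile(presses):
    """Mean latency (ms) between consecutive key presses, per key pair.

    `presses` is an assumed format: a time-ordered list of (timestamp_ms, key).
    """
    latencies = defaultdict(list)
    for (t1, k1), (t2, k2) in zip(presses, presses[1:]):
        latencies[(k1, k2)].append(t2 - t1)
    return {pair: mean(values) for pair, values in latencies.items()}

def profile_distance(sample, enrolled):
    """Mean absolute latency difference over key pairs both profiles contain."""
    shared = sample.keys() & enrolled.keys()
    if not shared:
        return float("inf")
    return mean(abs(sample[pair] - enrolled[pair]) for pair in shared)

def best_match(sample, roster):
    """Roster entry whose enrolled profile is closest to the sample."""
    return min(roster, key=lambda student: profile_distance(sample, roster[student]))

# Made-up data: someone typing "the" twice, and two enrolled profiles.
presses = [(0, "t"), (120, "h"), (200, "e"), (900, "t"), (1010, "h"), (1100, "e")]
sample = digraph_profile(presses)
roster = {
    "student_1": {("t", "h"): 118, ("h", "e"): 88},
    "student_2": {("t", "h"): 180, ("h", "e"): 140},
}
match = best_match(sample, roster)
print(match)
```

A real system would also need a threshold for deciding that nobody on the roster matches, since the nearest profile is not necessarily a close one.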

While Cursive also employs simpler tactics, such as preventing students from copying and pasting, requiring authentication, and encouraging TypeID in a secure environment, these advanced tools feed into our analytics to offer our users more information about cheating behavior in their class than any current competing technology in the academic integrity space.

Cursive aims to use as many of these signals as possible to make sure that our users are consistent across all assignments and that the content is original, creating and maintaining trust in the classroom and protecting academic integrity.

The featured image was generated with the assistance of AI using the prompt “robot typing at a desk in the dark, pixel art.”

[1] News: https://ktla.com/morning-news/technology/these-apps-let-you-take-a-picture-of-a-math-problem-to-solve-it-and-can-even-explain-the-solution/

Example: https://photomath.com/en

[2] https://www.nbcnews.com/tech/tech-news/new-york-city-public-schools-ban-chatgpt-devices-networks-rcna64446

[3] Frederick Mosteller and David L. Wallace. ‘Inference in an Authorship Problem.’ Journal of the American Statistical Association. Vol. 58, No. 302 (Jun., 1963), pp. 275-309. Published By: Taylor & Francis, Ltd.

Posted by negfrequency
