Eric on authorship attribution at 1:30 pm (Apr 28)

What defines an author’s writing style such that we can differentiate them from other authors? For humans, stylometric choices (i.e., the linguistic choices we make when writing) are innate: we make them subconsciously, without much thought. How can we build a system that automatically detects these stylistic choices in an author’s documents while ignoring the content of those documents? Given a document, can we use the detected style to automatically choose the correct author from a set of known authors? We also care about why one author is chosen over the others, so explainability is an important factor.

In this talk, I will present my work on grammatical feature extraction and how we can generate grammatical “footprint” vectors. These vectors encapsulate an author’s writing style and can be used for authorship identification and, further down the line, authorship obfuscation.
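
To make the idea of a footprint vector concrete, here is a minimal sketch. It is not the method presented in the talk. It assumes that relative frequencies of a few English function words can stand in for grammatical features, and that cosine similarity is a reasonable way to compare vectors. The names `FUNCTION_WORDS`, `footprint`, `cosine`, and `attribute` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a tiny function-word list stands in for real grammatical features.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "but", "which", "however", "i"]


def footprint(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(document, known_authors):
    """Pick the known author whose sample footprint is closest to the document's.

    known_authors maps an author name to a sample of that author's writing.
    Returns the best-matching name and the per-author similarity scores.
    """
    doc_vec = footprint(document)
    scores = {
        author: cosine(doc_vec, footprint(sample))
        for author, sample in known_authors.items()
    }
    return max(scores, key=scores.get), scores
```

Because each vector position corresponds to a named feature, the per-author scores and the features behind them can be inspected directly, which is one simple route toward the explainability mentioned above.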