Abstract
The identification of pseudepigraphic texts (texts not written by the authors to which they are attributed) has important historical, forensic, and commercial applications. Any method for identifying such pseudepigrapha must ultimately depend on some measure of a given document's similarity to the other documents in a corpus. We show that for this purpose, second-order document similarity measures taken from the authorship verification literature strongly outperform standard document similarity measures commonly used for outlier identification. We apply these improved methods to two famous corpora suspected of including pseudepigrapha: Shakespeare's plays and the Pauline epistles.
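The abstract does not spell out how a second-order measure is computed. As a rough illustration only, the sketch below assumes an impostors-style formulation common in authorship verification: rather than using the raw (first-order) cosine similarity between documents X and Y, it reports the fraction of random feature subsets on which Y is closer to X than every "impostor" document is. The function names, the cosine base measure, and the parameters `trials`, `frac`, and `seed` are all hypothetical choices made for this example, not the paper's implementation.

```python
import math
import random


def cosine(a, b):
    """First-order similarity: cosine between two feature vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def second_order_similarity(x, y, impostors, trials=100, frac=0.5, seed=0):
    """Fraction of random feature subsets on which y beats every impostor.

    Assumption: x, y, and each impostor are equal-length numeric feature
    vectors (for example, word or character n-gram frequencies).
    """
    rng = random.Random(seed)
    k = max(1, int(frac * len(x)))
    wins = 0
    for _ in range(trials):
        # Keep only a random subset of the features for this trial.
        idx = rng.sample(range(len(x)), k)
        px = [x[i] for i in idx]
        py = [y[i] for i in idx]
        sim_xy = cosine(px, py)
        best_impostor = max(
            cosine(px, [imp[i] for i in idx]) for imp in impostors
        )
        if sim_xy > best_impostor:
            wins += 1
    return wins / trials


# Toy feature vectors, invented purely for illustration.
x = [3, 1, 4, 1, 5, 9, 2, 6]
impostors = [[1, 0, 0, 7, 0, 1, 8, 0], [9, 9, 0, 0, 1, 0, 0, 3]]
score = second_order_similarity(x, list(x), impostors)
```

Under this reading, a document whose scores against the rest of the corpus are consistently low would be a candidate outlier. How the scores are aggregated and thresholded is left open here.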
Original language | English |
---|---|
Pages (from-to) | 72-81 |
Number of pages | 10 |
Journal | Digital Scholarship in the Humanities |
Volume | 33 |
Issue number | 1 |
State | Published - 1 Apr 2018 |
All Science Journal Classification (ASJC) codes
- Information Systems
- Language and Linguistics
- Linguistics and Language
- Computer Science Applications