Detecting pseudepigraphic texts using novel similarity measures

Moshe Koppel, Shachar Seidman

Research output: Contribution to journal › Article › peer-review

Abstract

The identification of pseudepigraphic texts (texts not written by the authors to which they are attributed) has important historical, forensic, and commercial applications. Any method for identifying such pseudepigrapha must ultimately depend on some measure of a given document's similarity to the other documents in a corpus. We show that for this purpose, second-order document similarity measures taken from the authorship verification literature strongly outperform standard document similarity measures commonly used for outlier identification. We apply these improved methods to two famous corpora suspected of including pseudepigrapha: Shakespeare's plays and Pauline epistles.

Original language: English
Pages (from-to): 72-81
Number of pages: 10
Journal: Digital Scholarship in the Humanities
Volume: 33
Issue number: 1
State: Published - 1 Apr 2018

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
