Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language

Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Eli Handel, Moshe Koppel

Research output: Working paper › Preprint

Abstract

We present a new pre-trained language model (PLM) for Rabbinic Hebrew, termed Berel (BERT Embeddings for Rabbinic-Encoded Language). Whilst other PLMs exist for processing Hebrew texts (e.g., HeBERT, AlephBERT), they are all trained on modern Hebrew, which diverges substantially from Rabbinic Hebrew in its lexicographical, morphological, syntactic, and orthographic norms. We demonstrate the superiority of Berel on Rabbinic texts via a challenge set of Hebrew homographs. We release the new model and homograph challenge set for unrestricted use.
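
Since the released model follows the standard BERT architecture, it can be queried with ordinary masked-language-modeling tooling. The sketch below is illustrative only and is not taken from the paper: the Hugging Face repo id, the example sentence (the opening of Mishnah Sanhedrin 10:1 with its final word masked), and the top-5 readout are assumptions for demonstration, not the authors' evaluation protocol for the homograph challenge set.

    # Minimal sketch: masked-token prediction with a BERT-style PLM such as Berel.
    # The repo id below is an assumption; point it at wherever the model is released.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    model_name = "dicta-il/BEREL"  # assumed location of the released model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    # Illustrative Rabbinic Hebrew context with the final word masked
    # (Mishnah Sanhedrin 10:1: "All Israel have a share in the world to come").
    text = f"כל ישראל יש להם חלק לעולם {tokenizer.mask_token}"
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Locate the [MASK] position and print the model's top-5 candidate fillers.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0]
    print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))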
Original language: English
State: Published - 3 Aug 2022

Keywords

  • cs.CL
