AthDGC · Athens-PROIEL

An open diachronic Greek treebank with Indo-European parallels

Nikolaos Lavidas

National and Kapodistrian University of Athens (NKUA)

June 3, 2026

Athens Digital Glossa Chronos

The Athens Diachronic Glossa Chronos (Athens-PROIEL) is an open PROIEL-style dependency treebank spanning Greek diachrony with Indo-European parallels.

NKUA · Division of Language-Linguistics, Department of English Language and Literature, School of Philosophy

Funded by HFRI (Project No. 20577) · Greece 2.0 NRRP

Corpus + Tools

AthDGC is both a continuously updated PROIEL XML 2.0 treebank and an open-source toolkit for computational diachronic linguistics:

  • LightSIDE-compatible feature extraction + classifier training
  • NoSketch-style concordancer
  • Fine-tuned Stanza checkpoints (Byzantine, Late-Byzantine, Modern adaptations)
  • Argument-structure extractor
  • Cross-lingual alignment viewer (Neo4j)
  • Quarto template pack for any DH project (this very deck is built with it)
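
To make the argument-structure extractor in the list above more concrete, here is a minimal sketch of how verb frames could be read off PROIEL-style XML. It is not the project's extractor: the function name `extract_frames`, the toy sentence, and the restriction to the `sub`, `obj`, and `obl` relations are assumptions made for illustration. It also assumes that the fifth position of the PROIEL morphology string encodes voice.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Toy sentence in PROIEL-style markup (illustrative, not taken from the corpus).
SAMPLE = """<proiel>
  <source id="toy" language="grc">
    <div>
      <sentence id="1">
        <token id="1" form="ἠγάπησεν" lemma="ἀγαπάω" part-of-speech="V-"
               morphology="3saia----i" relation="pred"/>
        <token id="2" form="ὁ" lemma="ὁ" part-of-speech="S-"
               morphology="-s---mn--i" head-id="3" relation="aux"/>
        <token id="3" form="θεὸς" lemma="θεός" part-of-speech="Nb"
               morphology="-s---mn--i" head-id="1" relation="sub"/>
        <token id="4" form="τὸν" lemma="ὁ" part-of-speech="S-"
               morphology="-s---ma--i" head-id="5" relation="aux"/>
        <token id="5" form="κόσμον" lemma="κόσμος" part-of-speech="Nb"
               morphology="-s---ma--i" head-id="1" relation="obj"/>
      </sentence>
    </div>
  </source>
</proiel>"""

# Assumption: only these three relations are treated as arguments here.
ARG_RELATIONS = ("sub", "obj", "obl")


def extract_frames(xml_text):
    """Return one dict per verb token with its voice and argument lemmas."""
    root = ET.fromstring(xml_text)
    frames = []
    for sentence in root.iter("sentence"):
        tokens = list(sentence.iter("token"))
        # Group argument dependents under the id of their head token.
        dependents = defaultdict(list)
        for tok in tokens:
            head = tok.get("head-id")
            if head is not None and tok.get("relation") in ARG_RELATIONS:
                dependents[head].append(tok)
        for tok in tokens:
            if not (tok.get("part-of-speech") or "").startswith("V"):
                continue
            morph = tok.get("morphology") or ""
            frame = {
                "verb": tok.get("lemma"),
                # Assumed layout: voice is the fifth morphology position.
                "voice": morph[4] if len(morph) > 4 else None,
            }
            for rel in ARG_RELATIONS:
                frame[rel] = [
                    dep.get("lemma")
                    for dep in dependents[tok.get("id")]
                    if dep.get("relation") == rel
                ]
            frames.append(frame)
    return frames


if __name__ == "__main__":
    for frame in extract_frames(SAMPLE):
        print(frame)
```

A real extractor would also need to handle coordination, empty tokens, and secondary dependencies, which this sketch ignores.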

The corpus

Covers:

  • Homeric and Archaic Greek (Homer, Hesiod, Sappho, Pindar)
  • Classical (Aeschylus, Sophocles, Euripides, Aristophanes, Plato, Aristotle)
  • Koine and NT (Plutarch, Strabo, Lucian, the New Testament)
  • Late Antique (Eusebius, Basil, Chrysostom, Procopius)
  • Byzantine (Psellos, Anna Komnene, John of Damascus, Niketas Choniates)

with NT verse-level cross-alignment to Latin (Vulgate), Gothic (Wulfila), and Old Church Slavonic (Marianus).
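
Verse-level cross-alignment can be pictured as a join on a shared verse reference. The sketch below is a simplified illustration under stated assumptions: the reference format (`"JOHN 1.1"`), the witness keys (`lat`, `got`, `chu`), and the function name `align_by_verse` are all hypothetical, and the toy data says nothing about which verses the real witnesses attest.

```python
def align_by_verse(greek, witnesses):
    """Join Greek verses with parallel witnesses on a shared verse reference.

    greek:     dict mapping a verse reference to Greek text
    witnesses: dict mapping a witness name to its own {reference: text} dict
    A missing verse is recorded as None so that gaps stay visible.
    """
    rows = []
    for ref, grc_text in greek.items():
        row = {"ref": ref, "grc": grc_text}
        for name, verses in witnesses.items():
            row[name] = verses.get(ref)
        rows.append(row)
    return rows


# Toy data only; the empty Gothic and OCS entries are purely illustrative.
GREEK = {"JOHN 1.1": "Ἐν ἀρχῇ ἦν ὁ λόγος"}
WITNESSES = {
    "lat": {"JOHN 1.1": "In principio erat Verbum"},
    "got": {},
    "chu": {},
}

if __name__ == "__main__":
    for row in align_by_verse(GREEK, WITNESSES):
        print(row)
```

Keeping `None` for absent verses, rather than dropping the row, lets later steps distinguish a gap in a witness from a failed alignment.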

Current state (v0.4)

Metric                        Value
Total corpus rows             89.9 M
Annotated Greek rows          4.08 M
NT-aligned Greek verses       6,861
IE cross-aligned witnesses    4

Pipeline

  1. Discovery - daily harvest from archive.org, Perseus, First1K, Wikisource, Diorisis
  2. Filtering - Greek-script ratio, apparatus-criticus rejection, dedup
  3. Conversion - PROIEL XML 2.0
  4. Annotation - Stanza grc_proiel + analogous models per parallel language
  5. Argument-structure - subject, object, oblique, voice, aspect per verb
  6. Cross-alignment - LaBSE + AwesomeAlign (mBERT)
  7. Storage - PROIEL XML + JSONL + Qdrant + Neo4j
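
The filtering step (step 2 above) can be sketched as follows. This is a minimal illustration, not the project's filter: the 0.8 threshold, the function names, and the exact-match deduplication by hash are assumptions. Passages dominated by Latin-script sigla and abbreviations, as in an apparatus criticus, fall below the threshold and are dropped.

```python
import hashlib
import unicodedata


def greek_script_ratio(text):
    """Share of alphabetic characters whose Unicode name marks them as Greek."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    greek = sum(1 for ch in letters if "GREEK" in unicodedata.name(ch, ""))
    return greek / len(letters)


def filter_passages(passages, min_ratio=0.8):
    """Keep mostly-Greek passages and drop exact duplicates.

    min_ratio is a hypothetical threshold chosen for this example.
    """
    seen = set()
    kept = []
    for passage in passages:
        # NFC-normalise and collapse whitespace so trivial variants hash alike.
        norm = " ".join(unicodedata.normalize("NFC", passage).split())
        if greek_script_ratio(norm) < min_ratio:
            continue
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(norm)
    return kept


if __name__ == "__main__":
    sample = [
        "μῆνιν ἄειδε θεά",
        "μῆνιν  ἄειδε   θεά",      # duplicate apart from spacing
        "cf. codd. ABC μῆνιν",     # apparatus-like line, mostly Latin script
        "hello world",
    ]
    print(filter_passages(sample))
```

Exact hashing only catches verbatim repeats; near-duplicate editions of the same text would need a fuzzier comparison.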

Scholarly focus

  • Retranslation of influential texts (Iliad, NT, Septuagint Psalms, classical historiography)
  • Retelling chains across periods and languages
  • Argument structure under retranslation

Team

  • Lavidas (PI) · Nikiforidou · Haug (Oslo) · Kulikov (Ghent)
  • Geka · Symeonidis · Michalareas
  • Chionidi · Tsiropina · Plakoutsi · Argyropoulos

NKUA · Athens Digital Glossa Chronos Research Network

How to cite

Lavidas, N., Nikiforidou, K., Haug, D., Kulikov, L., Geka, V., Symeonidis, V., Michalareas, T., Chionidi, S., Tsiropina, A., Plakoutsi, E., Argyropoulos, E., and the Athens Digital Glossa Chronos Research Network (2026). AthDGC: Athens Diachronic Glossa Chronos. Zenodo.

DOI: 10.5281/zenodo.20439182

Thank you

Funded by the Hellenic Foundation for Research and Innovation (HFRI) under the 3rd Call for HFRI Research Projects to support Post-Doctoral Researchers (Project No. 20577), with complementary support from the Greece 2.0 National Recovery and Resilience Plan.

Compute supplied by GRNET ARIS (Greek national HPC), allocation pa260305.