WP-01 - AthDGC: An Open Diachronic Greek Treebank with Indo-European Parallels
Platform launch report - GlossaContactLab Working Papers, digital edition
AthDGC (“Athens-PROIEL”) is an open, end-to-end pipeline and dataset that provides the first continuously updated, dependency-parsed treebank of the entire Greek language - from Homeric and Archaic texts to Modern Greek - aligned cross-lingually to Latin (Vulgate), Gothic (Wulfila), Old Church Slavonic (Marianus), and Classical Armenian through verse-level alignment of the New Testament. AthDGC follows the PROIEL XML 2.0 schema, uses the Stanford Stanza PROIEL-trained pipeline for annotation, and applies LaBSE sentence embeddings and multilingual BERT attention for word-level alignment. As of v0.4, the corpus contains 89.9 M total rows, 4.08 M annotated Greek rows, and 6,861 NT-aligned Greek verses cross-aligned to the four sister IE witnesses. The platform ships fourteen open-source modules under OSI-approved licences. This Working Paper is the canonical launch report for the AthDGC platform.
Keywords: diachronic linguistics, ancient Greek, Byzantine Greek, Modern Greek, Indo-European, PROIEL, dependency treebank, cross-lingual alignment, argument structure, retranslation
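As an illustration of the verse-level alignment idea only, the following minimal Python sketch groups rows from several witnesses by a shared verse reference and lists the verses attested in all of them. The function names, witness labels, row layout, and toy verse references are hypothetical and are not taken from the AthDGC codebase or schema; the word-level step (LaBSE embeddings, multilingual BERT attention) is not shown.

```python
from collections import defaultdict

# Hypothetical witness labels for the five NT traditions named in the report.
WITNESSES = ("greek", "latin", "gothic", "ocs", "armenian")


def align_by_verse(rows):
    """Group (witness, verse_ref, text) rows by verse reference.

    Returns a dict of the form {verse_ref: {witness: text}}.
    """
    aligned = defaultdict(dict)
    for witness, verse_ref, text in rows:
        aligned[verse_ref][witness] = text
    return dict(aligned)


def fully_attested(aligned, witnesses=WITNESSES):
    """Return the sorted verse references present in every listed witness."""
    return sorted(
        ref
        for ref, by_witness in aligned.items()
        if all(w in by_witness for w in witnesses)
    )


# Toy rows with placeholder references and texts (not real corpus data).
rows = [
    ("greek", "T 1:1", "g1"),
    ("latin", "T 1:1", "l1"),
    ("gothic", "T 1:1", "go1"),
    ("ocs", "T 1:1", "o1"),
    ("armenian", "T 1:1", "a1"),
    ("greek", "T 1:2", "g2"),
    ("latin", "T 1:2", "l2"),
]

aligned = align_by_verse(rows)
complete = fully_attested(aligned)
```

Keying on the verse reference rather than on row position lets a fragmentary witness simply be absent for a given verse without shifting the alignment of the others.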
Citation
Lavidas, N., Nikiforidou, K., Haug, D., Kulikov, L., Geka, V., Symeonidis, V., Michalareas, T., Chionidi, S., Tsiropina, A., Plakoutsi, E., Argyropoulos, E., & the Athens Digital Glossa Chronos Research Network (2026). AthDGC: An Open Diachronic Greek Treebank with Indo-European Parallels. GlossaContactLab Working Papers, digital edition, WP-01. National and Kapodistrian University of Athens (NKUA). 10.5281/zenodo.20439182.
@techreport{lavidas2026athdgc,
  type = {GlossaContactLab Working Paper, digital edition},
  number = {WP-01},
  author = {Lavidas, Nikolaos and Nikiforidou, Kiki and Haug, Dag and Kulikov, Leonid and
            Geka, Vassiliki and Symeonidis, Vassileios and Michalareas, Theodoros and
            Chionidi, Sofia and Tsiropina, Anastasia and Plakoutsi, Eleni and
            Argyropoulos, Evangelos and {Athens Digital Glossa Chronos Research Network}},
  title = {{AthDGC}: An Open Diachronic Greek Treebank with Indo-European Parallels},
  year = {2026},
  institution = {National and Kapodistrian University of Athens (NKUA)},
  doi = {10.5281/zenodo.20439182}
}
Full paper
The full text of this Working Paper (method, dataset description, open-access source provenance per period and per IE parallel, IE-parallel matrix, retelling and retranslation chains, tool licence matrix, reuse potential, access and format, acknowledgements) lives at the AthDGC public site:
Read the full launch report at athdgc.github.io/data_paper.html
The same source .qmd renders to HTML, .docx, and (when TinyTeX permits) PDF. This Working Papers entry mirrors the canonical text.
Platform links
| Resource | URL |
|---|---|
| Public showcase | https://athdgc.github.io |
| Source code | https://github.com/AthDGC/Diachronic-Linguistics-Platform |
| Stanza checkpoints | https://huggingface.co/AthDGC (3 model repos live; weights at v0.5) |
| PyPI toolkit | https://pypi.org/project/athdgc-tools/ (stub 0.4.0.dev0; full v0.5) |
| Concept DOI | 10.5281/zenodo.20439182 |
| KEDIVIM autumn 2026 course | https://athdgc.github.io/training.html |
Team
Prof. Nikolaos Lavidas (PI, NKUA), Prof. Emerita Kiki Nikiforidou (NKUA; Co-Editor, Genres and Influential Texts volume), Prof. Dag Haug (Oslo; PROIEL Project Director), Prof. Leonid Kulikov (Ghent; Diachronic typology, valency questionnaires), Dr. Vassiliki Geka (NKUA; Post-Doctoral Researcher; Co-Editor, Genres and Influential Texts volume), Dr. Vassileios Symeonidis (NKUA; Post-Doctoral Researcher), Dr. Theodoros Michalareas (NKUA; Post-Doctoral Researcher), Sofia Chionidi (NKUA; PhD Candidate), Anastasia Tsiropina (NKUA; PhD Candidate), Eleni Plakoutsi (NKUA; PhD Candidate), Evangelos Argyropoulos (NKUA; Research Assistant); and the Athens Digital Glossa Chronos Research Network as collective author.
Acknowledgements
Funded by the Hellenic Foundation for Research and Innovation (HFRI) under the 3rd Call for HFRI Research Projects to support Post-Doctoral Researchers, Project No. 20577; with complementary support from the Greece 2.0 National Recovery and Resilience Plan. Compute supplied by GRNET ARIS (Greek national HPC), allocation pa260305. Funded project: CVL-CDSAML - A Corpus-based Valency Lexicon for a Contrastive and Diachronic Study of Ancient and Medieval Languages.