AthDGC at CIDL 2026

Civis Diachronic Linguistics · Athens · CIVIS BIP

Nikolaos Lavidas

NKUA · Athens Digital Glossa Chronos Research Network

June 15, 2026

AthDGC - Athens-PROIEL

Diachronic Greek treebank · Indo-European cross-lingual alignment

Athens Digital Glossa Chronos Research Network

HFRI Project No. 20577 · Greece 2.0 NRRP · CVL-CDSAML

Why this matters at CIDL

Diachronic linguistics needs:

  • continuous coverage across periods
  • aligned witnesses across languages
  • uniform syntactic annotation
  • machine-queryable argument structure

No single existing resource provides all four. AthDGC does.

Coverage (v0.4)

Period         | Years                | Annotated Greek rows
Archaic        | 8th–6th c. BC        | Homer, Hesiod, Pindar, Sappho
Classical      | 5th–4th c. BC        | Tragedy + comedy + history + philosophy + oratory
Koine          | 3rd c. BC – 4th c. AD | New Testament, Plutarch, Strabo, Lucian
Late Antique   | 4th–7th c. AD        | Eusebius, Basil, Chrysostom, Procopius
Byzantine      | 7th–12th c. AD       | Psellos, Anna Komnene, John of Damascus, Niketas
Late Byzantine | 13th–15th c. AD      | in ingestion
Early Modern   | 16th–18th c. AD      | in ingestion
Modern         | 19th c. – present    | in ingestion

Cross-lingual alignment

NT verse-level cross-alignment to:

  • Latin (Vulgate) - 6,861 verses aligned
  • Gothic (Wulfila) - in ingestion
  • Old Church Slavonic (Marianus) - in ingestion
  • Classical Armenian - PROIEL-style pipeline in development

Method: LaBSE sentence embeddings for sentence-level matching, then AwesomeAlign (mBERT-based) for word-level alignment.
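The sentence-level step can be illustrated with a minimal sketch. It assumes the embeddings have already been computed (for example by LaBSE) and pairs sentences by mutual best cosine similarity. The function names, the threshold value, and the toy vectors are hypothetical and are not taken from the AthDGC pipeline.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutual_best_matches(src_vecs, tgt_vecs, threshold=0.5):
    """Return (src_index, tgt_index, score) for pairs that are each other's best match.

    The threshold is an illustrative assumption, not a documented setting.
    """
    if not src_vecs or not tgt_vecs:
        return []
    sims = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    pairs = []
    for i, row in enumerate(sims):
        # Best target for source i.
        j = max(range(len(row)), key=row.__getitem__)
        # Best source for that target; keep the pair only if it points back to i.
        best_i = max(range(len(sims)), key=lambda k: sims[k][j])
        if best_i == i and row[j] >= threshold:
            pairs.append((i, j, row[j]))
    return pairs


# Toy 3-dimensional vectors standing in for real sentence embeddings.
greek = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
latin = [[0.1, 0.9, 0.3], [0.9, 0.2, 0.0]]
pairs = mutual_best_matches(greek, latin)
```

Requiring agreement in both directions is one common way to suppress spurious matches; whether the project uses this criterion is not stated in the slides.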

Argument structure

For every VERB / AUX token AthDGC extracts:

  • subject (sub, including raised xobj / nonsub patterns)
  • direct object (obj)
  • indirect / oblique args (iobj, obl)
  • vocative addressee (voc)
  • voice (active / middle / passive)
  • aspect (perfective / imperfective)
  • tense, mood, verb-form

These features are stored per row, in a column that graph queries can access when traversing the cross-lingual alignment edges.
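As a rough illustration of the extraction described above, the sketch below collects the dependents of each VERB / AUX token by relation label and copies over its morphological features. The `Token` class, the feature keys, and the toy annotation are assumptions made for this example. The raised xobj / nonsub patterns mentioned above are not handled here.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    id: int
    form: str
    pos: str        # e.g. "VERB", "AUX", "NOUN"
    head: int       # id of the syntactic head, 0 for the root
    relation: str   # e.g. "pred", "sub", "obj", "obl", "voc"
    feats: dict = field(default_factory=dict)


ARG_RELATIONS = ("sub", "obj", "iobj", "obl", "voc")
VERB_FEATURES = ("voice", "aspect", "tense", "mood", "verb_form")


def extract_frames(tokens):
    """Build one argument-structure record per VERB / AUX token."""
    frames = []
    for tok in tokens:
        if tok.pos not in ("VERB", "AUX"):
            continue
        frame = {"verb": tok.form}
        # Copy morphological features; missing ones become None.
        frame.update({key: tok.feats.get(key) for key in VERB_FEATURES})
        # Collect direct dependents for each argument relation.
        for rel in ARG_RELATIONS:
            frame[rel] = [t.form for t in tokens
                          if t.head == tok.id and t.relation == rel]
        frames.append(frame)
    return frames


# Toy annotation of John 11:35; labels and feature values are illustrative only.
sentence = [
    Token(1, "ἐδάκρυσεν", "VERB", 0, "pred",
          {"voice": "Act", "aspect": "Perf", "tense": "Aor", "mood": "Ind"}),
    Token(2, "ὁ", "DET", 3, "aux"),
    Token(3, "Ἰησοῦς", "NOUN", 1, "sub"),
]
frames = extract_frames(sentence)
```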

Example query

“Show every Greek aorist transitive verb with an accusative object whose Latin Vulgate counterpart is a passive periphrastic”

This translates to a Neo4j Cypher query that traverses (:Token)-[:TRANSLATED_AS]->(:Token) edges, where the source token carries voice=Act, aspect=Perf, obj=Acc and the target token carries voice=Pass, verb_form=Periphrastic.
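A minimal sketch of how such a query might be assembled is given below. It assumes that the features are stored as node properties named exactly as in the description (voice, aspect, obj, verb_form) and that tokens expose a `form` property. These names, the helper function, and the default limit are assumptions, not the documented AthDGC schema.

```python
def build_alignment_query(source_props, target_props, limit=25):
    """Build a parameterised Cypher query over TRANSLATED_AS edges.

    Returns (query_text, parameters). Property names are hypothetical.
    """
    clauses, params = [], {}
    for var, props in (("src", source_props), ("tgt", target_props)):
        for key, value in sorted(props.items()):
            # Property names are interpolated into the query, so validate them.
            if not key.isidentifier():
                raise ValueError(f"invalid property name: {key!r}")
            name = f"{var}_{key}"
            clauses.append(f"{var}.{key} = ${name}")
            params[name] = value
    lines = ["MATCH (src:Token)-[:TRANSLATED_AS]->(tgt:Token)"]
    if clauses:
        lines.append("WHERE " + " AND ".join(clauses))
    lines.append("RETURN src.form AS greek, tgt.form AS latin")
    lines.append("LIMIT $limit")
    params["limit"] = limit
    return "\n".join(lines), params


query, params = build_alignment_query(
    {"voice": "Act", "aspect": "Perf", "obj": "Acc"},
    {"voice": "Pass", "verb_form": "Periphrastic"},
)
print(query)
```

With the official Neo4j Python driver, the resulting pair could be passed to `session.run(query, params)`. The endpoint address and credentials are not given in these slides.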

Open infrastructure

Team

PI: Prof. Nikolaos Lavidas (NKUA)

International: Prof. Dag Haug (Oslo, PROIEL Project Director); Prof. Leonid Kulikov (Ghent; Diachronic typology, valency questionnaires)

NKUA: Prof. Emerita Kiki Nikiforidou; Dr. Vassiliki Geka, Dr. Vassileios Symeonidis, Dr. Theodoros Michalareas (Post-Doctoral Researchers); Sofia Chionidi, Anastasia Tsiropina, Eleni Plakoutsi (PhD Candidates); Evangelos Argyropoulos (Research Assistant)

Athens Digital Glossa Chronos Research Network (collective author)

Funded by HFRI (Project No. 20577) · Greece 2.0 NRRP

Compute: GRNET ARIS (pa260305)

Discussion

What CIDL participants can do with AthDGC today:

  1. Query the corpus via the public Neo4j endpoint
  2. Pull period-specific JSONL partitions from Hugging Face
  3. Browse the PROIEL XML 2.0 samples on athdgc.github.io
  4. Flag annotation errors via the showcase review toolbar
  5. Cite version-pinned datasets via the version DOIs
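For item 2, a downloaded JSONL partition can be read with the standard library alone. The sketch below assumes one JSON object per line. The field names (`period`, `form`, `pos`) and the sample records are hypothetical, since the actual partition schema is not shown in these slides.

```python
import io
import json
from collections import Counter


def read_jsonl(stream):
    """Yield one record per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by(records, key):
    """Count records by the value of a given field."""
    return Counter(rec.get(key, "unknown") for rec in records)


# In-memory stand-in for a downloaded partition file (hypothetical fields).
sample = io.StringIO(
    '{"period": "Koine", "form": "ἐδάκρυσεν", "pos": "VERB"}\n'
    '{"period": "Koine", "form": "Ἰησοῦς", "pos": "NOUN"}\n'
    '\n'
    '{"period": "Classical", "form": "λέγει", "pos": "VERB"}\n'
)
counts = count_by(read_jsonl(sample), "period")
```

For a file on disk, the same functions would take the handle returned by `open(path, encoding="utf-8")` in place of the in-memory sample.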

Thank you

athdgc.github.io · github.com/AthDGC

DOI: 10.5281/zenodo.20439182

Funding

Funded by the Hellenic Foundation for Research and Innovation (HFRI) under the 3rd Call for HFRI Research Projects to support Post-Doctoral Researchers, Project No. 20577; with complementary support from the Greece 2.0 National Recovery and Resilience Plan. Compute supplied by GRNET ARIS (Greek national HPC), allocation pa260305.