AthDGC - system demonstration

Open PROIEL-style treebank for the entire Greek language

Nikolaos Lavidas

NKUA · Athens Digital Glossa Chronos Research Network

June 30, 2026

AthDGC

Open PROIEL-style dependency treebank, from Homer to Modern Greek, with verse-level cross-alignment of the New Testament (NT) to Latin / Gothic / Old Church Slavonic.

20-minute LREC-COLING system demo · Athens-PROIEL

Demo agenda (20 min)

  1. 3 min - Overview + funding context
  2. 6 min - Live browse of the showcase: PROIEL tree + arg-structure cards
  3. 5 min - Live cross-alignment query (Neo4j) across Greek/Latin/Gothic
  4. 3 min - Editor workflow: mark-for-review toolbar + corpus-side fix pipeline
  5. 3 min - Q&A

Live demo step 1 - browse

Open https://athdgc.github.io in the browser.

Select the Classical period tab, then click into a Plato sample (tlg0059.tlg022).

Observe:

  • Greek sentence in Cardo / Old Standard TT
  • PROIEL dependency tree (root at top, daughters indented)
  • Per-verb argument-structure card (subj / obj / oblique / voice / aspect)
  • Raw PROIEL XML (collapsible)
  • Mark-for-review toolbar (state persisted to localStorage)
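The argument-structure card can be illustrated with a minimal sketch that reads a PROIEL-style <sentence> and groups sub / obj / obl dependents under their head verb. It assumes the standard PROIEL XML token attributes (part-of-speech, morphology, head-id, relation) and reads voice from the fifth slot of the 10-character PROIEL morphology string. The tense-to-aspect table, the function name argument_cards and the sample sentence are hypothetical; this is not the showcase's own code.

```python
import xml.etree.ElementTree as ET

# Voice is the 5th slot (index 4) of the 10-character PROIEL morphology string.
VOICE = {"a": "active", "m": "middle", "p": "passive", "e": "middle/passive"}
# Hypothetical tense-to-aspect mapping from the 3rd slot (index 2); an assumption.
ASPECT = {"p": "imperfective", "i": "imperfective", "a": "perfective",
          "r": "perfect", "l": "perfect"}
# PROIEL relation labels shown on the card, mapped to card slot names.
ARG_SLOTS = {"sub": "subj", "obj": "obj", "obl": "oblique"}


def argument_cards(sentence_xml: str) -> list[dict]:
    """Build one argument-structure card per verb in a PROIEL-style <sentence>."""
    tokens = ET.fromstring(sentence_xml).findall("token")
    cards = {}
    # First pass: open a card for every verb token (part-of-speech "V-").
    for tok in tokens:
        if tok.get("part-of-speech", "").startswith("V"):
            # Pad short or missing morphology so slot lookups never fail.
            morph = (tok.get("morphology") or "").ljust(10, "-")
            cards[tok.get("id")] = {
                "verb": tok.get("form"), "lemma": tok.get("lemma"),
                "subj": [], "obj": [], "oblique": [],
                "voice": VOICE.get(morph[4], "unknown"),
                "aspect": ASPECT.get(morph[2], "unknown"),
            }
    # Second pass: attach each argument to the card of its head verb.
    for tok in tokens:
        slot = ARG_SLOTS.get(tok.get("relation"))
        if slot and tok.get("head-id") in cards:
            cards[tok.get("head-id")][slot].append(tok.get("form"))
    return list(cards.values())


# Invented three-token sentence ("Plato wrote a letter"), not a corpus sample.
SAMPLE = """<sentence id="1">
  <token id="1" form="Πλάτων" lemma="Πλάτων" part-of-speech="Ne" morphology="-s---mn--i" head-id="2" relation="sub"/>
  <token id="2" form="ἔγραψεν" lemma="γράφω" part-of-speech="V-" morphology="3saia----i" relation="pred"/>
  <token id="3" form="ἐπιστολήν" lemma="ἐπιστολή" part-of-speech="Nb" morphology="-s---fa--i" head-id="2" relation="obj"/>
</sentence>"""

print(argument_cards(SAMPLE))
```

Two passes keep the sketch independent of token order, since an argument may precede its verb in the XML.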

Live demo step 2 - cross-alignment

Open the public Neo4j endpoint (https://athdgc.github.io/graph).

Run:

MATCH (gk:Token {language:"grc", upos:"VERB"})-[:TRANSLATED_AS]->(la:Token {language:"lat"})
WHERE gk.feats CONTAINS "Voice=Act" AND gk.feats CONTAINS "Aspect=Perf"
  AND la.feats CONTAINS "Voice=Pass"
RETURN gk.form, gk.lemma, la.form, la.lemma LIMIT 25

Returns up to 25 Greek active aorist verbs (UD Aspect=Perf) whose aligned Vulgate counterpart is passive.
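The logic of the WHERE clause can also be checked offline, for example on an exported list of aligned token pairs. The sketch below assumes each token is a dict with form, lemma, upos and a UD-style feats string. The function names and the toy pairs are hypothetical, not part of the AthDGC graph API. Parsing FEATS into key/value pairs tests exact feature values instead of relying on substring matches.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Split a UD FEATS string such as 'Aspect=Perf|Voice=Act' into a dict."""
    if not feats or feats == "_":
        return {}  # "_" marks an empty FEATS column in CoNLL-U
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def voice_mismatches(pairs, limit=25):
    """Greek active perfective verbs whose aligned Latin token is passive.

    Mirrors the Cypher WHERE clause; `limit` plays the role of LIMIT 25.
    """
    hits = []
    for gk, la in pairs:
        if len(hits) >= limit:
            break
        gf, lf = parse_feats(gk["feats"]), parse_feats(la["feats"])
        if (gk["upos"] == "VERB" and gf.get("Voice") == "Act"
                and gf.get("Aspect") == "Perf" and lf.get("Voice") == "Pass"):
            hits.append((gk["form"], gk["lemma"], la["form"], la["lemma"]))
    return hits


# Toy aligned pairs (invented for illustration, not corpus data).
PAIRS = [
    ({"form": "ἐποίησεν", "lemma": "ποιέω", "upos": "VERB",
      "feats": "Aspect=Perf|Mood=Ind|Tense=Past|Voice=Act"},
     {"form": "factum", "lemma": "facio", "upos": "VERB",
      "feats": "Aspect=Perf|VerbForm=Part|Voice=Pass"}),
    ({"form": "λέγει", "lemma": "λέγω", "upos": "VERB",
      "feats": "Aspect=Imp|Mood=Ind|Tense=Pres|Voice=Act"},
     {"form": "dicit", "lemma": "dico", "upos": "VERB",
      "feats": "Mood=Ind|Tense=Pres|Voice=Act"}),
]
print(voice_mismatches(PAIRS))
```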

Live demo step 3 - editor workflow

In the showcase, click Mark for review on a problematic sample.

The flag is stored in localStorage and persists across page reloads. To export the current review state, run this in the browser console:

athdgcExportReview()

The exported JSON is then passed to the corpus-side script fix_corpus_data.py, which applies bulk corrections on ARIS.
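The hand-off can be pictured with a minimal sketch that reads a review export and tags the matching JSONL records. The export shape (sample ID mapped to an object with a flagged field), the record field names id and needs_review, and both function names are assumptions for illustration. This only marks records and is not the correction logic of fix_corpus_data.py.

```python
import json


def flagged_ids(review_json: str) -> set[str]:
    """Collect sample IDs flagged in a (hypothetical) review export."""
    review = json.loads(review_json)
    return {sample_id for sample_id, entry in review.items() if entry.get("flagged")}


def mark_for_review(jsonl_lines, flagged):
    """Yield JSONL lines, adding needs_review=True to flagged records."""
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("id") in flagged:
            record["needs_review"] = True
        yield json.dumps(record, ensure_ascii=False)


# Invented export and corpus lines; real field names may differ.
EXPORT = '{"s1": {"flagged": true, "note": "wrong lemma"}, "s2": {"flagged": false}}'
CORPUS = ['{"id": "s1", "text": "..."}', '{"id": "s2", "text": "..."}']
for out in mark_for_review(CORPUS, flagged_ids(EXPORT)):
    print(out)
```

Streaming line by line keeps memory flat, which matters once partitions grow to millions of sentences.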

Architecture

Layer             Implementation
Discovery         43_discover_sources.py - daily cron on ARIS
Filtering         44_filter_candidates.py - Greek-script + apparatus filters
OCR / Conversion  45_ocr_candidates.sbatch + 46_to_proiel_xml.py
Annotation        47_annotate_inbox.sbatch - Stanza on A100 GPU
Merge             48_merge_xml_to_corpus.py - JSONL partitioning
Corpus fix        fix_corpus_data.py - TLG author override + Stanza error fixes
Showcase build    51_build_showcase_site.py - regenerates HTML from JSONL
Deploy            deploy_athdgc_v2.ps1 - scp + git push
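A Greek-script filter of the kind listed for the Filtering layer can be sketched as a ratio test over Unicode blocks. The idea is to count how many alphabetic characters fall in Greek and Coptic (U+0370 to U+03FF) or Greek Extended (U+1F00 to U+1FFF). The 0.8 threshold and the function names are hypothetical, the apparatus filter is not shown, and this is not the actual logic of 44_filter_candidates.py.

```python
import unicodedata

# Greek and Coptic (U+0370-U+03FF) and Greek Extended (U+1F00-U+1FFF) blocks.
GREEK_RANGES = ((0x0370, 0x03FF), (0x1F00, 0x1FFF))


def greek_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the Greek Unicode blocks."""
    # NFC composes polytonic letters, so combining accents are not counted separately.
    letters = [ch for ch in unicodedata.normalize("NFC", text) if ch.isalpha()]
    if not letters:
        return 0.0
    greek = sum(any(lo <= ord(ch) <= hi for lo, hi in GREEK_RANGES) for ch in letters)
    return greek / len(letters)


def keep_candidate(text: str, threshold: float = 0.8) -> bool:
    """Hypothetical Greek-script filter: keep text that is mostly Greek letters."""
    return greek_ratio(text) >= threshold


print(keep_candidate("μῆνιν ἄειδε θεὰ"), keep_candidate("Homer, Iliad 1.1"))
```

Counting only alphabetic characters keeps digits, punctuation and verse numbers from diluting the ratio.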

Reuse

  • Per-period JSONL partitions for fine-tuning Greek diachronic transformers
  • PROIEL XML 2.0 export compatible with Oslo PROIEL tools
  • Neo4j alignment graph for cross-linguistic queries
  • Public Hugging Face dataset (forthcoming)
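Per-period partitioning can be sketched as grouping records by a period field and writing one JSONL file per group, which a fine-tuning loader can then stream back. The period field name, the lowercase file naming and the demo records (including the Koine label) are assumptions for illustration. The sketch does not reproduce 48_merge_xml_to_corpus.py.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def partition_by_period(records, out_dir):
    """Write records to one JSONL file per period; return counts per period."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec.get("period", "unknown")].append(rec)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for period, recs in buckets.items():
        with (out / f"{period.lower()}.jsonl").open("w", encoding="utf-8") as fh:
            for rec in recs:
                fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return {period: len(recs) for period, recs in buckets.items()}


def iter_partition(path):
    """Stream one partition back, e.g. to feed a fine-tuning data loader."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


# Usage with invented records written to a throwaway directory.
DEMO = [{"id": "a", "period": "Classical", "text": "..."},
        {"id": "b", "period": "Koine", "text": "..."},
        {"id": "c", "period": "Classical", "text": "..."}]
with tempfile.TemporaryDirectory() as tmp:
    COUNTS = partition_by_period(DEMO, tmp)
    CLASSICAL_IDS = [r["id"] for r in iter_partition(Path(tmp) / "classical.jsonl")]
print(COUNTS, CLASSICAL_IDS)
```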

Open + cite

  • Source code: github.com/AthDGC/Diachronic-Linguistics-Platform (Apache-2.0)
  • Dataset: 10.5281/zenodo.20439182 (CC-BY-4.0)

Lavidas, N., Nikiforidou, K., Haug, D., Kulikov, L., Geka, V., Symeonidis, V., Michalareas, T., Chionidi, S., Tsiropina, A., Plakoutsi, E., Argyropoulos, E., and the Athens Digital Glossa Chronos Research Network (2026). AthDGC: Athens Diachronic Glossa Chronos. Zenodo. doi:10.5281/zenodo.20439182

Thanks

Funded by HFRI (Project No. 20577) · Greece 2.0 NRRP · Compute: GRNET ARIS (pa260305)

athdgc.github.io · 10.5281/zenodo.20439182

Funding

Funded by the Hellenic Foundation for Research and Innovation (HFRI) under the 3rd Call for HFRI Research Projects to support Post-Doctoral Researchers, Project No. 20577, with complementary support from the Greece 2.0 National Recovery and Resilience Plan. Compute was supplied by GRNET ARIS (Greek national HPC), allocation pa260305.