Research
Open datasets, open tools, open corpora released by the lab
AthDGC platform
The lab's flagship computational platform: a PROIEL-XML 2.0 dependency-parsed treebank spanning the full history of Greek (Homeric through Modern), with verse-level cross-lingual alignment to four Indo-European (IE) witnesses as of v0.4 and five more queued for v0.7.
| Item | Status | URL |
|---|---|---|
| Public showcase | live | https://athdgc.github.io |
| Source repository | live | https://github.com/AthDGC/Diachronic-Linguistics-Platform |
| Hugging Face mirror | live (3 model repos) | https://huggingface.co/AthDGC |
| PyPI package | live (stub) | https://pypi.org/project/athdgc-tools/ |
| Concept DOI | live (v0.4.0) | 10.5281/zenodo.20439182 |
Open-source toolkit
Fourteen modules under OSI-approved licences. Highlights:
- LightSIDE-AthDGC - LightSIDE fork for PROIEL syntactic features (dependency arcs, argument-structure frames, morphology bundles). BSD-3-Clause + Apache-2.0.
- Fine-tuned Stanza checkpoints - grc_byz_proiel, grc_lbem_proiel, grc_mod_proiel for diachronic Greek. Apache-2.0. Hosted at https://huggingface.co/AthDGC.
- PROIEL XML 2.0 validator (v0.5) - schema + relation-inventory linter. Apache-2.0.
- Lavidasised style check (v0.5) - em-dash + AI-marker grep for repository PRs. Apache-2.0.
- Quarto template pack - the multi-output Quarto pack that builds athdgc.github.io and this lab site. MIT.
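As a rough illustration of what an em-dash + AI-marker grep can look like, here is a minimal Python sketch. It is not the lab's tool: the function name `find_style_issues` and the phrases in `MARKER_PHRASES` are hypothetical, chosen only to show the idea of scanning a text line by line.

```python
EM_DASH = "\u2014"

# Hypothetical marker phrases: an assumption for illustration only,
# not the list used by the lab's style check.
MARKER_PHRASES = ("delve into", "as an ai language model")


def find_style_issues(text, markers=MARKER_PHRASES):
    """Return (line number, issue) pairs for em-dashes and marker phrases."""
    issues = []
    for number, line in enumerate(text.splitlines(), start=1):
        if EM_DASH in line:
            issues.append((number, "em-dash"))
        lowered = line.lower()
        for phrase in markers:
            if phrase in lowered:
                issues.append((number, f"marker: {phrase}"))
    return issues


sample = "The corpus is aligned at verse level.\nWe delve into clitics \u2014 briefly."
print(find_style_issues(sample))
```

Matching is case-insensitive for phrases and exact for the em-dash character (U+2014); a check run on pull requests would presumably apply the same scan to changed files only.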
Full module list: https://athdgc.github.io/tools.html.
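The relation-inventory part of the PROIEL XML validator listed above can be pictured with a short Python sketch. It assumes that tokens are `token` elements carrying a `relation` attribute, as in PROIEL XML. The function name `lint_relations`, the sample document, and the small `ALLOWED_RELATIONS` set are illustrative assumptions, not the validator's actual code or inventory, and schema validation is not covered.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of dependency relation tags: an assumption,
# not the inventory shipped with the lab's validator.
ALLOWED_RELATIONS = {
    "pred", "sub", "obj", "obl", "adv", "atr",
    "aux", "apos", "comp", "xobj", "xadv",
}


def lint_relations(xml_text, allowed=ALLOWED_RELATIONS):
    """Return (token id, relation) pairs whose relation is not in `allowed`."""
    root = ET.fromstring(xml_text)
    problems = []
    for token in root.iter("token"):
        relation = token.get("relation")
        # Tokens without a relation attribute are skipped, not flagged.
        if relation is not None and relation not in allowed:
            problems.append((token.get("id"), relation))
    return problems


# Hypothetical three-token sentence; the third token has a misspelt relation.
SAMPLE = """<proiel>
  <source id="demo" language="grc">
    <div>
      <sentence id="1">
        <token id="1" form="λέγει" relation="pred"/>
        <token id="2" form="ταῦτα" head-id="1" relation="obj"/>
        <token id="3" form="νῦν" head-id="1" relation="advx"/>
      </sentence>
    </div>
  </source>
</proiel>"""

print(lint_relations(SAMPLE))
```

Returning the offending token ids rather than raising on the first error lets a caller report every problem in a file in one pass.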
Open-access corpus inputs
Every primary source text used by the lab is open-access (public domain, CC-BY, CC-BY-SA, or equivalent). Greek sources draw on Perseus Digital Library, Open Greek and Latin / First1K (Leipzig), SBL Greek NT, Tischendorf and Westcott-Hort, Rahlfs LXX via openscriptures.org, Papyri.info, Patrologia Graeca via Documenta Catholica Omnia, Bibliotheca Augustana, Anemi (UoC), and Wikisource el. IE parallels draw on Vulsearch + Latin Library, the Wulfila Project (University of Antwerp), TITUS (Frankfurt), Digilib Armenian, GRETIL (Goettingen), SARIT, TEAMS, the DOE corpus, and the National Library of Ukraine. Full per-period source map: https://athdgc.github.io/samples.html.
Working Papers
Open-access preprints and platform launch reports, self-published as the GlossaContactLab Working Papers (digital edition).
Funding
Funded by HFRI Project No. 20577 and the Greece 2.0 National Recovery and Resilience Plan. Compute runs on GRNET ARIS. Project: CVL-CDSAML.