Research

Open datasets, open tools, open corpora released by the lab

AthDGC platform

The lab's flagship computational platform: a dependency-parsed treebank in PROIEL XML 2.0 format covering the whole history of Greek, from Homeric to Modern Greek, with verse-level cross-lingual alignment to four Indo-European (IE) witnesses as of v0.4 and five more queued for v0.7.
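To make the data model concrete, here is a minimal Python sketch, not taken from the AthDGC codebase, that reads PROIEL XML 2.0 tokens with the standard library, turns each token's `head-id` and `relation` attributes into a dependency arc, and pairs two witnesses verse by verse on the `citation-part` attribute. It assumes tokens carry the usual PROIEL attributes (`id`, `form`, `head-id`, `relation`, `citation-part`) and that witnesses share verse references. The function names and the two toy fragments (John 1.1-1.2) are hypothetical, and their annotation is illustrative rather than gold-standard.

```python
# Minimal sketch (hypothetical, not the AthDGC implementation): read PROIEL XML 2.0
# tokens, build dependency arcs, and align two witnesses verse by verse.
import xml.etree.ElementTree as ET
from collections import defaultdict

# Toy fragments with illustrative (not gold) annotation; verse references are assumed.
SAMPLE_GRC = """<proiel schema-version="2.0"><source id="grc-toy" language="grc"><div>
<sentence id="1">
  <token id="1" form="ἦν" citation-part="JOHN 1.1" relation="pred"/>
  <token id="2" form="ὁ" citation-part="JOHN 1.1" head-id="3" relation="aux"/>
  <token id="3" form="λόγος" citation-part="JOHN 1.1" head-id="1" relation="sub"/>
</sentence>
<sentence id="2">
  <token id="4" form="οὗτος" citation-part="JOHN 1.2" head-id="5" relation="sub"/>
  <token id="5" form="ἦν" citation-part="JOHN 1.2" relation="pred"/>
</sentence>
</div></source></proiel>"""

SAMPLE_LAT = """<proiel schema-version="2.0"><source id="lat-toy" language="lat"><div>
<sentence id="11">
  <token id="11" form="erat" citation-part="JOHN 1.1" relation="pred"/>
  <token id="12" form="verbum" citation-part="JOHN 1.1" head-id="11" relation="sub"/>
</sentence>
</div></source></proiel>"""


def read_tokens(xml_text):
    """Return the attributes of every <token> element, in document order."""
    root = ET.fromstring(xml_text)
    return [dict(tok.attrib) for tok in root.iter("token")]


def dependency_arcs(tokens):
    """Return (head form, relation, dependent form) triples; a root token's head is None."""
    by_id = {t["id"]: t for t in tokens}
    arcs = []
    for t in tokens:
        head = by_id.get(t.get("head-id"))  # None when the token has no head-id
        arcs.append((head.get("form") if head else None, t.get("relation"), t.get("form")))
    return arcs


def verses(tokens):
    """Group token forms by citation-part, skipping empty tokens that have no form."""
    grouped = defaultdict(list)
    for t in tokens:
        if "form" in t:
            grouped[t.get("citation-part")].append(t["form"])
    return grouped


def align_verses(witness_a, witness_b):
    """Pair verses present in both witnesses, in witness_a's order."""
    return [(ref, witness_a[ref], witness_b[ref]) for ref in witness_a if ref in witness_b]


if __name__ == "__main__":
    grc, lat = read_tokens(SAMPLE_GRC), read_tokens(SAMPLE_LAT)
    print(dependency_arcs(lat))
    for ref, greek, latin in align_verses(verses(grc), verses(lat)):
        print(ref, greek, latin)
```

Keying on `citation-part` keeps the alignment at verse level, so witnesses that differ in word order or token count still pair cleanly; aligning individual words inside a verse would need a separate step.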

Item                 Status                 URL / DOI
Public showcase      live                   https://athdgc.github.io
Source repository    live                   https://github.com/AthDGC/Diachronic-Linguistics-Platform
Hugging Face mirror  live (3 model repos)   https://huggingface.co/AthDGC
PyPI package         live (stub)            https://pypi.org/project/athdgc-tools/
Concept DOI          live (v0.4.0)          10.5281/zenodo.20439182

Open-source toolkit

Fourteen modules under OSI-approved licences. Highlights:

  • LightSIDE-AthDGC - a LightSIDE fork that extracts PROIEL syntactic features (dependency arcs, argument-structure frames, morphology bundles). BSD-3-Clause + Apache-2.0.
  • Fine-tuned Stanza checkpoints - grc_byz_proiel, grc_lbem_proiel, grc_mod_proiel for diachronic Greek. Apache-2.0. Hosted at https://huggingface.co/AthDGC.
  • PROIEL XML 2.0 validator (v0.5) - schema + relation-inventory linter. Apache-2.0.
  • Lavidasised style check (v0.5) - a grep-based check that flags em-dashes and AI markers in repository PRs. Apache-2.0.
  • Quarto template pack - the multi-output template pack that builds both athdgc.github.io and this lab site. MIT.
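The kind of relation-inventory check the PROIEL XML 2.0 validator performs can be sketched as follows. This is a hypothetical illustration, not the validator's code: it assumes the 24-label PROIEL relation inventory held in `PROIEL_RELATIONS` (to be confirmed against the lab's annotation guidelines) and covers only two rules, unknown relation labels and `head-id` values that point outside their sentence. Schema validation proper is out of scope, and the names `lint` and `SAMPLE` are illustrative.

```python
# Hypothetical sketch of a PROIEL relation-inventory lint (not the AthDGC validator).
import xml.etree.ElementTree as ET

# Assumed PROIEL relation inventory; confirm against the lab's annotation guidelines.
PROIEL_RELATIONS = frozenset({
    "adnom", "adv", "ag", "apos", "arg", "atr", "aux", "comp",
    "expl", "narg", "nonsub", "obj", "obl", "parpred", "part", "per",
    "pid", "pred", "rel", "sub", "voc", "xadv", "xobj", "xsub",
})


def lint(xml_text):
    """Return human-readable problems: unknown relations and dangling head-ids."""
    problems = []
    root = ET.fromstring(xml_text)
    for sentence in root.iter("sentence"):
        tokens = sentence.findall("token")  # tokens are direct children of <sentence>
        ids = {t.get("id") for t in tokens}
        for t in tokens:
            rel = t.get("relation")
            if rel is not None and rel not in PROIEL_RELATIONS:
                problems.append(f"token {t.get('id')}: unknown relation {rel!r}")
            head = t.get("head-id")
            if head is not None and head not in ids:
                problems.append(
                    f"token {t.get('id')}: head-id {head} not in sentence {sentence.get('id')}"
                )
    return problems


# Toy input with two deliberate errors: a non-PROIEL label and a dangling head-id.
SAMPLE = """<proiel schema-version="2.0"><source id="lat-toy" language="lat"><div>
<sentence id="1">
  <token id="11" form="erat" relation="pred"/>
  <token id="12" form="verbum" head-id="11" relation="subj"/>
  <token id="13" form="in" head-id="99" relation="adv"/>
</sentence>
</div></source></proiel>"""

if __name__ == "__main__":
    for problem in lint(SAMPLE):
        print(problem)
```

Run on the toy sample, it reports the non-PROIEL label `subj` on token 12 and the dangling `head-id` 99 on token 13; a clean file yields an empty list.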

Full module list: https://athdgc.github.io/tools.html.

Open-access corpus inputs

Every primary source text used by the lab is open-access (public domain, CC-BY, CC-BY-SA, or equivalent).

Greek sources draw on the Perseus Digital Library, Open Greek and Latin / First1K (Leipzig), SBL Greek NT, Tischendorf and Westcott-Hort, Rahlfs LXX via openscriptures.org, Papyri.info, Patrologia Graeca via Documenta Catholica Omnia, Bibliotheca Augustana, Anemi (UoC), and Greek Wikisource (el).

IE parallels draw on Vulsearch and The Latin Library, the Wulfila Project (University of Antwerp), TITUS (Frankfurt), Digilib Armenian, GRETIL (Goettingen), SARIT, TEAMS, the DOE corpus, and the National Library of Ukraine.

Full per-period source map: https://athdgc.github.io/samples.html.

Working Papers

Open-access preprints and platform launch reports, self-published in the digital edition of the GlossaContactLab Working Papers.

Funding

Funded by HFRI Project No. 20577 and the Greece 2.0 National Recovery and Resilience Plan. Compute resources provided by GRNET ARIS. Project: CVL-CDSAML.