Intertextuality in diachronic Greek

What AthDGC contributes; Symposium on Computational Methods for Intertextuality, Tel Aviv University, 22 June 2026

Nikolaos Lavidas

NKUA · Athens Digital Glossa Chronos Research Network

June 22, 2026

HFRI Project No. 20577 · Greece 2.0 NRRP · Compute GRNET ARIS pa260305

Roadmap

  1. The premise. Historical linguistics works on texts, not speakers.
  2. What intertextuality means inside a diachronic-Greek setting.
  3. Three modes: retelling, retranslation, inner-textual citation.
  4. The hypothesis: parallel coexistent grammars in written contact (Lavidas 2021, Brill).
  5. How AthDGC operationalises the hypothesis: one PROIEL schema across 3000 years.
  6. Concrete platform, concrete numbers, three threads through the corpus.
  7. What this offers a computational-intertextuality programme.

The AthDGC team

Nikolaos Lavidas (PI, NKUA) · Kiki Nikiforidou (NKUA) · Dag Haug (Oslo, PROIEL Director) · Leonid Kulikov (Ghent) · Vassiliki Geka (NKUA) · Vassileios Symeonidis (NKUA) · Theodoros Michalareas (NKUA) · Sofia Chionidi (NKUA) · Anastasia Tsiropina (NKUA) · Eleni Plakoutsi (NKUA) · Evangelos Argyropoulos (NKUA)

Three universities (NKUA · Oslo · Ghent); eleven researchers; one platform. HFRI Project No. 20577 · Greece 2.0 NRRP.

AthDGC is the Athens node of the PROIEL family (“Athens-PROIEL”). Dag Haug, founding director of PROIEL at Oslo, is a co-author and co-PI. We adopt the PROIEL XML 2.0 schema verbatim and we extend it diachronically.

1. The premise

Historical linguistics has no native speakers. It has only texts.

The only window onto Archaic, Classical, Koine, Late Antique, Byzantine, Late Byzantine, and Early Modern Greek is the written record. Every claim about the syntax of a period reduces to a claim about what its surviving texts permit and what they exclude.

2. Intertextuality in a diachronic-Greek setting

In a corpus that spans 3000 years of one language, intertextuality is not a stylistic ornament. It is the primary data-generating process.

The same canon is re-rendered by every generation:

  • the same Homeric verse is paraphrased by Byzantine schoolbooks;
  • the same Pauline epistle is retranslated into Modern Greek every century;
  • the same Septuagint formula is quoted, glossed, and inflected toward each period’s spoken grammar.

3. Three modes

Retelling. Same language, later century. Same story, new register. (Niketas Choniates retelling Homer in 12th c. Byzantine epitome.)

Retranslation. Same canon, target language shifts. (Hebrew Bible → Greek Septuagint → Latin Vulgate → Gothic Wulfila → OCS Marianus → Classical Armenian.)

Inner-textual citation. Surface re-use of one text inside another. (NT quoting LXX; Patristic homily quoting NT; Byzantine chronicle quoting Patristic.)

4. The hypothesis

Under sustained written language contact, donor and recipient grammars do not displace one another. They coexist as distinct active systems inside the same writer.

Lavidas, N. 2021. The Diachrony of Written Language Contact: A Contrastive Approach. Brill’s Studies in Historical Linguistics. Leiden, Boston: Brill.

5. Why “written” matters

Written transmission has its own physics:

  • prestige standards (Atticism, Byzantine Hochsprache);
  • copying chains with attested manuscript stemmata;
  • citation as authority (citing scripture as scripture);
  • resistance to vernacular pressure at literate registers.

Each of these leaves a different signature in the syntactic record.

6. One schema across 3000 years

PROIEL XML 2.0 is the file format that stores each sentence as a tree of word-by-word grammatical relations. Developed at Oslo for the early Indo-European languages; extended in AthDGC to all of Greek.
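To make "a tree of word-by-word grammatical relations" concrete, here is a minimal sketch that reads one sentence and rebuilds its head-dependent structure. The XML fragment is hand-written for illustration and is not taken from an AthDGC release; the attribute names (form, lemma, part-of-speech, head-id, relation) and the tag values follow the PROIEL convention as we understand it and should be checked against the schema before reuse.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written illustrative fragment (assumption: not real corpus output).
SAMPLE = """
<sentence id="1">
  <token id="1" form="καὶ" lemma="καί" part-of-speech="C-" head-id="2" relation="aux"/>
  <token id="2" form="ἐγένετο" lemma="γίγνομαι" part-of-speech="V-" relation="pred"/>
  <token id="3" form="φῶς" lemma="φῶς" part-of-speech="Nb" head-id="2" relation="sub"/>
</sentence>
"""


def build_tree(sentence_xml):
    """Return (roots, children).

    roots    -- list of (id, form, relation) for tokens without a head
    children -- dict mapping a head id to its dependents, in document order
    """
    sentence = ET.fromstring(sentence_xml.strip())
    roots = []
    children = defaultdict(list)
    for tok in sentence.iter("token"):
        entry = (tok.get("id"), tok.get("form"), tok.get("relation"))
        head = tok.get("head-id")
        if head is None:
            roots.append(entry)
        else:
            children[head].append(entry)
    return roots, dict(children)


def render(entry, children, depth=0):
    """Return the subtree under `entry` as indented lines."""
    tok_id, form, relation = entry
    lines = ["  " * depth + f"{relation}: {form}"]
    for child in children.get(tok_id, []):
        lines.extend(render(child, children, depth + 1))
    return lines


roots, children = build_tree(SAMPLE)
for root in roots:
    print("\n".join(render(root, children)))
```

Because every period shares the same relation labels, a walk like this needs no period-specific logic.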

The same 26-relation inventory annotates:

  • Homer (8th c. BCE)
  • the New Testament (1st c. CE)
  • Sphrantzes’ Chronicle (15th c. CE)
  • the Modern Greek standard (21st c. CE)

One schema; eight periods; queryable comparison.

The diachronic extension was first proposed by Lavidas and Haug (2012, Thessaloniki-Oslo PROIEL pilot on Sphrantzes’ Chronicle). AthDGC v0.4 ships that 2012 idea, generalised to the full diachronic span.

7. Text Id

Every partition carries a stable identity:

athdgc.<author>.<work>.<src_lang>.<tgt_lang>.<translator>.v<revision>

Example:

athdgc.017.001.grc.000.000.v1 (Sphrantzes, Chronicle, Greek)

Every Text Id resolves to a row in a metadata register, to a source edition, and to a citable printed reference.
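The identifier template can be checked mechanically. The following is a minimal sketch of a parser and formatter for it; the function and class names are hypothetical, and the field shapes (any non-empty string without a dot, integer revision) are inferred from the single example above rather than from a published specification.

```python
import re
from typing import NamedTuple


class TextId(NamedTuple):
    author: str
    work: str
    src_lang: str
    tgt_lang: str
    translator: str
    revision: int


# Assumption: fields never contain a dot; the revision is a plain integer.
_PATTERN = re.compile(
    r"^athdgc"
    r"\.(?P<author>[^.]+)"
    r"\.(?P<work>[^.]+)"
    r"\.(?P<src_lang>[^.]+)"
    r"\.(?P<tgt_lang>[^.]+)"
    r"\.(?P<translator>[^.]+)"
    r"\.v(?P<revision>\d+)$"
)


def parse_text_id(text_id: str) -> TextId:
    """Split a Text Id into its six fields, or raise ValueError."""
    match = _PATTERN.match(text_id)
    if match is None:
        raise ValueError(f"not a well-formed Text Id: {text_id!r}")
    fields = match.groupdict()
    fields["revision"] = int(fields["revision"])
    return TextId(**fields)


def format_text_id(tid: TextId) -> str:
    """Inverse of parse_text_id."""
    return (
        f"athdgc.{tid.author}.{tid.work}.{tid.src_lang}"
        f".{tid.tgt_lang}.{tid.translator}.v{tid.revision}"
    )


example = parse_text_id("athdgc.017.001.grc.000.000.v1")
print(example)
```

Keeping the fields as strings preserves the zero padding, so a parsed identifier formats back to exactly the same string.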

8. What is built (tools)

  • An annotation pipeline that reads a Greek sentence and emits PROIEL XML 2.0 (the file format that stores each sentence as a tree of word-by-word grammatical relations, developed at Oslo).
  • A morphological tagger (Stanford Stanza, in its PROIEL-trained version), which produces lemma + part-of-speech + features automatically.
  • An AI second-opinion review loop: uncertain annotations are flagged and re-checked by a large language model, then approved by a human.
  • A live dashboard that shows token counts and per-period progress in real time as the annotation runs.

All open source. MIT · Apache 2.0 · BSD.
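The second-opinion loop listed above can be sketched as follows. This is an illustration under stated assumptions, not the AthDGC implementation: the confidence score, the 0.80 threshold, and the names triage and review are all hypothetical, and the language model and the human annotator are stood in for by plain callables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    token: str
    relation: str
    confidence: float  # hypothetical score from the automatic annotator


def triage(annotations, threshold=0.80):
    """Split annotations into (accepted, flagged) by confidence."""
    accepted = [a for a in annotations if a.confidence >= threshold]
    flagged = [a for a in annotations if a.confidence < threshold]
    return accepted, flagged


def review(flagged, second_opinion, human_approves):
    """Apply a suggested relation only when the human approves it.

    second_opinion -- callable: Annotation -> suggested relation label
    human_approves -- callable: (Annotation, suggestion) -> bool
    """
    final = []
    for ann in flagged:
        suggestion = second_opinion(ann)
        if human_approves(ann, suggestion):
            final.append(Annotation(ann.token, suggestion, ann.confidence))
        else:
            final.append(ann)  # unchanged; left for manual annotation
    return final


batch = [
    Annotation("καὶ", "aux", 0.97),
    Annotation("ἐγένετο", "pred", 0.99),
    Annotation("φῶς", "obj", 0.55),  # deliberately doubtful toy value
]
accepted, flagged = triage(batch)
# Stubs standing in for the model call and the human decision.
resolved = review(flagged, lambda a: "sub", lambda a, s: True)
print(accepted + resolved)
```

The point of the design is that the model only proposes: no label changes without the human callable returning True.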

9. What is built (corpus)

Eight diachronic periods, all on one PROIEL XML 2.0 spine:

Period                                    Status
Archaic, Classical, Hellenistic, Koine    partitions in flight
Late Antique, Byzantine                   partitions in flight
Late Byzantine, Early Modern, Modern      partitions in flight

New Testament verse-level alignment (cross-alignment: machine-readable matching of corresponding sentences and words between texts in different languages):

  • to Latin (Vulgate), Gothic (Wulfila), Old Church Slavonic (Marianus): aligned in v0.4;
  • to Classical Armenian: workflow established, alignment in ingestion for v0.5.
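Verse-level alignment, in its simplest form, is a join on the verse reference. The sketch below shows that idea on toy data (the opening of John in Greek and Latin, with the Latin deliberately truncated to show a gap). The data layout and the function name are assumptions made for illustration; word-level alignment is a separate and harder step that this does not attempt.

```python
def align_by_verse(versions):
    """Outer-join several versions on their verse references.

    versions -- dict: language code -> {(book, chapter, verse): text}
    Returns a list of (reference, {language: text or None}) in canonical order.
    """
    all_refs = sorted(set().union(*(v.keys() for v in versions.values())))
    return [
        (ref, {lang: verses.get(ref) for lang, verses in versions.items()})
        for ref in all_refs
    ]


# Toy data; the Latin side omits 1:2 on purpose to show an unaligned verse.
versions = {
    "grc": {
        ("JOHN", 1, 1): "Ἐν ἀρχῇ ἦν ὁ λόγος",
        ("JOHN", 1, 2): "οὗτος ἦν ἐν ἀρχῇ πρὸς τὸν θεόν",
    },
    "lat": {
        ("JOHN", 1, 1): "In principio erat Verbum",
    },
}

for ref, row in align_by_verse(versions):
    missing = [lang for lang, text in row.items() if text is None]
    print(ref, row, "missing:", missing)
```

An outer join keeps verses that are absent from one version, which matters for fragmentary witnesses.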

10. What is built (platform)

  • Public showcase: athdgc.github.io
  • Corpus repository: github.com/AthDGC/athdgc-corpus (closed during v0.5 audit)
  • Live PROIEL viewer: dialing.enl.uoa.gr/proiel/
  • Launch report: arXiv cs.CL (published 11 Jun 2026)
  • Permanent identifier: Zenodo concept DOI 10.5281/zenodo.20439182

CC BY 4.0 for the corpus. Citable per-version DOIs on each release.

11. Concrete numbers (today)

  • 10.91 M tokens PROIEL-annotated
  • 173 annotation batches complete
  • 8 Greek periods covered
  • 3 Indo-European parallels currently aligned at the New Testament

Dashboard snapshot: 19 June 2026 21:17 Athens. Latin, Gothic, Old Church Slavonic aligned in v0.4; Classical Armenian in ingestion for v0.5. v0.5 target 16 M tokens; v0.6 target 24 M.

12. Three threads through the corpus

Thread A. The Homeric chain. Iliad 1.1 in Archaic Greek → its Byzantine epitome (Tzetzes) → its 1955 Modern Greek prose retelling (Kakridis-Kazantzakis).

Thread B. The New-Testament chain. John 1.1 in Koine → its Vulgate translation → its Gothic and Old Church Slavonic sisters.

Thread C. The Septuagint chain. Psalm 1.1 in Hebrew (source) → LXX Greek → Vulgate → Modern Greek liturgy.

12a. A closer look at the Septuagint

The Septuagint (LXX) is the most consequential single text in the Greek chain:

  • a Koine-Greek translation of the Hebrew Bible, made in stages from the 3rd c. BCE (Pentateuch) onward;
  • the Greek Bible of the early Church: New Testament writers quote it, not the Hebrew;
  • the source of the first Christian translations into Latin (Old Latin, then Vulgate for many books), Gothic (Wulfila), Old Church Slavonic (Cyril and Methodius), and Classical Armenian;
  • still in use in the Modern Greek liturgy today.

12b. What LXX shows on a PROIEL spine

Two well-studied phenomena that AthDGC makes queryable across periods:

  1. Paratactic καί as a calque of the Hebrew waw-consecutive: clause-initial καί + finite verb, instead of classical Greek subordinate-clause patterns;
  2. Closest-conjunct agreement: number and gender agreement track the nearest conjunct under Hebrew influence (Lavidas 2019, Questions and Answers in Linguistics 5(2): 37-90).

Both are syntactic transfers from Hebrew. Both propagate outward, through Vulgate, Gothic, Old Church Slavonic.

12c. The Septuagint, a concrete example

Genesis 1:3-4 (the fiat lux sequence)

Heb. וַיֹּ֥אמֶר אֱלֹהִ֖ים יְהִ֣י א֑וֹר וַֽיְהִי־א֖וֹר׃ וַיַּ֧רְא אֱלֹהִ֛ים אֶת־הָא֖וֹר כִּי־טֽוֹב

Tr. wa-yyōʾmer ʾĕlōhîm yəhî ʾôr wa-yhî ʾôr. wa-yyarʾ ʾĕlōhîm ʾeṯ ha-ʾôr kî ṭôḇ.

LXX καὶ εἶπεν ὁ θεός Γενηθήτω φῶς. καὶ ἐγένετο φῶς. καὶ εἶδεν ὁ θεὸς τὸ φῶς ὅτι καλόν.

Gloss and said the god let-become light. and became light. and saw the god the light that good.

Three Hebrew wayyiqtol forms (consecutive narrative tense) are calqued by three clause-initial καί + finite verb sequences in the LXX. Classical Greek prose would more typically subordinate or vary the connection (participle, μέν / δέ, ὅτε clause).
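The pattern in this passage can be counted mechanically once each token carries a part of speech and a finiteness flag. The sketch below does this for the three LXX sentences of Genesis 1:3-4. The tags are hand-assigned and simplified for illustration (they are not PROIEL morphology strings and not corpus output), and the function names are hypothetical.

```python
import unicodedata


def strip_diacritics(form):
    """Lower-case a Greek form and drop accents and breathings."""
    decomposed = unicodedata.normalize("NFD", form)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).lower()


def starts_with_kai_finite(sentence):
    """True if the sentence opens with καί directly followed by a finite verb.

    sentence -- list of (form, pos, finite) tuples; pos and finite are
    simplified hand-assigned tags (assumption, see lead-in).
    """
    if len(sentence) < 2:
        return False
    (first_form, first_pos, _), (_, second_pos, second_finite) = sentence[:2]
    return (
        strip_diacritics(first_form) == "και"
        and first_pos == "conj"
        and second_pos == "verb"
        and second_finite
    )


GEN_1_3_4 = [
    [("καὶ", "conj", False), ("εἶπεν", "verb", True), ("ὁ", "det", False),
     ("θεός", "noun", False), ("Γενηθήτω", "verb", True), ("φῶς", "noun", False)],
    [("καὶ", "conj", False), ("ἐγένετο", "verb", True), ("φῶς", "noun", False)],
    [("καὶ", "conj", False), ("εἶδεν", "verb", True), ("ὁ", "det", False),
     ("θεὸς", "noun", False), ("τὸ", "det", False), ("φῶς", "noun", False),
     ("ὅτι", "conj", False), ("καλόν", "adj", False)],
]

count = sum(starts_with_kai_finite(s) for s in GEN_1_3_4)
print(f"clause-initial καί + finite verb: {count} of {len(GEN_1_3_4)} sentences")
```

Normalising diacritics first matters because καί surfaces as καὶ before a following word; a plain string comparison would miss every instance here.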

12d. Three thousand years, one schema

The eight periods of Greek on the AthDGC PROIEL spine:

Period            Dates                Anchor texts
Archaic           8th-6th BCE          Homer
Classical         5th-4th BCE
Koine             3rd BCE-4th CE       LXX + NT
Late Antique      4th-7th CE
Byzantine         7th-12th CE
Late Byzantine    13th-15th CE         Sphrantzes
Early Modern      16th-18th CE
Modern            19th-21st CE

3000 years of Greek on one PROIEL XML 2.0 schema. The hinge: a translation that becomes a source.

13. What this offers the intertextuality programme

  1. A test bed: 3000 years on one schema, queryable across periods.
  2. Structural detection on top of surface detection: not “the same words were re-used”, but “the same propositional content was re-encoded into a different syntax”.
  3. An empirical handle on the parallel-coexistent-grammars hypothesis: every retelling and retranslation is a controlled minimal pair.
  4. Reusable infrastructure: the same pipeline can run on Hebrew, Sanskrit, Tibetan, with the relation inventory adjusted upstream.

14. Resources

Contact: nikolaos.lavidas@gmail.com · nlavidas@enl.uoa.gr

Διαχρονία Γλώσσας :Χρόνος · Athens Digital Glossa Chronos

Funding

Funded by the Hellenic Foundation for Research and Innovation (HFRI) under the 3rd Call for HFRI Research Projects to support Post-Doctoral Researchers, Project No. 20577; with complementary support from the Greece 2.0 National Recovery and Resilience Plan.

Compute supplied by GRNET ARIS (the Greek national high-performance computing cluster), allocation pa260305.

Corpus licence CC BY 4.0. Tooling licences MIT / Apache 2.0 / BSD.