Corpus-based Valency Lexicon for Contrastive and Diachronic Study
Languages from Antiquity to Today
All data and tools freely available
From Homer to today
Penn-Helsinki & PROIEL standards
Language evolution patterns
CVL-CDSAML is an open access research project developing a comprehensive corpus-based valency lexicon for the contrastive and diachronic study of languages from antiquity to today. Funded by HFRI/ELIDEK, we employ Penn-Helsinki parsing standards and PROIEL treebank architecture to track valency patterns across 3,000 years of linguistic evolution.
Building annotated corpora from Homer to contemporary texts. Open access resources including diachronic retranslations.
Utilizing Penn-Helsinki standards, PROIEL treebank architecture, and state-of-the-art NLP techniques. All tools and data are open access.
Systematic investigation of argument structure changes across language families, tracking evolutionary patterns over millennia.
Integrated corpus resources, computational tools, and AI-enhanced platforms for diachronic linguistic research
Corpus-based Valency Lexicon
Main Project: Valency patterns across 3,000 years from Homer to today
Penn-Helsinki & PROIEL standards • Open access lexicon • HFRI/ELIDEK funded
ΕΛΙΔΕΚ Project
10M+ tokens from Linear B to Modern Greek
AI-powered analysis • Semantic change detection • Interactive visualization
Automated Workflow System
AI-Enhanced Tools: Smart analysis, auto-parse, LightSide ML
Automated workflow • Claude AI integration • GitHub repository links
Computational Infrastructure
Integrated Suite: Lavidas Parser, PROIEL processor, text analysis tools
Treebank integration • CoNLL-U format • Python & R packages
Comparative Analysis
Multi-language: Comparative diachronic analysis across IE languages
Ancient Greek • Latin • Sanskrit • Historical English • Germanic languages
Specialized Collection
Focused Dataset: Verb valency patterns with detailed annotations
Version 2.0 • Enhanced annotations • Machine learning ready
Unified access with single sign-on • Protected research data • Collaborative team workspace
National and Kapodistrian University of Athens
Specializing in historical syntax, language change, and corpus methodology.
Co-Investigator
National and Kapodistrian University of Athens
Professor of Linguistics specializing in construction grammar, lexicography, and language change.
External Collaborator
University of Oslo
Creator of PROIEL treebank, specialist in computational historical linguistics.
Post-Doctoral Researcher
National and Kapodistrian University of Athens
Specializing in corpus linguistics and computational approaches to historical analysis.
Post-Doctoral Researcher
National and Kapodistrian University of Athens
Focus on historical morpho-syntax and language variation in diachronic corpora.
Post-Doctoral Researcher
National and Kapodistrian University of Athens
Expert in digital humanities and computational text analysis.
PhD Researcher / Research Team Member
National and Kapodistrian University of Athens
Working on corpus annotation and linguistic data processing.
PhD Researcher / Research Team Member
National and Kapodistrian University of Athens
Focus on valency patterns and argument structure in historical texts.
PhD Researcher / Research Team Member
National and Kapodistrian University of Athens
Specializing in comparative historical linguistics and genealogical analysis.
Department of Literature, Area Studies and European Languages
Partner institution for diachronic computational linguistics
Pragmatic Resources in Old Indo-European Languages
Open Access Infrastructure
The project employs established computational linguistics infrastructure including Penn-Helsinki parsing standards, PROIEL treebank architecture, and open access tools for diachronic analysis. All resources, including diachronic retranslations and annotated corpora, will be freely available to the research community.
Phase I
Text collection and initial annotation of historical corpora from Homer to today.
Phase II
Development of open access parsing tools and valency extraction algorithms.
Phase III
Systematic analysis of valency patterns and construction of the interactive lexicon.
Phase IV
Public release of all data, tools, and educational materials.
Annual summer school on historical linguistics and corpus methods.
Research laboratory for language contact and diachronic retranslations.
Intensive program on computational approaches to language change.
Graduate program with specialization in historical and computational linguistics.
Educational videos about historical linguistics, language evolution, and computational methods
Interactive database of valency patterns from Homer to today
Search annotated historical texts with Penn-Helsinki standards
Video guides and documentation for all tools
Open access data, tools, and educational materials
Publications from the CVL-CDSAML project will be listed here as they become available. All publications will be open access.
Prof. Nikolaos Lavidas
Division of Language-Linguistics
Department of English Language and Literature
School of Philosophy
National and Kapodistrian University of Athens
Email: nlavidas@enl.uoa.gr