SAMUELS Project Portal
The SAMUELS project (Semantic Annotation and Mark-Up for Enhancing Lexical Searches) developed a new semantic annotation tool, the Historical Thesaurus Semantic Tagger (HTST), and produced semantically annotated versions of two major text corpora, the Semantic Hansard Corpus and the Semantic EEBO (Early English Books Online) corpus. Annotating large textual datasets such as linguistic corpora with semantic tags makes powerful new ways of exploring their data available. Users can search Semantic Hansard and Semantic EEBO not only for a word but also for a concept, and can explore rapidly and accurately the ways in which these concepts relate to one another, a process which can be slow and painstaking using previously available resources. Additionally, semantic annotation allows users to search for a desired meaning of a word with multiple senses (such as bank, which may mean ‘river bank’, ‘financial institution’, or ‘piggy bank’, amongst other things) without having to laboriously eliminate irrelevant hits from their results.
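The difference between searching for a word and searching for a concept can be illustrated with a minimal sketch. It assumes a toy corpus in which each token already carries a single concept label; the labels, the data, and the function names are hypothetical and do not reflect the actual HTST tag set or the query interfaces of Semantic Hansard or Semantic EEBO.

```python
from collections import defaultdict

# Hypothetical annotated tokens: (word form, concept label).
# The labels are invented for illustration; they are not real HTST tags.
TOKENS = [
    ("bank", "FINANCIAL_INSTITUTION"),
    ("lender", "FINANCIAL_INSTITUTION"),
    ("bank", "RIVER_BANK"),
    ("shore", "RIVER_BANK"),
    ("bank", "FINANCIAL_INSTITUTION"),
]


def build_concept_index(tokens):
    """Map each concept label to the positions of the tokens carrying it."""
    index = defaultdict(list)
    for position, (_, concept) in enumerate(tokens):
        index[concept].append(position)
    return index


def search_concept(tokens, index, concept):
    """Return every word form tagged with the concept, in text order."""
    return [tokens[i][0] for i in index.get(concept, [])]


def search_word_sense(tokens, word, concept):
    """Return positions where the word occurs with the requested sense only."""
    return [i for i, (w, c) in enumerate(tokens) if w == word and c == concept]


INDEX = build_concept_index(TOKENS)
```

In this sketch, a concept search for the hypothetical label RIVER_BANK retrieves both "bank" and "shore", while a sense-restricted search for "bank" skips the financial occurrences without any manual filtering of hits.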
The Historical Thesaurus Semantic Tagger integrates elements of the gold-standard USAS and CLAWS taggers and expands their capabilities in two main ways: it utilises an extensive fine-grained set of meaning classifications in its tagging pipeline, and it can be used on historical forms of the language as well as on present-day English. These advancements are made possible through the use of data from the Historical Thesaurus of English, the only thesaurus thus far created with full coverage of a language in its modern and historical forms. The Historical Thesaurus also provides a link to the Oxford English Dictionary, whose enormous and complex database of words' variant spellings is integrated into a tagger here for the first time.
The research team included experts in natural language processing at Lancaster University’s University Centre for Computer Corpus Research on Language (UCREL), including the developers of the original UCREL Semantic Analysis System (USAS) semantic tagger and the creator of the Variant Detector (VARD) system for normalising word spelling in historical text. Semanticists and corpus linguists at the University of Glasgow ran the project, provided knowledge of meaning relationships, and worked to tailor a version of the Historical Thesaurus hierarchy to the tagger’s needs. Colleagues at the University of Huddersfield and University of Central Lancashire tested the utility of the tagger’s output on pilot projects, both of which have led to further research and funding.
The SAMUELS project was funded by the Arts and Humanities Research Council in conjunction with the Economic and Social Research Council (grant reference AH/L010062/1) from January 2014 to April 2015.
The SAMUELS consortium consisted of the University of Glasgow (lead institution), Lancaster University, the University of Huddersfield, the University of Central Lancashire, the University of Strathclyde, and Oxford University Press. Our international partners were Brigham Young University (Utah), Åbo Akademi University (Finland), and the University of Oulu (Finland).
Related Resources
Hansard at Huddersfield
Go to the Hansard at Huddersfield website to explore Semantic Hansard up to 2020
Semantic EEBO in the Oxford Text Archive
Download individual semantically annotated EEBO-TCP (Phase I) texts from the OTA