Corpus Linguistics


Tagger

LT POS

Author/Org: LTG - Language Technology Group, Edinburgh, UK
Purpose: The LT POS part of speech tagger can handle plain ASCII text and SGML marked-up text.
Access: free for research purposes; an interactive online demo version is also available

MT/E Münster Tagset English

Author/Org: Arbeitsbereich Linguistik, University of Muenster, Germany
Purpose: "In developing the MT/E, we aimed at an isomorphism between the German and the English tagset. Wherever possible, we provided the basis for lexicological research, translation projects and other computerlinguistic applications with parallel corpora or multilingual information retrieval."
Access: the tagset is available as a PostScript (*.ps) file

 

MT/D Münster Tagset German

Author/Org: Arbeitsbereich Linguistik, University of Muenster, Germany
Purpose: "The Muenster Tagsets for German (MT/G or MT/D(eutsch)) were developed in the context of the Muenster Tagging Project at the 'Arbeitsbereich Linguistik' of the University of Muenster. We developed a big and a small tagset. The big tagset comprises 138 tags; the small tagset 53 tags."
Access: the tagset is available as a PostScript (*.ps) file

 

QTAG (offline)

Author/Org: Corpus Research Group, Birmingham, UK 
Purpose: "You can make use of several resource files to tag texts in different languages, and you can also train the tagger if you have a pre-tagged text in a different language for which no resource file is available yet."
Access: free download
Notes: Tagging by email is also available.

TnT

Author/Org: Thorsten Brants
Purpose: "TnT, the short form of Trigrams'n'Tags, is a very efficient statistical part-of-speech tagger that is trainable on different languages and virtually any tagset. The component for parameter generation trains on tagged corpora. The system incorporates several methods of smoothing and of handling unknown words."
Access: free for research purposes
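
To make the idea of a trigram tagger with smoothing more concrete, here is a minimal sketch in Python. It is not TnT's code or interface: the function names, the toy corpus and the fixed interpolation weights are all assumptions chosen for illustration. It estimates the probability of a tag given the two preceding tags by linearly interpolating unigram, bigram and trigram relative frequencies, one common form of smoothing. Handling of unknown words and the actual search for the best tag sequence are left out.

```python
from collections import Counter


def train_tag_ngrams(tagged_sentences):
    """Count tag unigrams, bigrams and trigrams in a pre-tagged corpus.

    tagged_sentences is a list of sentences, each a list of (word, tag) pairs.
    """
    uni, bi, tri = Counter(), Counter(), Counter()
    for sentence in tagged_sentences:
        tags = [tag for _word, tag in sentence]
        for i, tag in enumerate(tags):
            uni[tag] += 1
            if i >= 1:
                bi[(tags[i - 1], tag)] += 1
            if i >= 2:
                tri[(tags[i - 2], tags[i - 1], tag)] += 1
    return uni, bi, tri


def interpolated_prob(t1, t2, t3, counts, lambdas=(0.2, 0.3, 0.5)):
    """Estimate P(t3 | t1, t2) by linear interpolation.

    The weights in lambdas are illustrative assumptions; a real tagger
    would estimate them from the training data.
    """
    uni, bi, tri = counts
    l1, l2, l3 = lambdas
    total = sum(uni.values())
    # Relative frequencies; a term is 0.0 when its context was never seen.
    p_uni = uni[t3] / total if total else 0.0
    p_bi = bi[(t2, t3)] / uni[t2] if uni[t2] else 0.0
    p_tri = tri[(t1, t2, t3)] / bi[(t1, t2)] if bi[(t1, t2)] else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


# Toy pre-tagged corpus (hypothetical data, Penn-style tags).
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("big", "JJ"), ("dog", "NN")],
]
counts = train_tag_ngrams(corpus)
print(interpolated_prob("DT", "NN", "VBZ", counts))  # seen trigram
print(interpolated_prob("DT", "NN", "JJ", counts))   # unseen trigram
```

Because the unigram term is always included, a tag sequence that never occurred in the training data still receives a small non-zero probability, which is the point of smoothing.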

 

TOSCA-ICLE

Author/Org: TOSCA
Purpose: "It was made for the tagging of subcorpora of the International Corpus of Learner English (ICLE). Its tagset consists of 17 major wordclasses"
Access: free for download

TOSCA-LOB

Author/Org: TOSCA
Purpose: "The TOSCA-LOB tagset is an adapted version of the tagset designed for tagging the London/Oslo/Bergen (LOB) corpus."
Access: free

 

TreeTagger

Author/Org: TC project at the Institute for Computational Linguistics of the University of Stuttgart
Purpose: The TreeTagger is a tool for annotating text with part-of-speech and lemma information.
Access: free download; runs only on Sun workstations and Linux PCs

Xlex/www tools

Author/Org: Arbeitsbereich Linguistik, University of Muenster, Germany
Purpose: Xlex/www is a suite of tools for linguistic data processing, with a web-based, graphical front-end.
Access: free for educational purposes; an online demo version can be viewed or used, and a copy can be ordered

 

