You are now in section > Corpora > Spoken Corpora


CHRISTINE Corpus

Org: Geoffrey Sampson, University of Essex, UK
Time: first distributed in August 2000
Size:
Contents: spoken English, particularly spontaneous, informal spoken English
Access: freely available for download
Notes: see also SUSANNE

COLT - Bergen Corpus of London Teenage Language

Org: University of Bergen, Norway
Time: material collected in 1993
Size: 500,000 words; the pilot version consists of 151 texts
Contents: transcripts of spoken 'London Teenage Language'
Access: the pilot version can be searched; registered users can search the entire corpus online; COLT is also distributed on the ICAME CD-ROM
Notes: COLT is part of the BNC; it is tagged for word classes

Corpora of spoken Bulgarian

Org: Department for East European and Oriental Studies, University of Oslo
Time: unknown
Size: unknown
Contents: Krasimira Aleksova's corpus of spoken Bulgarian; Cvetanka Nikolova's corpus of spoken Bulgarian; parliamentary debates, transcribed by Ivanka Mavrodieva
Access: free download
Notes:

CSPAE - Corpus of Spoken Professional American English

Org: Michael Barlow (contact)
Time: 1994-1998
Size: 2 main sub-corpora of 1 million words each
Contents: short interchanges by 400 speakers – professional activities broadly tied to academics and politics
Access: registered users only ($79 for an individual user of the tagged version)
Notes: available both tagged and untagged; the tagging was performed by Tony McEnery and Paul Baker using the CLAWS programme at UCREL, Lancaster University

LLC - London-Lund Corpus of Spoken English

Org:
Time: 1960s to mid-1970s
Size: 500,000 words
Contents: spoken British English
Access:
Notes: The LLC derives from two projects: the Survey of English Usage (SEU), launched at University College London in 1959, and the Survey of Spoken English (SSE), started at Lund University in 1975.

Longman British Spoken Corpus

Org: Longman
Time: recent
Size: approx. 10 million words
Contents: "The Spoken Corpus consists of natural, spontaneous conversations heard all around us and from the language of lectures, business meetings, after dinner speeches and chat shows."
Access:
Notes: an audio sample of the British Spoken Corpus is available

Longman Spoken American Corpus

Org: Longman
Time: ongoing
Size: approx. 5 million words
Contents: "It represents the everyday conversations of more than 1000 Americans of various age groups, levels of education, and ethnicity, and includes speakers from over 30 US States"
Access:
Notes:

MICASE - Michigan Corpus of Academic Spoken English

Org: English Language Institute at University of Michigan, U.S.
Time: project started in 1997, ongoing
Size: 71 transcripts (totaling 813,684 words)
Contents: academic speech from across the university
Access: free; browse the corpus by category, or search with context, filter by speaker and speech event attributes, and choose result settings
Notes:

SEC - Lancaster/IBM Spoken English Corpus

Org: University of Lancaster, UK and IBM
Time: mid-1980s
Size: 53,000 words
Contents: transcripts of radio broadcasts
Access: available on the ICAME CD-ROM
Notes: POS-tagged with CLAWS2
