About MICASE

The history, purpose, and ideas behind the corpus of academic speech.

1. What is MICASE?
2. What is ‘academic speech’?
3. The ideas behind MICASE
4. How we chose which speech events to record
5. How we collected the MICASE data
6. A brief history of MICASE
7. How MICASE benefits our community
8. The MICASE team through the years


1. What is MICASE?

MICASE is a collection of nearly 1.8 million words of spoken academic English (see next section for a definition of academic speech), all recorded on the University of Michigan campus, and transcribed into searchable documents. You can search MICASE for words and phrases to see how English is being used in academic settings.

2. What is ‘academic speech’?

We define academic speech as speech that occurs in academic settings. In other words, it is not pre-defined as something like ‘scholarly discussion’. In academic settings we might find, for example, such speech acts as jokes, confessions, and personal anecdotes, as well as the more prototypical definitions, explanations, and intellectual justifications. The real question, therefore, is how we define ‘academic setting’. We have taken an open yet circumscribed stance on this.

The corpus includes the following speech events: small and large lectures (62), public interdisciplinary or departmental colloquia (13), discussion sections (9), student presentations (11), seminars (8), undergraduate lab sessions (8), lab group and other meetings (6), one-on-one tutorials (3), office hours (8), advising consultations (5), dissertation defenses (4), study groups (8), interviews (3), campus/museum tours (2), and service encounters (2).

On the other hand, we have excluded certain events that occur on campus but would not be significantly different if they had occurred elsewhere. For example, we did not record food-ordering sequences in university food outlets or discussions among those who work in the university’s plant or grounds departments. We do not consider these speech events central or particular to a university community’s educational mission.

3. The ideas behind MICASE

In 1997, the English Language Institute (ELI) at the University of Michigan started the MICASE project. Dr. Rita Simpson was the original project manager, working with Professor John Swales (faculty advisor) and Dr. Sarah Briggs (testing advisor). The project was driven by two questions:

  1. What are the characteristics of contemporary academic speech: its grammar, its vocabulary, its functions and purposes, its fluencies and dysfluencies?
  2. Are these characteristics different for different academic disciplines and for different classes of speakers?

4. How we chose which speech events to record

Because MICASE aimed to record a wide range of academic speech, our sampling goals spanned fifteen different types of speech events and, within those types, four major academic divisions (Humanities and Arts, Social Sciences, Biological and Health Sciences, and Physical Sciences). We adopted stratified random sampling. Each recording is classified according to speech event type, a pre-assigned number indicating the academic discipline, two letters representing the majority of participants in the event (e.g. junior undergraduate, senior faculty, staff), and a final three-digit sequence recording the chronological order in which the tape was made. For example, transcript number LEL115SU015 is a recording of a large lecture (LEL) in anthropology (115), at the senior undergraduate level (SU), and is the 15th speech event recorded for MICASE.
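
The transcript numbering scheme described above can be split into its parts programmatically. The following Python sketch is an illustration only: the names `TranscriptId` and `parse_transcript_id` are hypothetical, and the field widths (three letters, three digits, two letters, three digits) are an assumption inferred from the single example LEL115SU015 rather than a documented specification.

```python
import re
from typing import NamedTuple

# Assumed layout, inferred from the example "LEL115SU015":
# 3 letters (event type), 3 digits (discipline),
# 2 letters (majority of participants), 3 digits (recording order).
TRANSCRIPT_ID = re.compile(r"([A-Z]{3})(\d{3})([A-Z]{2})(\d{3})")


class TranscriptId(NamedTuple):
    event_type: str         # e.g. "LEL" (large lecture)
    discipline: str         # e.g. "115" (anthropology)
    participant_level: str  # e.g. "SU" (senior undergraduate)
    sequence: int           # chronological recording number


def parse_transcript_id(transcript_id: str) -> TranscriptId:
    """Split a transcript number into its four components."""
    match = TRANSCRIPT_ID.fullmatch(transcript_id)
    if match is None:
        raise ValueError(f"unrecognized transcript number: {transcript_id!r}")
    event_type, discipline, level, sequence = match.groups()
    return TranscriptId(event_type, discipline, level, int(sequence))
```

Under these assumptions, `parse_transcript_id("LEL115SU015")` returns the event type `"LEL"`, the discipline code `"115"`, the participant level `"SU"`, and the sequence number `15`. The discipline code is kept as a string so that any leading zeros are preserved.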

5. How we collected the MICASE data

All recordings were made with a digital audio tape recorder with two external stereo microphones, and at selected events, a video recorder. Two researchers attended most speech events in order to identify speakers and facilitate transcription by taking field notes about nonverbal contextual information; however, in small groups (e.g. advising sessions, office hours, study groups) where an observer’s presence would have been intrusive, the research assistants left the room after the equipment was set up. All speech was recorded with written consent from the major speakers and verbal consent from other participants. Demographic information (sex, age group, university position, and native language) was collected from each speaker on a form distributed at the end of each event. The speaker information is included in the header of each transcript and is also entered into a separate database. All DAT recordings were captured and stored as MP3 format sound files for use with our computer transcription program, SoundScriber, and have also been re-digitized as WAV format files and transferred to data CD for archival purposes.

6. A brief history of MICASE

In June 2001, the first phase of the project was completed, with over 190 hours of academic speech recorded. In April 2002, the transcription and proofing of all transcripts was completed (approximately 1.8 million words).

Then, in May 2002, the original search interface was launched, with a redesigned version released in June 2007. It has grown in popularity each year since its release, approaching 140,000 hits in 2006. In 2009, we are excited to release a number of new features and support tools, including new MICASE online demos and new resources for EAP/ESL teachers!

The project is currently managed by Dr. Ute Römer (Michigan Corpus Linguistics, Unit Director), with support from Dr. Matthew Brook O’Donnell (Post-doctoral Research Fellow). However, the MICASE project has only been possible with the help of a long list of talented faculty, staff, and research assistants over the years.

7. How MICASE benefits our community

The ELI has committed resources to MICASE for a series of interlocking reasons:

  • First, there was originally no database of this kind available.
  • Second, we strongly suspected that once we examined the corpus for recurrent grammatical and phraseological patterns, we would find many divergences from those described in current grammar and vocabulary books, which have largely relied on introspection or on features of written texts.
  • Third, we eventually hope to be able to track generalized changes in speech patterns as people gain experience of university culture. (Although we know quite a lot about how academic writing evolves as students progress, our current perceptions of speech changes within academic cultures are largely anecdotal.)
  • Fourth, with all this new information, we, and others elsewhere, will be in a better position to develop more appropriate ESL and English for Academic Purposes (EAP) teaching and testing materials, and to evaluate how best to incorporate corpus work into EAP programs.

We hope the MICASE project continues to provide helpful resources for researchers, EAP teachers, and English language learners.

8. The MICASE team through the years

MICASE Team in 2001

left to right: John Swales, Sarah Briggs, Janine Ovens, Rita Simpson

MICASE Team in June 2007

left to right: Yung-Hui Chien, Jesse Sielaff, Stefanie Wulff, Sheryl Leicher, Annelie Adel, John Swales

MICASE Team in November 2007

left to right: Stefanie Wulff, John Swales, Ute Römer, Nick Ellis, Jesse Sielaff, Yung-Hui Chien, Merche Querol (ELI visiting scholar)

MICASE Team in November 2008

left to right: Geoff Ho, Jesse Sielaff, Emily Lin, John Swales, Miranda Kozman, Nick Ellis, Matt O’Donnell, Ute Römer

MICASE Team in June 2009

left to right: John Swales, Ute Römer, Edwin Teng, Matt O’Donnell, Miranda Kozman, Emily Lin, Geoff Ho, Madison Stuart
