Doing MICASE-based Investigations (I): Starting with a word
These two linked entries are designed to help newcomers to corpus linguistics and to the MICASE corpus get started on analyses. This tutorial deals with the easier scenario of starting with a word. The word chosen for illustrative purposes is “concern”.
There are several possible starting points for analysing the academic speech contained in the MICASE corpus. One of the more obvious ones is starting with a word—or more exactly a lemma (a word in all its forms). Other starting points would be some grammatical structure (if-clauses; utterances with missing subjects) or some discoursal function (making suggestions; introducing a speaker). These will be dealt with in Parts II and III respectively.
So what follows is a twisting narrative of what I did, rather than a polished methodological account—and it should be read as such.
The lemma I have decided to use is CONCERN (there is a convention that lemmas are capitalized). I had a number of reasons for this choice. One was that it might be pragmatically interesting; in other words, I suspected that it might be a rather subtle way of expressing criticism (“my main concern with your paper is that it’s a little bit short”). A second reason had more to do with grammar. For example, would it be more common as a noun or a verb, and which prepositions would be associated with each? A third thought I had was that CONCERN might be one of those lemmas that made academic speech a little more “academic-like”. In practical terms, it might be a good lemma for (international) students to use if they wanted to make a good impression, as with this kind of observation: “although the central concern of this paper is to assess global warming, its conclusions are rather speculative”.
So first I did a quick search on CONCERN. (I used Wordsmith Tools, but the on-line interface will work nearly as well.) I got 265 hits (about 150 per million words). This was good news. Twenty-six hits would have been too few to reach any conclusions, while 2,650 would have involved so much work as to call into question whether the investigation was worth it!
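The figure in parentheses is a normalized frequency. You divide the raw hits by the size of the corpus and scale the result to a million words, which lets you compare counts across corpora of different sizes. Here is a minimal Python sketch. The corpus size of 1.8 million words is an assumption for illustration only, because the text above does not state it. It is roughly consistent with 265 hits working out to about 150 per million.

```python
def per_million(hits: int, corpus_words: int) -> float:
    """Return a raw hit count normalized to a rate per million words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return hits / corpus_words * 1_000_000


# Assumption: an approximate corpus size, used here only for illustration.
ASSUMED_CORPUS_WORDS = 1_800_000

rate = per_million(265, ASSUMED_CORPUS_WORDS)
print(f"{rate:.0f} hits per million words")
```

The same function shows why 26 hits would have been thin. Under the same assumed corpus size, that count comes out at only about 14 per million.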
Second, I had a quick look at dictionaries and grammars, both to orient myself as to what the options might be and to see whether everything about CONCERN had already been covered. I was reminded of a few small things, but there still seemed to be room for some (small) new contributions.
Third, I sorted by the ‘centre’ function to separate out the various morphological forms (concern, concerned, concerning, concerns).
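You can approximate the same two steps outside Wordsmith in a few lines: find every form of the lemma, then sort on the centre word. The Python sketch below is a stand-in built on assumptions and does not reproduce how Wordsmith works internally. It uses a naive tokenizer and ignores the markup in real MICASE transcripts, which you would need to strip first. The names `kwic`, `sort_by_centre` and `group_by_form` are hypothetical.

```python
import re
from collections import defaultdict

# The four forms of the lemma CONCERN listed above.
FORMS = {"concern", "concerned", "concerning", "concerns"}
# Simplifying assumption: a token is a run of letters and apostrophes.
TOKEN = re.compile(r"[a-z'’]+")


def kwic(text, forms=FORMS, width=5):
    """Return (left, node, right) concordance lines for each token in `forms`."""
    tokens = TOKEN.findall(text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok in forms:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def sort_by_centre(lines):
    """Sort on the node word, then on the right-hand context."""
    return sorted(lines, key=lambda line: (line[1], line[2]))


def group_by_form(lines):
    """Collect concordance lines into one list per word form."""
    groups = defaultdict(list)
    for line in lines:
        groups[line[1]].append(line)
    return dict(groups)


sample = ("My main concern is the cost. We are concerned about it. "
          "Concerns were raised concerning the schedule.")
for left, node, right in sort_by_centre(kwic(sample)):
    print(f"{left:>30} [{node}] {right}")
```

Sorting first on the node and then on the right context keeps each form in its own block. Within a block, lines with the same following word sit together.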
Fourth, I began to make decisions about what I didn’t think would be so interesting:
As you doubtless know, CONCERN can be both a noun and a verb. (It is pronounced the same in either case, unlike CONDUCT, which is stressed on the first syllable as a noun, CONduct, but on the second as a verb, conDUCT.) So which is more common? To find out, I used Wordsmith’s ‘set’ sort feature (which allows you to label each token with a single-letter code and subsequently sort by that code), marking each occurrence of the lemma as ‘n’ or ‘v’. (Alternatively, you can print out the 250 examples and sort them by hand.)
The first breakdown of the 250 tokens resulted in 114 nouns and 136 verbs (including participles).
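The coding step comes down to attaching a label to each line and then sorting and counting by that label. The minimal Python sketch below assumes the codes have already been assigned by hand, which is the real work. The function name `set_sort` is hypothetical and only mimics the idea behind Wordsmith’s feature. The four lines and their codes are toy data.

```python
from collections import Counter


def set_sort(lines, codes):
    """Attach one single-letter code to each concordance line, then sort by code.

    Python's sort is stable, so lines sharing a code keep their original order.
    """
    if len(lines) != len(codes):
        raise ValueError("need exactly one code per concordance line")
    return sorted(zip(codes, lines), key=lambda pair: pair[0])


# Hypothetical hand-assigned codes: 'n' = noun, 'v' = verb.
lines = ["my main concern is", "we are concerned about",
         "concerns were raised", "as far as i'm concerned"]
codes = ["n", "v", "n", "v"]

for code, line in set_sort(lines, codes):
    print(code, line)
print(Counter(codes))
```

Run over the full set of 250 hand-assigned codes, the same `Counter` tally would give the noun/verb split reported above.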
The next obvious question was whether the 136 verbs were active or passive. In fact, only 15 were active; the remaining 121 were passive. This skewed distribution suggests that concern is functioning here rather more as it does in academic prose than as it does in conversation; see the Longman Grammar, p. 474. (This finding would also seem to reinforce the third of my reasons for choosing CONCERN: that it might make academic speech a little more “academic-like”.)
At this stage, there were also some things that I felt I didn’t need to look at. One was the tense of the verbs; another was whether the nouns were singular or plural; a third was whether the sentences contained a negative; and a fourth was whether the utterance was a statement or a question. Possibly good lines of inquiry, but not yet!
Recall that, among my reasons for choosing CONCERN, I had said that the prepositions following it might be of interest. So this is where I went next.
Concern as a NOUN: most frequent prepositions
| preposition | tokens |
| about | 22 |
| with | 9 |
| for | 5 |
| over | 1 |
| TOTAL | 37 (of 114 noun tokens in MICASE) |
Concern as a VERB: most frequent prepositions
| preposition | tokens |
| about | 41 |
| with | 39 |
| of | 1 |
| in | 1 |
| TOTAL | 82 (of 136 verb tokens in MICASE) |
Okay, we now have some interesting numbers. From the tables above we can see that about and with are by far the most common choices for the following preposition, while for, over, of and in are much rarer alternatives.
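The tables count the preposition that follows the node. Below is a minimal Python sketch of that tally. It rests on two assumptions: each concordance line has already been coded as noun or verb, and only the first word to the right of the node (R1) matters. The function name is hypothetical. The input lines are toy data, partly adapted from examples quoted later in this tutorial.

```python
from collections import Counter

# Assumption: the prepositions of interest; extend the set as needed.
PREPOSITIONS = {"about", "with", "for", "over", "of", "in"}


def r1_prepositions(coded_lines):
    """Count prepositions immediately right of the node, per part-of-speech code.

    `coded_lines` holds (code, (left, node, right)) pairs, e.g. 'n' or 'v' codes.
    Only the first right-hand word (R1) is checked, so "concern ourselves with"
    is not counted; such cases still need a manual pass.
    """
    counts = {}
    for code, (_left, _node, right) in coded_lines:
        words = right.split()
        if words and words[0] in PREPOSITIONS:
            counts.setdefault(code, Counter())[words[0]] += 1
    return counts


coded = [
    ("n", ("there's a lot of", "concerns", "with how you do")),
    ("n", ("they had no", "concern", "about things like")),
    ("v", ("we are very", "concerned", "about it")),
    ("v", ("i'm", "concerned", "that you'll run")),
]
print(r1_prepositions(coded))
```

Looking only at R1 keeps the count mechanical, but you still need to read the lines. Cases where the preposition comes later, or where the following “about” is not part of the pattern, only turn up when you read them.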
At this stage, we had better line up some examples (sometimes lightly edited) so that we can get a better sense of what is happening. This, then, is a first move away from quantitative towards more qualitative research. I have chosen 4 examples of nouns and 4 of verbs:
Examples 7 & 8 suggest that in and of are likely to be marginal, and probably would not be accepted by a majority of native speakers of English. So we will focus later on the about/with dilemma. But that will be for the truly qualitative stage; for now we need to focus on a few more numbers.
The patterns are beginning to look a little clearer with the verbs than the nouns, so let’s press ahead down this track. We have already established that 82 out of the 136 are followed by a preposition. What of the others?
A Phraseological Formula:
There were 22 examples of:
…as far as [NP] + [BE] + concerned, …
As you might expect, the only NP that recurs is the first-person pronoun:
9) as far as i’m concerned you could use have it on a lower case or an upper case.
There are 5 instances of this first-person usage. The others are widely scattered; here is one further example:
10) as far as poverty is concerned, many things are missing.
Although not particularly common, this is a useful pattern for international students (and others) to use. It is easy to learn, it offers speakers “think time” as they work out what they want to say, and it is a relatively easy way of gaining entry into a discussion.
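Rather than reading every line, you can search for a formula like this directly with a regular expression. Below is a minimal Python sketch that assumes plain-text transcripts. The pattern, its name `AS_FAR_AS`, and the limit of four words on the NP slot are my own illustrative choices, so you would still need to check the hits by hand. The first two test strings are examples 9 and 10; the third is invented.

```python
import re

# Forms of BE: a separate word, or a contraction attached to the NP (i'm, that's, we're).
BE = r"(?:\s+(?:am|is|are|was|were|be|been)|['’](?:m|s|re))"

# as far as + [NP] + [BE] + concerned, with the NP slot limited to 1-4 words (an assumption).
AS_FAR_AS = re.compile(
    r"\bas\s+far\s+as\s+(?P<np>[\w-]+(?:\s+[\w-]+){0,3}?)" + BE + r"\s+concerned\b",
    re.IGNORECASE,
)

for utterance in [
    "as far as i’m concerned you could use have it on a lower case",
    "as far as poverty is concerned, many things are missing.",
    "as far as the grades are concerned, we're fine",
]:
    match = AS_FAR_AS.search(utterance)
    print(match.group("np") if match else None)
```

The pattern captures the NP slot as a named group. You can then tally the captured NPs to see which ones recur, such as the first-person pronoun.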
Concern as an Active Verb:
You may remember that there were just 15 of these, and they have presented me with a bit of a problem. This is because most are primarily referential and basically mean “about”, as in this next example:
11) the announcement that I made last, day. which concerns a change in the lecture schedule.
Here, which concerns is basically equivalent to concerning, and so I am going to set these aside. That leaves us with just 4 examples of active concern that actually express concern! Here are two:
12) …and that’s something that um that concerns me and that’s something that i’m working on…
13) …and i think that those need to really be issues that we concern ourselves with…
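So let’s sum up where we are. We started with 136 verbs: 82 were followed by a preposition, 22 occurred in the as far as [NP] [BE] concerned formula, and 15 were active. That leaves 17.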
What Is Left?
As you might have guessed, many of the remaining 17 examples are followed by a ‘that’ clause; there are 11 of these, such as:
14) i’m concerned that you’ll run into some problem that you don’t have now if you make the focal length too short.
The remaining six are a miscellany, including a number of truncated usages such as “are you concerned?”
(So this is where we got to at the end of Day One. It took about six hours’ work, including writing up as I went along. So on Day Two, we need to have another look at the nominal uses of “concern”, although I suspect the results will be messier than for the verbal uses. Then there is the “about/with” question. Finally, and more importantly, we need to say something about how this lemma functions: how and where it is used in MICASE speech.)
It turns out that the great majority of the uses of the noun occurred as complements. The verbs they followed were:
| verb | tokens |
| be | 47 |
| have | 14 |
| address | 3 |
| guess | 2 |
| share | 2 |
| voice | 2 |
| (other verbs, one token each) | 21 |
Basically, many of these structures are simple, using for the most part the very common verbs be and have:
15) yes um there’s a lot of concerns with how you do an experiment like that…
16) um i understand there’s a concern that sentencing, if you reform sentencing it won’t be swift and severe.
17) and this is a kind of lame concern,
18) they had no concern about things like sturgeon, that were declining.
19) if you have any concerns, any questions, email me.
20) she’s basically gonna be there to, address any sorts of concerns
Connected with this is another result that I found surprising: the noun “concern” was rarely modified. Here are the only modifiers that occurred more than once:
| modifier | noun | tokens |
| health | concern(s) | 4 |
| real | concern(s) | 3 |
| big | concern(s) | 2 |
| great | concern(s) | 2 |
| tremendous | concern(s) | 2 |
As you can see, very few speakers added a modifier (usually an adjective) before concern. The message here seems to be “keep it simple!”
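You can get a first pass at a modifier count by looking at the word immediately to the left of each noun token (L1) and skipping function words. Below is a minimal Python sketch. The stoplist is a small assumption of mine and would need extending for real data. Because L1 also catches noun modifiers such as health, the output is a list of premodifiers in general, not just adjectives. Only the first line of the toy data comes from the examples above (17); the rest are invented.

```python
from collections import Counter

# Assumption: a small stoplist of determiners and other function words that are
# not modifiers; extend it before using on real data.
STOPLIST = {"a", "an", "the", "my", "your", "his", "her", "our", "their", "its",
            "this", "that", "these", "those", "any", "no", "some", "of", "and"}


def l1_modifiers(noun_lines, min_count=2):
    """Count the word immediately left of each noun token (L1), skipping function words.

    `noun_lines` are (left, node, right) tuples already coded as nouns.
    Returns (word, count) pairs occurring at least `min_count` times, most frequent first.
    """
    counts = Counter()
    for left, _node, _right in noun_lines:
        words = left.split()
        if words and words[-1] not in STOPLIST:
            counts[words[-1]] += 1
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


noun_lines = [
    ("this is a kind of lame", "concern", ""),
    ("that's a real", "concern", "for us"),
    ("there's a", "concern", "that sentencing"),
    ("it's a real", "concern", ""),
]
print(l1_modifiers(noun_lines))
```

Setting `min_count=2` mirrors the table above, which lists only modifiers occurring more than once.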
To address the about/with question, I went back to the sort mechanisms and the “words in context” box in Wordsmith. This re-review of the MICASE evidence shows clearly that the “concern/concerned about” structure almost always expresses some worry, anxiety or real concern. The data for “concern/concerned with” is much more ambiguous. Indeed, only a little under half seem to express something seriously problematic, as in:
21) and there in particular there’s one culprit we’re very concerned with mercury.
The others, however, more often express simply interest or involvement:
22) Einstein was concerned uh, with the behavior of light. he was interested in the various features of light.
Example 22) is clearly different from 22a):
22a) The mother was concerned uh, with the behavior of her teenage daughter.
Einstein, of course, was not worried that light would behave badly (!); he was concerned only with understanding its behavior.
Here are a couple of further examples:
23) this is something that Courbet grappled with, and Manet like Courbet is very concerned with real life.
In this case, the two nineteenth-century French painters are determined to represent ordinary life as realistically as possible in their paintings; they are not actually worried about real life as such!
24) oh, for this, you, don’t be concerned with N-H-three complexes and post-tradition necessarily, um, especially about making inferences, um as to whether they are complex or not, uh, because we haven’t really tested that in lab.
In this longer extract, the situation is perhaps a little more ambiguous, but the more likely interpretation is that the instructor is telling the students not to spend time on N-H-three complexes etc., rather than telling them not to be anxious about them.
The upshot then is that students might be advised to use “CONCERN about” for occasions when they want to express real concern (or worry) and use “CONCERN with” to express simple involvement or interest. This last finding leads into the next section.
For this final stage of the investigation, I turned to the transcripts available on the website. First, look at this passage from an instructor giving a lecture on the Biology of Birds (LES 175):
25)…it was in the ni- about nineteen twenty, that, a real landmark piece of legislation was passed and it’s uh referred to the Migratory Bird, Treaty Act…..so it’s a very, extensive agreement, um regarding, protection of migratory species, and native species, and this came into play because of concern about overharvests, particularly the waterfowl. Many of the ducks breed in Canada and, may winter, in the gulf or down, uh considerably further south, so they were passing through different, bureaucratic zones, and in some areas they were protected and some they were not, and there was tremendous concern about the harvest, heavy harvest occurring in Canada before waterfowl even got to the U-S.
The instructor here, then, is using the expressions “concern about overharvests” and “tremendous concern about the harvest” to drive home to the listeners the seriousness of the problem. This kind of use is quite often adopted by experts in areas such as the environment (as in this case), the health sciences, and many of the social sciences. It is a device for bringing issues to people’s attention, issues that the student listeners may not have taken seriously hitherto.
Now let’s consider two further extracts. Both are from the Women in Science Panel (COL 999). This panel was concerned with (!) the difficulties that women face in science or engineering careers.
26) …what we need to do is talk about changing that culture and i have lots of examples of s-stories of hazing of young of young women I mean not [SU-F: mhm] literal hazing, but [SU-F: mhm] uh making life so uncomfortable….and i think that those need to be issues that we concern ourselves with, when we ask you know what do we need to do to make a more, open door for for girls. To create a culture that, women and girls wanna walk into.
27) the other_ the last thing i’m concerned about is it appears to me that the Women’s Movement has become a bit rigid. And i find myself working in a lot of vari- various women’s groups and how we can’t discuss issues as freely as we might have done twenty years ago….
Here the “there was tremendous concern” of the speaker in 25) has, in 26), become much more of a call for action (“need to be issues that we concern ourselves with”), while in 27) the speaker is making a more critical but personal observation about her own landscape (“the last thing i’m concerned about”).
Overall, then, we have seen in these three examples a shift in pragmatic function from the general to the personal. Finally, we need to look at how CONCERN operates on a more directly interactive plane.
For this, I have chosen an extract (lightly edited) from a discussion among senior undergraduates (SGR999). The group are discussing their advisors and whether they have enough credits to graduate. S2 has been explaining that she is not sure whether her advisor actually knows that he is her advisor (!):
S1. Yeah maybe <you should email him on a weekly basis, reminding him
S2. sad but true. i hate him yeah> I told you he’s making me take another class.
S3. i was concerned about that.
S1. Are did you talk to Warren you were concerned about that
S3. <i know, i thought he might have screwed it up
S2. I mean so therefore the fault, therefore the fault lies on me. And like definitely fifty-fifty you know because I should have been aware of what, credits….
The exchange starts with criticisms of the advisor, after which S3 expresses concern about S2’s difficulties. S2 nicely acknowledges S3’s concerns (I know you were concerned about that), but then goes on to admit that the fault is by no means on one side. CONCERN here operates to build group solidarity and affiliation.
My final example takes me back, in a way, to where I started: to the impression that the lemma CONCERN might well be involved in muted criticism or adversarial advice. As so often, our intuitions turn out to be basically wrong! Or at least mine do! This is because examples of this pragmatic function are very hard to find. Here is one of the few, from an advising session. The advisor (S2) is concerned that the junior student may be opting for a very tough science schedule (the student wants to enter the School of Pharmacy):
Um so i guess my concern would just be if you were to take something like biology physics mathematics and computer science you’ve got…[The undergrad then says it’s a difficult schedule and the advisor concurs]
We can immediately notice that the speaker here has highly hedged her contrary advice; she says “I guess” and “just” and uses “would be”. Further, in the second half of her utterance she puts the student’s preferred option into a highly hypothetical realm (“if you were to take…”).
This then has been my second day of work on this topic (making some 12 hours in total). The results are fairly interesting, but not dramatically so, ending with a piece of useful negative evidence. I primarily offer this account hoping it may help beginning corpus linguists with their own investigations. If it seems to prove useful, Parts II and III will follow.
Update: September 8, 2004
Since then I have revised my earlier draft three times in response to comments. First, on September 15, I put in another hour, partly in response to some comments received from David Lee.
After I posted the second draft on micasers@umich.edu, I received a thoughtful email from Dr Inmaculada Fortanet of Castelló in Spain. Several of her comments have since been addressed. She also suggested that I should relate this type of analysis to the description of genres, and to teaching. Perhaps these should become Parts IV and V! She also noted that the difference between quantitative and qualitative research was not made very clear. Well, actually I think these are two aspects of a single investigation, which is why I have provided quite a lot of numbers but also quite a lot of examples.
So I thought we were done: fourteen or fifteen hours in total. However, Aaron Ohlrogge, a new member of the MICASE committee, had some suggestions that I needed to incorporate. Add another hour.
Update: October 15, 2004
Reactions and further comments please to jmswales@umich.edu.