1. Criteria/data: Uncountable or Countable?
Do MICASE speakers prefer to say “the data is” or “the data are”? Or is it more complicated than this? This small study explores such questions.
Author: John Swales
Date: November 15, 2002
Kibbitzer 1
One of Tim Johns’ most valuable “talking points” in his Kibbitzer series is his discussion of whether the word data, and subsequently the word criteria, are singular or plural. Below are some updates on Kibbitzer 6, drawing upon two small corpora. One is Ken Hyland’s corpus of 80 recent research articles, ten each drawn from eight fields ranging from engineering to philosophy; the other is the Research Sub-corpus of the Michigan Corpus of Academic Spoken English (MICASE). The latter consists of 36 speech-events, covering such genres as colloquia, research group meetings, advanced graduate classes, and dissertation defenses, drawn from all of the university’s four main divisions.
Consider the case of criterion/criteria first. In the Hyland corpus, there were 18 instances of criterion and 37 instances of criteria (almost exactly double). Of these 37, 25 were plural, 7 were indeterminate as to number, and 5 were singular. In Kibbitzer 6, Johns noted the occurrence of this singular usage in quality newspapers; it now seems that it has begun to spread to research writing. Here are four examples:
1. Surveys of clothes buying behaviour show that quality is the most important criteria followed by fashion and price.
2. It has been argued that the convenience orientation represents a segmentation criteria which transcends national and cultural boundaries.
3. …the global firm may be able to exploit an evaluative criteria important to individual consumers…
4. …mutualistic functioning of these associations should be a defining criteria of the term mycorrhiza.
In the MICASE data, criterion does not occur as a head noun but only in compounds such as “criterion-related validity”. And rather surprisingly, only one of the 17 instances of criteria can be identified as singular:
5. i would have to have a very stringent criteria for for a moment in this study
The MICASE research sub-corpus findings for data show a slight preference for the singular over the plural (23 instances over 18, with around 150 hard or impossible to categorize). Here are some examples (with minor editing), the plural ones first:
6. …the sequence data are available here.
7. …the first step would be to plot these data…
8. …this might help give you confidence if all the data seem to fit the curve.
9. …do your data really, allow you to say that?
10. …huh may need to back up this data.
11. …i’ll show you that data in a minute.
12. …you need less, less data. uh, you can use averages…
13. …there is very little data collected for these databases…
14. …and we have data that shows that those very shaded plantations…
Finally, there are two very interesting cases where we get in effect a mixed message as to whether data is singular or plural. Consider:
15. …there’s very few data that’s collected so, the first question is…
16. …because this was identifying data, at least mine were…
In the first (15), the verbs are singular but the choice of very few suggests that the speaker believes that data is in fact plural. The second case (16) is even more interesting in that the speaker implies that the general data is singular/uncountable, but his or her own data are plural, perhaps somehow suggesting that the latter is more interesting or more relevant!
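The mixed cases above show why instances cannot be sorted by a single cue. As a rough illustration only, the following Python sketch tags an utterance by looking for singular and plural cues around the word data. The cue lists and the `classify` function are hypothetical; the article does not say how instances were coded, and a pattern match of this kind is no substitute for reading each concordance line.

```python
import re

# Hypothetical cue patterns, chosen to cover the examples quoted above.
# They are assumptions for illustration, not the coding scheme of the study.
PLURAL_CUES = [
    r"\b(?:these|those|few|many)\s+data\b",      # plural determiners and quantifiers
    r"\bdata\s+(?:are|were|seem|have)\b",        # plural verb agreement
]
SINGULAR_CUES = [
    r"\b(?:this|that|little|less|much)\s+data\b",  # singular/uncountable determiners
    r"\bdata\s+(?:is|was|seems|has|shows)\b",      # singular verb agreement
    r"\bdata\s+that(?:['’]s|\s+is|\s+shows)\b",    # singular verb in a relative clause
]


def classify(utterance: str) -> str:
    """Return 'plural', 'singular', 'mixed', or 'indeterminate'."""
    text = utterance.lower()
    plural = any(re.search(pattern, text) for pattern in PLURAL_CUES)
    singular = any(re.search(pattern, text) for pattern in SINGULAR_CUES)
    if plural and singular:
        return "mixed"
    if plural:
        return "plural"
    if singular:
        return "singular"
    return "indeterminate"


if __name__ == "__main__":
    examples = [
        "the sequence data are available here.",
        "i'll show you that data in a minute.",
        "there’s very few data that’s collected so, the first question is",
        "do your data really, allow you to say that?",
    ]
    for line in examples:
        print(classify(line), "|", line)
```

Example 9 comes out as indeterminate because an adverb separates data from its verb, and example 16 would too, since its plural signal (“mine were”) sits in a later clause. A complementizer that directly before data (as in “say that data are”) would also be misread as a singular cue. Such lines have to be judged by hand.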
The findings from the Hyland corpus for data tend to confirm Johns’ findings from Nature. Of those that could be identified as to number, 62 (80%) were plural and 15 (20%) were singular, many of the singular ones occurring, as Johns found, in contexts involving computer science. There remains a single occurrence of that “etymological relic”, datum:
17. …a datum vis-à-vis a principle applied to it from without,…
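For readers who want to check the proportions, here is a minimal Python sketch that recomputes them from the counts reported above. The dictionary names and the `percentages` helper are illustrative, not part of the study. The roughly 150 uncategorized MICASE instances are left out because that figure is only approximate.

```python
def percentages(counts: dict) -> dict:
    """Convert raw counts to percentages of their total, to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts as reported in the text.
hyland_criteria = {"plural": 25, "indeterminate": 7, "singular": 5}  # 37 instances
hyland_data = {"plural": 62, "singular": 15}    # only instances identifiable as to number
micase_data = {"singular": 23, "plural": 18}    # only the categorizable instances

if __name__ == "__main__":
    print("criteria (Hyland):", percentages(hyland_criteria))
    print("data (Hyland):    ", percentages(hyland_data))
    print("data (MICASE):    ", percentages(micase_data))
```

On these counts, the Hyland split for data works out to about 80.5% plural and 19.5% singular, in line with the rounded 80% and 20% given above. Among the categorizable MICASE instances, the singular accounts for about 56%.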
Further thoughts on these two words to: jmswales@umich.edu
The traditional rule recommends “between” for two things and “among” for more than two. Do MICASE speakers follow this rule? What are some of the variables involved?
In ordinary speech, hyperbole (or exaggeration) is common, as in “I’ve got a million emails to sort out”. What happens in academic speech? Do we still find these exaggerations (which we would not find in academic writing)? Or are MICASE speakers more careful? Find out in this Kibbitzer.
Do MICASE speakers use “less” with uncountables (e.g. less money) and “fewer” with countables (e.g. fewer dollars)? Or are there other factors at play?
Do people use the verb “suggest” to make suggestions, or do they use other kinds of language? This Kibbitzer provides some surprising answers to this question.
When do we say will and when do we shorten it to the contracted form ’ll? This careful quantitative study attempts to answer this question. (Amazingly, the first author was a first-term undergraduate when she did this project.)
The expression “no way” is often used among friends to express strong denial (“can you lend me fifty dollars? No way!”). In academic speech, is it used for some other purposes?
When we speak, we sometimes recognize that we have misspoken in some way, and so we try again. At times, we ‘announce’ that we are going to rephrase. What are the common ways of doing this in MICASE? And are these rephrasings typically longer or shorter than the originals? This kibbitzer attempts to answer these and similar questions.
These very similar pronouns play an important role in instructor-student interaction. When are they used in the full question form (“does anyone wanna guess?”), and when in a shortened structure (“anyone wanna guess?”)?
The use of “so” in such phrases as “I guess so” is not that common in MICASE. With which verbs does it occur? In which speech-events? Do other languages use a similar structure?
In written English, the standard structure of a sentence is subject-verb-complement. However, in speech, variations are possible, and we actually move the subject into different positions. Example: “the test will be easy” (standard); “the test, it will be easy” (pre-dislocation). How common are the non-standard forms? Where do they occur, and why?
Vocatives, such as “okay, John, let’s move on”, are known to be hard to explain to English learners. This study investigates the following kinds of question. What types of vocatives are there in MICASE? Which kinds of speech-event tend to attract vocatives? Do vocatives have different functions when they occur at the beginning, in the middle, or at the end of utterances?
This phrasal verb is one of the five most common in MICASE. Why is this? What are its functions? Does it have some special uses in academic speech? And what about the uses of “end up with”?
This Kibbitzer investigates the phrases that English speakers use to check for audience or listener comprehension, like “do you see what I’m saying?” or “does that make sense?”. Which phrases are most common?
This Kibbitzer examines the phrases that speakers use to end a list of examples that they do not wish to say in full (e.g. “and so on and so forth”, “etcetera etcetera”, “and things like that”). What are the most common phrases in MICASE, and who uses them more frequently, instructors or students?
This Kibbitzer focuses on clarificatory phrases beginning “just so”. Why are these phrases useful for presenters and instructors?
Does the frequency of we vary across different academic disciplines and registers? In this kibbitzer, we examine MICASE samples from the physical sciences and compare our results with earlier studies.