* Concordancers 2


First you need a concordance program and a corpus. The ICT4LT website, Module 2.4, gives a long discussion of all aspects of concordancing with students and mentions quite a few programs and corpora: http://www.ict4lt.org/en/en_mod2-4.htm

I personally recommend the COCA website which gives access to one of the biggest corpora in the world and allows on-line use of a good concordancer. http://corpus.byu.edu/coca/ Scott Thornbury’s blog gives a good demo of how to use it. http://scottthornbury.wordpress.com/


The learner as researcher

From a learner’s perspective concordancers are important not so much for the facts which they reveal as for the process which using them involves. I suggest that the best way to apply concordancers to language learning is not by offering learners more comprehensive, more accurate descriptions of the language, but by offering them the chance to explore the language for themselves. It is, I suggest, precisely this process of defining a problem, attempting to provide some answer to it, testing one’s answer, re-formulating or refining or abandoning it in the light of the evidence, that is the most valuable application of concordancers. The process of hypothesis formation, hypothesis testing and confirmation, and hypothesis re-formulation is central to the language acquisition process. Concordancers allow learners to explore for themselves, to try out their theories, to follow their hunches, to find the limitations of an explanation or description. Such activities are not simply consciousness-raising ones: they go to the heart of language learning by encouraging learners to push and stretch their interlanguage. They are, or can be, a model for the language learning process itself.

Furthermore, by handing over the concordancer to the learner (to the extent that the teacher actually does this) we are taking learner autonomy very seriously indeed. The learner has direct access to the language, and has control over the uses he or she makes of that source. While I have some reservations about totally free access to large corpora, I have no doubt that all students can benefit enormously from concordancing – from taking part in the process of exploring the language rather than listening to or reading somebody else’s explanation of it. Explanations have their place, but so do explorations. As Higgins (1989) says “When the answers are easy to find, it encourages you to ask more questions”.

The idea is to let the learner become an active participant in choosing what data to examine, and what conclusions to draw from it, rather than being the passive recipient of data chosen by others, and of conclusions reached by others. Further justification for making the learner an active participant comes from first and second language acquisition research, some of which has already been mentioned in the section on the lexical phrase. It seems likely that an important part of the learning process by which we acquire our first and then other languages is the successive re-formulation of rules, theories, hypotheses, views of the language. We begin with crude generalisations that ignore a lot of the evidence (of which we are largely ignorant) and we continue by re-formulating these generalisations into more sophisticated rules which account for more of the evidence and which gradually approximate to the native speaker’s view of the language. Long (1987, 2003) and others have argued that for learners to continue developing towards more native-like use of the target language, they must continue to “notice” things about it, and that their interlanguage must be “pushed” and “stretched” to avoid fossilization. It is surely not necessary to explain how learners using concordancers to conduct their own research fits in with such findings.



How then can we put this approach into practice?

There are many accounts from English language teachers of how concordancers are now being used with learners – a Google search will reveal dozens – but the one recommended by Johns (1987) is probably still the “classic”. Johns claims that his “Data-driven learning” methodology is based on an inductive approach where one takes raw data as the starting point and then, by a series of inductive steps, one tries to discover the patterns in the language and the rules which govern those patterns. Johns suggests three steps:

• identification
• classification
• generalization

With this approach one selects a language point to investigate and assembles concordanced evidence of the point to be investigated. One then studies the evidence to identify the salient features, which one then tries to classify. The final, generalization, phase is the formulation of a rule or rules that will encapsulate one’s interpretation for future use. The example of “however”, as discussed by Murison-Bowie (1993), might make this clearer.

• identification. Our first study of a concordance of however enabled us to identify the fact that in the vast majority of cases the word was preceded and succeeded by a punctuation mark. We further identified the fact that in 40% of our sample however occurred at the beginning of a sentence. We also noticed that there was a group of occurrences where there was no punctuation after the Search Word.

• classification. We had identified that there was one set – the larger – of howevers where punctuation marks came before and after, and another set where there was no punctuation after. By looking more closely at these two sets we could see that however was practically always followed by a punctuation mark when it was used as a contrastive adverb. There was the other set of examples, characterized by having no succeeding punctuation, where however was followed by an adjective or adverb and had the meaning of “to whatever extent or degree”.

• generalization. The word however has (at least) two meanings:

1. where it is used to contrast one statement with an earlier statement. When used like this it is generally preceded and succeeded by punctuation marks and it is very frequently used at the beginning of a sentence;

2. “to whatever extent or degree”. When used in this way, it is never succeeded by a punctuation mark and is always followed either by an adverb (much is a common one) or by an adjective (strong, great, silly, brutal seem typical).
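The three steps can be imitated in a few lines of code. What follows is only a rough sketch in Python, not Johns' or Murison-Bowie's procedure: the sample sentences are invented, the function names kwic and classify_however are hypothetical, and the classification rule is deliberately crude (it looks only at whether punctuation follows the search word, and ignores the adjective or adverb test).

```python
import re

def kwic(text, word, width=30):
    """Return (left, hit, right) context triples for each occurrence of word."""
    rows = []
    pattern = r"\b%s\b" % re.escape(word)
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

def classify_however(right_context):
    """Crude rule: punctuation straight after the word suggests the
    contrastive use; anything else is labelled as the 'degree' use."""
    following = right_context.lstrip(" ")
    if following == "" or following[0] in ",.;:!?":
        return "contrast"
    return "degree"

# Invented sample text, standing in for a real corpus.
sample = (
    "The plan was sound. However, the budget was not. "
    "We will finish, however long it takes. "
    "The results, however, were mixed. "
    "However hard she tried, the door stayed shut."
)

lines = kwic(sample, "however")                                # identification
labels = [classify_however(right) for _, _, right in lines]    # classification

for (left, hit, right), label in zip(lines, labels):
    print(f"{left:>30} [{hit}] {right:<30} {label}")
```

On this tiny sample the rule separates the two uses cleanly. On a real corpus it would throw up exceptions, and those are exactly what the learner should be looking for.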

This approach can work very well, but I think it has severe and unnecessary limitations. Inductive reasoning is stuck with the fact that you cannot logically move from the particular to the general: no amount of well-observed and well-classified data can ever prove a generalisation. Furthermore, it implies that the facts somehow speak for themselves, and ignores the fact that, as I have already suggested, we examine data with an in-built propensity to notice certain things that are currently relevant to us.

Deductive reasoning, on the other hand, is both logically consistent and better placed to provide us with a satisfactory methodology. It starts at the top – with a generalisation, a theory, a hypothesis, about something, and then looks at the data. This involves articulating a problem, and examining one’s present answers to it, before observing the data. If we are attracted by this argument, and decide to try a deductive approach, then we still have to answer the question: how do you look at the data? We look at the data in order to examine our present answer or answers to the problem we have in mind. Thus, if before doing a search on “back*”, we had asked the question: In how many different ways is “back” used?, and we had made a (quick) list of the ways we could think of, and discussed the list, when we finally observed the data, we would look for examples that were not covered by our list, and ones that seemed to contradict any general opinions we had given in the discussion. We would end up with a better list and probably a lot more questions.

In fact, there is no need to get into arguments about whether the top-down or bottom-up approach is “right”. Again, this is not an “either-or” question. The bottom-up and top-down approaches can be, and usually are, used together in most types of interpretation of data, including concordances. The problem with Johns’ three-step Identification, Classification, Generalisation method is that it starts too late and finishes too early. It does not invite us to stop and articulate the problem, and our current view of the possible solution, before we start, and it does not invite us to test the conclusions after we have made them, by returning to observe the data critically.

The process is fluid, and involves more than one “pass” through. Perhaps the metaphor of “opening” and “closing” strategies can help us to approach the interpretation of concordance data. We could decide if we wanted to look at the data with as open a mind as possible, or to use the data to close in on a category, distinction, rule, theory, guess, or explanation. If we take this “opening” and “closing” metaphor further than is probably good for it, we could see the “observation” stage as “opening” and then the classification and generalisation as “closing”. The classifications and generalisations can then be tested, and this time the observation will be a further “closing” step. There is probably too much “closing” here for most people’s tastes, but perhaps the metaphor helps suggest two different focuses to use when interpreting the data.

Whether one opts for trying to formulate a hypothesis or generalisation, or for finding its limitations, starting out with a well-articulated problem in mind, and some, however provisional, answer to it, and then using the concordancer to see how well the answer fits the evidence, gives the search a clear focus. Perhaps the answer put forward to the question posed to the concordancer is not very seriously challenged by the resulting data, in which case one may decide that the answer was basically right, and that none of the exceptions is interesting enough to warrant a re-formulation of it. If, on the other hand, the answer is (partly) contradicted by the data, or if it fails to explain (some of) the data, then one may decide to add an ad hoc hypothesis, or to completely abandon the answer and take a closer look. Even more likely for some users is the formulation of another question, which will start them off down a winding road to who knows where.

This approach can be put to good effect when the program is used as a “sleeping resource”. If a teacher has a portable computer at his or her disposal, the computer can be taken into the classroom with the concordancer loaded, and it simply sits there until a problem that it can help with turns up. By attempting some answer to the question that arises before consulting the concordancer, the teacher and students are, I think, better able to make some immediate sense of the data. It slows the users down, forcing them to stop and think and prepare themselves before diving into the mass of data.



Finally, practical issues involved in concordancer use.

First, what types of question can we put to a concordancer? Two general types of questions:

“How, when, where, how often, in what company, is x used?”


“What is the difference between the way x and y are used?”

(where x and y can be prefixes, words, past tenses, or lexical phrases, for example) can apply to questions about the structure of the language, or particular items of lexis. Any question that one might consult a pedagogical grammar about can be put to the program, as can many questions that one might consult a dictionary or thesaurus about. Below are some examples, which are intended to do no more than illustrate this point.

1. Searches on structural aspects of the language

– Verb tenses

* the uses of the present continuous

Search word: “*ing”, with “am/are/is/am not/is not/are not” as the context words and 4 words to the left as the horizon.

* conditionals

Search word: “if”.

* present perfect versus past

Search for one verb where the past tense and past participle are different.

* modals

Search word: “will”, “can”, “might”, etc.

– Phrasal verbs

Search word: “up”. How many different verbs can you find that precede “up”?
Search word: “get” How many different prepositions/adverbs can you find that follow “get”?

– Comparatives and superlatives

Search words: “than”, “*est”, “more”, “most”, etc.

– Countable and uncountable nouns

Search words: “how much/how many”, “less/fewer”.

– Conjunctions of time

Search words: “after”, “before”, “as soon as”, “until”, “when”, etc.

– Conjunctions

Search words: “and”, “but”, “although”, “because”, “that”, “as”, “until”, etc.

– Relative pronouns

Search words: “who”, “which”, “that”, “when”, “why”, etc.

2. Searches on lexical items

– Polysemous words

In how many different ways are “table”, “chair”, “head”, “back”, “break”, “let”, “tight”, “yet”, “long”, “over”, “row”, “so”, “screen”, etc. used? Is a word used in different ways in different types of text? (For example, searching for “interest” in business and in arts corpora, as described by Murison-Bowie (1993).)

– Adjectives before certain nouns

What adjectives come before “news”, “government”, “book”, “mother”, “home”, “report”, etc.?

– Intensifiers before adjectives

Search words: “rather/quite”, “extremely/immensely”, etc.

– Synonyms, “semi-synonyms” and “confusables”

Search words: “strong/powerful”, “certain/sure”, “envious/jealous”, “convince/persuade”, “big/large”, “thick/wide”, etc.

– “False friends” and “semi-false friends”

For Spanish students, the list could include “actual*”, “particular”, “conductor”, “discuss*”, “educated”, “professor”, “advise”, “agenda”, “career”, “camping”, “parking”, “card”, “sympathetic”, “sensible”, “brave”, “egoistic”, “amusing”, “emotional”, etc.

– Word building

Search words: “*truth*”, “*govern*”, “*manag*”, “*great*”, etc.

– Prefixes and suffixes

Search words: “re*”, “un*”, “pre*”, “dis*”, “*ness”, “*ion”, “*ly”, etc.

– Lexical phrases (polywords, institutionalized expressions, phrasal constraints, sentence builders)

Search words: “by the way”, “at any rate”, “for that matter”, “as it were”, “all in all”, “be that as it may”, “a * ago”, “as I was *”, “see you *”, “yours *”, “as far as I *”, “what with *”, “let me start by *”, “that reminds me of *”, etc.
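For readers curious about what a wildcard search with context words and a horizon amounts to, here is a minimal sketch in Python. It is an assumption about how such a search could work, not a description of any particular concordancer: the function names tokenize and search are hypothetical, the sample text is invented, and only single-word context words are handled (so “am not” is caught through “am”).

```python
import fnmatch
import re

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def search(tokens, pattern, context_words=None, horizon_left=0):
    """Return the positions of tokens matching a wildcard pattern.

    If context_words is given, keep only hits that have one of those
    words among the horizon_left tokens immediately to the left.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if not fnmatch.fnmatchcase(tok, pattern):
            continue
        if context_words:
            window = tokens[max(0, i - horizon_left):i]
            if not any(w in context_words for w in window):
                continue
        hits.append(i)
    return hits

# Invented sample text, standing in for a real corpus.
sample = (
    "She is reading a book. They are not sleeping. "
    "Nothing happened this morning. I am always losing my keys."
)
tokens = tokenize(sample)

all_ing = [tokens[i] for i in search(tokens, "*ing")]
with_be = [tokens[i] for i in search(tokens, "*ing", {"am", "are", "is"}, 4)]

print(all_ing)   # every word ending in -ing
print(with_be)   # only those with am/are/is within 4 words to the left
```

The context words remove “morning”, but “nothing” survives because “are” from the previous sentence falls inside the four-word horizon. Noise of this kind is typical, and spotting it is part of the learner's work.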



A concordancer can be used for individual, pair, small or large group activities. The activities will basically involve

posing a question
explaining how to do a particular search
setting tasks

Any of the questions mentioned above can serve, searches can be of edited texts, selected texts, or the whole collection, and tasks can include writing down examples, making lists, categories and generalisations, looking for exceptions, consulting grammar books and dictionaries, comparing results, writing reports or giving oral reports of their work.

The activities can be initiated by:

– Worksheets which set out the three steps mentioned above. The worksheets could include print-outs of concordances, with or without gaps.

– The teacher who chooses the question, sets the class a task, and leads them through it.

– Students who formulate their own questions, and decide on their own tasks.

– Students who set tasks for other students.

Using the concordancer as a “sleeping resource” is another possibility: in the classroom, the teacher gets on with a lesson, and the concordancer sits in the corner, just in case.

My experience suggests that worksheets are almost essential to get teachers and learners beyond the first “What do we do with this thing?” phase of using concordancers. Appendix 1 provides 2 examples, which I hope exemplify the methodology proposed above. The worksheets can also be used outside the classroom. If the activities are done with students in a computer room, I suggest that the students work in pairs or groups of three, and write down their answers to the questions. When they have finished, the teacher can get reports on the work from each pair or group. The worksheets provided in the Appendix are, of course, intended to act as no more than a springboard, and to the extent that they stifle the creative use of a concordancer they should be ignored. The thrust of the argument here has been that learners will benefit more from being involved in the process of concordancing than from contemplating the results of expert researchers, and the implication is clearly that the more the questions and answers spring from the learners themselves, the more likely it is that they will learn from the experience.

References are provided at the end of Part 1.



ACTIVITY 1: Examine the different ways that for and since are used with the present perfect

Level: Intermediate

Warm Up

How do you think the two words above are used? Here are two examples:

I haven’t seen Jim for two months.

I haven’t seen Jim since May.

Basically, “for” tells us how long something has lasted, its duration, and “since” tells us when it started. We use “for” with periods of time:

for three weeks, for a long time, for many years

and “since” with points in time:

since 3.30, since Tuesday, since I was young, since the last time I saw her.

Now we will see what the concordancer can find. It is important to have the BASIC INSTRUCTIONS sheet with you, so that you can follow the steps.


1. Go to http://corpus.byu.edu/coca/

2. Type in: since/for as the search words.

3. Choose to highlight context word(s) and type “have/has”.

5. You see the texts that the program is sorting through, and a running total of the number of examples it has found. When the total is 70, hit the Esc key.

6. You see at the bottom of the screen a report on how many examples it found, and their frequency.

7. You see the examples of the 2 words in the middle of lines of text. They are sorted with first left as first priority, and Search Word as second priority.

8. Look through the examples.

9. Go back to the top of the search.

QUESTION: what words occur after “for” and “since”? Are there any examples that surprise you? Write down some examples of words that come after “for” and “since”.

10. Sort the entries to get Context words as first priority and Search Word as second priority.

11. Use the arrow keys to look through the examples again.
QUESTION: How many main verbs can you find? Write down 10.

12. Sort the entries to get Search Word as first priority, and then hit F9 to sort the Capitals.

QUESTION: How many examples are there where “For” or “Since” begin a sentence? How are the two words used in these examples? Are there any examples where “For” or “Since” do not refer to periods or points in time? Write down any interesting examples.

13. Move to see the rest of the examples.

QUESTION: Are there any other examples where “for” and “since” do not refer to a period or a point in time? If there are, what do the words “for” and “since” mean in them?

14. Make the search word disappear. Just by the length of the word you should be able to say which of the two words belongs in each line. Can you?

QUESTION: Do you understand the explanation at the beginning of the exercise any better now? Would you add anything to it?
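The sorting and gapping steps in this worksheet are easy to picture in code. The sketch below is only an illustration in Python under stated assumptions: the sample sentences are invented, and the names concordance, sort_rows, first_left, search_word and gap are hypothetical helpers, not features of any real concordancer.

```python
PUNCT = ".,;:!?"

def concordance(text, search_words, span=4):
    """Return (left_tokens, hit, right_tokens) for each search-word hit."""
    tokens = text.split()
    rows = []
    for i, tok in enumerate(tokens):
        if tok.strip(PUNCT).lower() in search_words:
            rows.append((tokens[max(0, i - span):i], tok,
                         tokens[i + 1:i + 1 + span]))
    return rows

def first_left(row):
    """The word immediately to the left of the search word."""
    return row[0][-1].strip(PUNCT).lower() if row[0] else ""

def search_word(row):
    """The search word itself, lower-cased."""
    return row[1].strip(PUNCT).lower()

def sort_rows(rows, first, second):
    """Sort with one key as first priority and another as second priority."""
    return sorted(rows, key=lambda r: (first(r), second(r)))

def gap(row):
    """Hide the search word, keeping only its length as a clue."""
    left, hit, right = row
    return " ".join(left + ["_" * len(hit)] + right)

# Invented sample text, standing in for a real corpus.
sample = (
    "I have lived here for three years. She has worked there since May. "
    "We have waited for ages. Since you ask, I have known him since 2010."
)
rows = concordance(sample, {"for", "since"})

by_left = sort_rows(rows, first_left, search_word)
by_word = sort_rows(rows, search_word, first_left)

for row in by_word:
    print(gap(row))
```

Sorting on the search word first groups all the “for” lines together and all the “since” lines together, which makes it easier to compare what follows each word.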


ACTIVITY 2: Compare how hope and expect are used.

Level: Intermediate

Warm Up

What is the difference between these two sentences?

I hope to go to university next year.
I expect to arrive at six.

One explanation could be that the first sentence expresses a desire: I want to go to university, but the second expresses a more or less rational opinion: I think I will arrive at six. In the sentences above, either “hope” or “expect” could be used, but not in these sentences:

He’s the best hope we’ve got. (NOT: x He’s the best expect we’ve got. x)
I can’t expect you to understand. (NOT: x I can’t hope you to understand. x)

Can you explain this? Which do you think is more common: hope or expect? What are your reasons? Now we will see what the concordancer can find.


1. Go to http://corpus.byu.edu/coca/

2. Type in: hope/expect as the search words.

3. You see the texts that the program is sorting through, and a running total of the number of examples it has found. Stop when you get to around 70.

4. You see a report on how many examples it found, and their frequency.

5. You see the examples of the 2 words in the middle of lines of text.

7. Look through the examples. If you want to see more text for a particular example, use the arrow keys.

8. Sort the entries with

Search Word as first priority, and
1st Right as second priority.

9. Look through the examples again.
QUESTION: Which is more frequent: “hope” or “expect”? Is this what you expected?

10. Sort the entries with

1st Right as first priority, and
Search Word as second priority.

QUESTION: In how many examples is it possible to use either “hope” or “expect”? Write down 3 examples.

11. Sort the entries with

Search Word as first priority, and
1st Right as second priority.

Move to the “hope” block.

QUESTION: How many examples are there of “hope” used as a noun?

Write down a few examples.

QUESTION: Do you see any examples of “in the hope of” or “in the hope that”? How many?

QUESTION: Do you understand the explanation at the beginning of the exercise better now? Is it adequate? Would you make any changes to it?
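The frequency and “1st Right” questions in this worksheet can also be sketched in code. The Python below is a hedged illustration only: the sample sentences are invented, first_right_counts is a hypothetical helper, and it matches the exact forms “hope” and “expect” only, so “hopes” or “expected” would be missed.

```python
import re
from collections import Counter

def first_right_counts(text, search_words):
    """Count each search word, and the words that come straight after it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    freq = Counter()
    right = {w: Counter() for w in search_words}
    for i, tok in enumerate(tokens):
        if tok in right:
            freq[tok] += 1
            if i + 1 < len(tokens):
                right[tok][tokens[i + 1]] += 1
    return freq, right

# Invented sample text, standing in for a real corpus.
sample = (
    "I hope to see you soon. We expect to arrive at six. "
    "There is little hope of rain. I hope that she calls. "
    "They expect that prices will rise."
)
freq, right = first_right_counts(sample, ["hope", "expect"])

# Words that follow both search words mark frames where either might fit.
shared = set(right["hope"]) & set(right["expect"])

print(freq)
print(sorted(shared))
```

A word that follows both “hope” and “expect” only suggests a frame in which either verb might be possible. Whether the meaning allows the swap is still a question for the learner.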

One thought on “* Concordancers 2”

  1. For the historical record, in 1992 Simon Murison-Bowie of OUP asked David Hardisty and me to help him persuade Tim Johns to let OUP publish a version of Tim’s Microconcord program which Tim carried around on his NewBrain microcomputer and demonstrated at IATEFL’s 1992 conference. Tim, a sort of wayward genius, was extremely difficult to pin down. After a frustrating meeting in Paris with Tim, David, Simon and I designed the front end of the program, and wrote a manual to go with it. Mike Scott did the coding for the final version of Microconcord which was published in 1993, and included the manual and a 1 million word corpus, I think.

