Interlanguage Development: Some Evidence


As a follow-up to my two previous posts, here’s some information about interlanguage development.

Doughty and Long (2003) say

There is strong evidence for various kinds of developmental sequences and stages in interlanguage development, such as the well known four-stage sequence for ESL negation (Pica, 1983; Schumann, 1979), the six-stage sequence for English relative clauses (Doughty, 1991; Eckman, Bell, & Nelson, 1988; Gass, 1982), and sequences in many other grammatical domains in a variety of L2s (Johnston, 1985, 1997). The sequences are impervious to instruction, in the sense that it is impossible to alter stage order or to make learners skip stages altogether (e.g., R. Ellis, 1989; Lightbown, 1983). Acquisition sequences do not reflect instructional sequences, and teachability is constrained by learnability (Pienemann, 1984).

Let’s take a look at the “strong evidence” referred to, beginning with Pit Corder and error analysis.


Pit Corder: Error Analysis

Corder (1967) argued that learners’ errors were neither random nor simply the result of L1 transfer; rather, they were evidence of learners’ attempts to work out an underlying rule-governed system. Corder distinguished between errors and mistakes: mistakes are slips of the tongue, whereas errors are indications of an as yet non-native-like, but nevertheless systematic, rule-based grammar. Interesting and provocative as this was, error analysis failed to capture the full picture of a learner’s linguistic behaviour. Schachter (1974) compared the compositions of Persian, Arabic, Chinese and Japanese learners of English, focusing on their use of relative clauses. She found that the Persian and Arabic speakers made far more errors, but when she went on to look at total production she found that the Chinese and Japanese students produced only half as many relative clauses as the Persian and Arabic students. Schachter then looked at the students’ L1s: Persian and Arabic relative clauses resemble English ones in that the relative clause follows the noun it modifies, whereas in Chinese and Japanese the relative clause precedes the noun. She concluded that Chinese and Japanese speakers of English use relative clauses cautiously but accurately because of the difference between the way relative clauses are formed in their L1s and in English. So things are not so straightforward: one needs to look at what learners get right as well as what they get wrong.


The Morpheme Studies

Next came the morpheme order studies. Dulay and Burt (1974a, 1974b) claimed that fewer than 5% of errors were due to native language interference, that errors were, as Corder suggested, in some sense systematic, and that something akin to a Language Acquisition Device was at work not just in first language acquisition, but also in SLA.

Brown’s (1973) morpheme studies led him to claim that L1 learners acquire the morphemes below in the following order:

1 Present progressive (-ing)

2/3 in, on

4 Plural (-s)

5 Past irregular

6 Possessive (-’s)

7 Uncontractible copula (is, am, are)

8 Articles (a, the)

9 Past regular (-ed)

10 Third person singular (-s)

11 Third person irregular

12 Uncontractible auxiliary (is, am, are)

13 Contractible copula

14 Contractible auxiliary

This led to studies in L2 by Dulay and Burt (1973, 1974a, 1974b, 1975) and Bailey, Madden and Krashen (1974), all of which suggested that there was a natural order in the acquisition of English morphemes, regardless of L1. This became known as the L1 = L2 Hypothesis, and further studies by Ravem (1974), Cazden, Cancino, Rosansky and Schumann (1975), Hakuta (1976), and Wode (1978) all pointed to systematic staged development in SLA.

Some of these studies, particularly those of Dulay and Burt, and of Bailey, Madden and Krashen, were soon challenged, but over fifty L2 morpheme studies have since been carried out using more sophisticated data collection and analysis procedures, and the results of these studies have gone some way to restoring confidence in the earlier findings.


Selinker’s Interlanguage

The third big step was Selinker’s (1972) paper, which argued that L2 learners have their own autonomous mental grammar (which came to be known as interlanguage grammar), a grammatical system with its own internal organising principles. One of the first developmental sequences in this interlanguage to be identified was that for ESL questions. In a study of six Spanish students over a 10-month period, Cazden, Cancino, Rosansky and Schumann (1975) found that the subjects produced interrogative forms in a predictable sequence:

  1. Rising intonation (e.g., He works today?),
  2. Uninverted WH (e.g., What he (is) saying?),
  3. “Overinversion” (e.g., Do you know where is it?),
  4. Differentiation (e.g., Does she like where she lives?).

Then there was Pica’s (1983) study, which suggested that learners from a variety of L1 backgrounds go through the same four stages in acquiring English negation:

  1. External (e.g., No this one./No you playing here),
  2. Internal, pre-verbal (e.g., Juana no/don’t have job),
  3. Auxiliary + negative (e.g., I can’t play the guitar),
  4. Analysed don’t (e.g., She doesn’t drink alcohol.)

Apart from these two examples, we may cite the six-stage sequence for English relative clauses (see Doughty, 1991 for a summary) and sequences in many other grammatical domains in a variety of L2s (see Johnston, 1997).


Pienemann’s 5-stage Sequence

Perhaps the most extensive and best-known work in this area has been done by Pienemann, whose Processability Theory started out as the Multidimensional Model, formulated by the ZISA group, based mainly at the University of Hamburg, in the late seventies. One of the group’s first findings was that all the child and adult learners of German as a second language in the study adhered to the five-stage developmental sequence shown below:

Stage X – Canonical order (SVO)

die kinder spielen mim ball //// the children play with the ball

(Romance learners’ initial SVO hypothesis for German as a second language word order is correct in most German sentences with simple verbs.)

Stage X + 1 – Adverb preposing (ADV)

da kinder spielen //// there children play

(Since German has a verb-second rule, requiring subject-verb inversion following a preposed adverb (*there play children), all sentences of this form are deviant. The verb-second (or ‘inversion’) rule is only acquired at stage X + 3, however. The adverb-preposing rule itself is optional.)

Stage X + 2 – Verb separation (SEP)

alle kinder muss die pause machen //// all children must the break have

(Verb separation is obligatory in standard German.)

Stage X + 3 – Inversion (INV)

dann hat sie wieder die knoche gebringt //// then has she again the bone brought

(Subject and inflected verb forms must be inverted after preposing of elements.)

Stage X + 4 – Verb-end (V-END)

er sagte, dass er nach hause kommt //// he said that he home comes

(In subordinate clauses, the finite verb moves to final position.)

Learners did not abandon one interlanguage rule for the next as they progressed; they added new ones while retaining the old, and thus the presence of one rule implies the presence of earlier rules.

A few words about the evidence. There is the issue of what it means to say that a structure has been acquired, and I’ll just mention three objections that have been raised. In the L1 morpheme studies, a structure was assumed to be acquired when it was supplied in at least 90% of obligatory contexts in three successive samples. The problem with such a measure is, first, how one defines an “obligatory” context, and second, that by dealing only with obligatory contexts it fails to look at how the morphemes might occur in incorrect contexts. The second example is that Pienemann takes acquisition of a structure to be the point at which it emerges in the interlanguage, its first “non-imitative use”, which many say is hard to operationalise. A third example is this: in work reported by Johnson, statistical measures have been used with an experimental group of L2 learners and a control group of native speakers; the performance of both groups is measured, and if the L2 group’s performance is not significantly different from the control group’s, the L2 group can be said to have acquired the structure under examination. Again, one might well question this measure.
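To make the first criterion concrete, here is a minimal sketch of how the “90% suppliance in obligatory contexts over three successive samples” measure might be computed. The function names and the data are mine, purely illustrative, and the sketch inherits the very problems noted above: it presupposes that obligatory contexts can be identified, and it ignores uses of the morpheme in incorrect contexts.

```python
# Illustrative sketch of the acquisition criterion used in the
# L1 morpheme studies: a morpheme counts as "acquired" when it is
# supplied in at least 90% of obligatory contexts in three
# successive speech samples. Hypothetical names and data.

def suppliance_rate(supplied, obligatory_contexts):
    """Proportion of obligatory contexts in which the morpheme was supplied."""
    if obligatory_contexts == 0:
        return 0.0
    return supplied / obligatory_contexts

def acquired(samples, threshold=0.9, consecutive=3):
    """True once `consecutive` successive samples each meet the threshold."""
    run = 0
    for supplied, contexts in samples:
        if suppliance_rate(supplied, contexts) >= threshold:
            run += 1
            if run >= consecutive:
                return True
        else:
            run = 0  # a sub-threshold sample resets the run
    return False

# Each pair is (times supplied, obligatory contexts) in one sample.
samples = [(5, 10), (9, 10), (10, 10), (19, 20)]
print(acquired(samples))  # last three samples are all >= 90% -> True
```

Note that everything hangs on how the `(supplied, contexts)` counts are coded in the first place, which is exactly where the “obligatory context” objection bites.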

To return to developmental sequences, by the end of the 1990s, there was evidence of stages of development of an interlanguage system from studies in the following areas:

  • morphemes
  • negation
  • questions
  • word order
  • embedded clauses
  • pronouns
  • references to the past



Together these studies lend very persuasive support to the view that L2 learners follow a fairly rigid developmental route. Moreover, it was seen that this developmental route sometimes bore little resemblance to either the L1 of the learner, or the L2 being learnt. For example, Hernández-Chávez (1972) showed that although the plural is realised in almost exactly the same way in Spanish and in English, Spanish children learning English still went through a phase of omitting plural marking. It had been assumed prior to this that second language learners’ productions were a mixture of both L1 and L2, with the L1 either helping or hindering the process depending on whether structures are similar or different in the two languages. This was clearly shown not to be the case. All of which was taken to suggest that SLA involves the development of interlanguages in learners, and that these interlanguages are linguistic systems in their own right, with their own sets of rules.

There are lots of interesting questions and issues that I haven’t even mentioned here about interlanguage development in general and about orders of acquisition in SLA in particular. It’s worth pointing out that Corder’s and Selinker’s initial proposal of interlanguage as a construct was an attempt to explain the phenomenon of fossilisation. As Tarone (2006) says:

Second language learners who begin their study of the second language after puberty do not succeed in developing a linguistic system that approaches that developed by children acquiring that language natively. This observation led Selinker to hypothesize that adults use a latent psychological structure (instead of a LAD) to acquire second languages.  

The five psycholinguistic processes of this latent psychological structure that shape interlanguage were hypothesized (Selinker, 1972) to be (a) native language transfer, (b) overgeneralization of target language rules, (c) transfer of training, (d) strategies of communication, and (e) strategies of learning.

It wasn’t long before Krashen’s Monitor Model claimed that there was no evidence of L1 transfer in the morpheme studies, denied the central role of L1 transfer which the original Interlanguage Hypothesis gave it, and also denied that there were sensitive (critical) periods in SLA. Generativist studies of SLA also minimised the role of L1 transfer. And there have been some important updates on the interlanguage hypothesis since the 1980s, too (see Tarone (2006) and Hong and Tarone (2016) for example).

My main concern in discussing interlanguage development, as you must be all too well aware by now, is to draw attention to the false assumptions on which coursebook-based ELT is based. Coursebooks assume that structures can be learned on demand. If this were the case, then acquisition sequences would reflect the sequences in which coursebooks present them, but they do not. On the contrary, the acquisition order is remarkably resilient to coursebook presentation sequences. Long (2015, p. 21) gives some examples to demonstrate this:

…. Pica (1983) for English morphology by Spanish-speaking adults, by Lightbown (1983) for the present continuous -ing form by French-speaking children in Quebec being taught English as a second language (ESL) using the Lado English series, by Pavesi (1986) for relative clauses by children learning English as a foreign language (EFL) in Italy and Italian adults learning English naturalistically in Scotland, and by R. Ellis (1989) for English college students learning word order in German as a foreign language.

Long goes on to point out that accuracy orders and developmental sequences found in instructed settings match those obtained for the same features in studies of naturalistic acquisition, and that the striking commonalities observed suggest powerful universal learning processes are at work. He concludes (Long, 2015, p.23):

… instruction cannot make learners skip a stage or stages and move straight to the full native version of a construction, even if it is exclusively the full native version that is modelled and practiced. Yet that is what should happen all the time if adult SLA were a process of explicit learning of declarative knowledge of full native models, their comprehension and production first proceduralized and then made fluent, i.e., automatized, through intensive practice. One might predict utterances with occasional missing grammatical features during such a process, but not the same sequences of what are often completely new, never-modelled interlingual constructions, and from all learners.

While practice has a role in automatizing what has been learned, i.e., in improving control of an acquired form or structure, the data show that L2 acquisition is not simply a process of forming new habits to override the effects of L1 transfer; powerful creative processes are at work. In fact, despite the presentation and practice of full native norms in focus-on-forms instruction, interlanguages often stabilize far short of the target variety, with learners persistently communicating with non-target-like forms and structures they were never taught, and target-like forms and structures with non-target-like functions (Sato 1990).



That’s a taste of the evidence. We can’t conclude from it, as a few insist, that there’s no point in any kind of explicit teaching, but it does mean that, in Doughty and Long’s words (2003):

The idea that what you teach is what they learn, and when you teach it is when they learn it, is not just simplistic, but wrong.

The dynamic nature of SLA means that differentiating between different stages of interlanguage development is difficult – the stages overlap, and there are variations within stages – and so the simplistic view of a “Natural Order”, where a learner starts from Structure 1 and reaches, let’s say, Structure 549, is absurd. Imagine trying to organise stages such as those identified by Pienemann into ordered sets! As Gregg (1984) points out:

If the structures of English are divided into varying numbers of ordered sets, the number of sets varying according to the individual, then it makes little sense to talk about a ‘natural order’. If the number of sets varies from individual to individual, then the membership of any given set will also vary, which makes it very difficult to compare individuals, especially since the content of these sets is virtually completely unknown.

So the evidence of interlanguage development doesn’t mean that we can design a syllabus which coincides with any “natural order”, but it does suggest that we should respect the learners’ internal syllabuses and their developmental sequences, which most coursebooks fail to do. Doughty and Long (2003) argue that the only way to respect the learner’s internal syllabus is

by employing an analytic, not synthetic, syllabus, thereby avoiding futile attempts to impose an external linguistic syllabus on learners (e.g., the third conditional because it is the third Wednesday in November), and instead, providing input that is at least roughly tuned to learners’ current processing capacity by virtue of having been negotiated by them during collaborative work on pedagogic tasks.

Long has since (Long, 2015) given a full account of his own version of task-based language teaching, and whether or not we are in a position to implement a similar methodology in our own teaching situations, at least we can agree that we’d be well-advised to concentrate more on facilitating implicit learning than on explicit teaching, to give more carefully-tuned input, and to abandon the type of synthetic syllabus used in coursebooks in favour of an analytic one.



Sorry, I can’t give all the references. Here are a few “key” texts. Tarone (2006), free to download (see below), is a good place to start.

Adjemian, C. (1976) On the nature of interlanguage systems. Language Learning 26, 297-320.

Bailey, N., Madden, C. and Krashen, S. (1974) Is there a “natural sequence” in adult second language learning? Language Learning 24, 235-243.

Corder, S. P. (1967) The significance of learners’ errors. International Review of Applied Linguistics (IRAL) 5, 161-169.

Corder, S. P. (1981) Error analysis and interlanguage. Oxford: Oxford University Press.

Dulay, H. and Burt, M. (1974a) Errors and strategies in child second language acquisition. TESOL Quarterly 8, 12-36.

Dulay, H. and Burt, M. (1974b) Maturational sequences in child second language acquisition. Language Learning 24, 37-53.

Doughty, C. and Long, M.H. (2003) Optimal Psycholinguistic Environments for Distance Foreign Language Learning. Downloadable here:

Gregg, K. R. (1984) Krashen’s monitor and Occam’s razor. Applied Linguistics 5, 79-100.

Hong, Z. and Tarone, E. (Eds.) (2016) Interlanguage: Forty years later. Amsterdam: Benjamins.

Krashen, S. (1981) Second language acquisition and second language learning. Oxford: Pergamon Press.

Long, M. H. (2015) Second Language Acquisition and Task-Based Language Teaching. Oxford: Wiley Blackwell.

Nemser, W. (1971) Approximative systems of foreign language learners. IRAL 9, 115-123.

Selinker, L. (1972) Interlanguage. IRAL 10, 209-231.

Selinker, L. (1992) Rediscovering interlanguage. London: Longman.

Schachter, J. (1974) An error in error analysis. Language Learning 24, 3-17.

Tarone, E. (1988) Variation in interlanguage. London: Edward Arnold.

Tarone, E. (2006) Interlanguage. Downloadable here:


What good is relativism?


Scott Thornbury (2008) asks “What good is SLA Theory?”. This is a question beloved of populists, all of whom agree that it’s of no use to anyone except the rarefied crackpots who dream it up. Thornbury sets the tone of his own populist piece by saying that most teachers display a general ignorance of, and indifference to, SLA theory, due to “the visceral distrust that most practitioners feel towards ivory-tower theorising”. If he’d said that most English language teachers have an ingrained distrust of academic research into language learning, we might have asked him for some evidence to support the assertion, but who can question that ivory tower theorists are not to be trusted? Note how Thornbury, who teaches a post-graduate course on theories of SLA at a New York university, and who has published many articles in serious, peer-reviewed journals, smears academics with the “ivory tower” brush, while himself sidling up to the hard-working, down-to-earth sceptics who read English Teaching Professional magazine.

Thornbury gives a brief sketch of four types of SLA theory and then gives four reasons why “knowledge of theory” is a good thing for teachers. But you can tell that his heart’s not in it. He knows perfectly well that “knowledge” of the theories of SLA he mentions is of absolutely no use to anybody unless those theories are properly scrutinised and evaluated, but, rather than attempt any such evaluation, Thornbury prefers to devote the article to reassuring everybody that there’s no need to take SLA theories too seriously.

To help him drive home this anti-intellectual message, Thornbury turns to “SLA heavyweight” John H Schumann. Most SLA scholars regard the extreme relativist position Schumann adopts in his 1983 article as almost comically preposterous, while his acculturation theory is about as “heavyweight” as Dan Brown’s theory of the Holy Grail.  But anyway, judge for yourself.  Schumann (1983) suggests that theory construction in SLA should be regarded not as a scientific task, but as a creative endeavour, like painting. Rather than submitting rival theories of SLA to careful scrutiny, looking for coherence, logical consistency and empirical adequacy, for example, Schumann suggests that competing theories of SLA should be evaluated in the same way that one might evaluate different paintings.

“When SLA is regarded as art not science, Krashen’s and McLaughlin’s views can coexist as two different paintings of the language learning experience… Viewers can choose between the two on an aesthetic basis favouring the painting which they find phenomenologically true to their experience.”

Thornbury seems to admire this suggestion. He comments:

“This is why metaphors have such power. We tend to be well disposed to a theory if its dominant imagery chimes with our own values and beliefs. If we are inclined to think of learning as the meeting of minds, for example, an image such as the Zone of Proximal Development is more likely to attract us than the image of a black box.”

Schumann’s paper was an early salvo in what, ten years later, turned into a spirited war between academics who adopted a relativist epistemology and those who held to a rationalist epistemology. The war is still raging, and, typically enough, Thornbury stays well clear of the front line, while maintaining friendly relations with both camps. But let’s be clear: relativism, even though not often taken to the extreme that Schumann takes it to, is taken seriously by many academics, including Larsen-Freeman and sometimes (depending on how the wind’s blowing) Thornbury himself. Rational criteria for the evaluation of rival theories of SLA, including logical consistency and the weighing of empirical evidence, are abandoned in favour of the “thick description” of different “stories” or “narratives”, all of them deemed to have as much merit as each other. Relativists suggest that trying to explain SLA in the way that rationalists (or “positivists”, as they like to call them) do is no more than “science envy”, and basically a waste of time. Which is actually the gist of Thornbury’s argument in the 2008 article discussed here.

In response to this relativist position, let me quote Larry Laudan (1990), who says:

“The displacement of the idea that facts and evidence matter by the idea that everything boils down to subjective interests and perspectives is—second only to American political campaigns—the most prominent and pernicious manifestation of anti-intellectualism in our time.”


Thornbury asks “What good is SLA theory?” without making any attempt to critically evaluate the rival theories he outlines. But then, why should he? After all, if you adopt a relativist stance, then no theory is right, none is of much importance, so why bother to sort them out? Instead of going to all that unnecessary trouble, all you have to do is take a quick look at Thornbury’s little summary in Table 1 and choose the theory that grabs you, or rather, choose the “dominant metaphor” which best chimes with your own values and beliefs. And if you can’t be bothered to check out which theory goes best with your values and beliefs, then why not use some other, equally arbitrary subjective criterion? You could toss a coin, or stare intently at a piece of toast, or ask Jeremy Harmer.

“What good is SLA theory?” is actually a very stupid question. It’s as if “SLA theory” were some sort of uncountable noun, like toothpaste. What good is toothpaste? It doesn’t actually make much difference to brushing your teeth. But “SLA theory” is not uncountable; some SLA theories are very bad, and some are very good, and consequently we need to agree on criteria for evaluating them so as to concentrate on what we can learn from the best theories. Instead of pandering to the misinformed view that SLA theories are equally unscientific, equally based on metaphors, equally relative in their appeal, Thornbury could have used the space he had in the journal to examine – however “lightly” – the relative merits of the theories he discusses, and the usefulness to teachers of the best theories. He could have mentioned some of the findings of psycholinguistic research into the influence of the L1; age differences and sensitive periods; error correction; incomplete trajectories; explicit and implicit learning, and much besides. He could have mentioned one or two of the most influential current hypotheses about SLA, for example that instruction can influence the rate but not the route of interlanguage development.

He could have also pointed out that those adopting a relativist epistemology have achieved very little; that Larsen-Freeman’s exploration of complexity theory has achieved precisely nothing; that his own attempts to use emergentism to conjure up “grammar for free” have been equally woeful; and that the relativists he supports are more responsible than anyone else for the popular view that academics sit in an ivory tower writing unintelligible articles packed with obscurantist jargon for publication in journals that only they bother to read.


Laudan, L. (1990) Science and Relativism: Dialogues on the Philosophy of Science. Chicago, Chicago University Press.

Schumann, J. H. (1983) Art and science in SLA research. Language Learning 33, 409-475.

Why PPP makes no sense at all. A reply to Anderson




I made a comment on Jason Anderson’s blog in reply to his post The PPP Saga Ends. It hasn’t appeared, so here’s an amended version.

Hi Jason,

An interesting journey, and it makes good reading. You make an impressive attempt to defend the indefensible, and there are lots of good references, even if you play fast and loose with what your sources actually say.

To the issues, then.

First, let’s establish what we know about the SLA process after 50 years of SLA research. Students do not learn target forms and structures when and how a teacher decrees that they should, but only when they are developmentally ready to do so. Studies in interlanguage development have shown conclusively that L2 learners exhibit common patterns and features across differences in learners’ age and L1, acquisition context, and instructional approach. Independent of those and other factors, learners pass through well-attested developmental sequences on their way to mastery of target-language structures, or, as is often the case, to an end-state short of mastery.

Acquisition of grammatical structures (and also of pronunciation features and some lexical features such as collocation), is typically gradual, incremental and slow, sometimes taking years to accomplish. Development of the L2 exhibits plateaus, occasional movement away from, not toward, the L2, and U-shaped or zigzag trajectories rather than smooth, linear contours. No matter what the order or manner in which target-language structures are presented to them by teachers, learners analyze the input and come up with their own interim grammars, the product broadly conforming to developmental sequences observed in naturalistic settings. They master the structures in roughly the same manner and order whether learning in classrooms, on the street, or both.

That’s what we know. As a result this statement is plain wrong:

while research studies conducted between the 1970s and the 1990s cast significant doubt on the validity of more explicit, Focus on Forms-type instruction such as PPP, more recent evidence paints a significantly different picture.

It does not. No study conducted in the last 20 years has come up with evidence to challenge the established claim that explicit focus on forms such as PPP can do nothing to alter the route of interlanguage development. As Ortega (2009) states in her summary of SLA findings:

Instruction cannot affect the route of interlanguage development in any significant way.

Teaching is constrained by the learners’ own powerful cognitive contribution, and to assume that learners will learn what they’re taught when they’re taught it using a PPP paradigm is false.

These statements are also false:

  • we have no evidence that PPP is less effective than other approaches,
  • writers in academia have neither evidence nor theoretical justification for criticising coursebook writers,
  • the research on which writers such as Michael Long have based their promotion of focus on form is scant.

But let’s get to the heart of the matter, which is really quite simple. You base your arguments on a non-sequitur that appears throughout your paper. It’s this:

There is evidence to support explicit (grammar) instruction, therefore there is evidence to support the “PPP paradigm”.

It’s generally accepted, and quite uncontroversial, that explicit instruction has an important role to play in classroom-based SLA, but it doesn’t follow that PPP is a good approach to classroom-based ELT. PPP runs counter to a mass of SLA research findings, and that’s that. There is nothing, I repeat nothing, in “recent evidence from research studies” that supports PPP as an approach to classroom teaching. You appeal to evidence for the effectiveness of explicit grammar teaching to support the argument that students will learn what they’re taught in class by a teacher implementing a synthetic syllabus, based on the presentation, practice and production of a sequence of chopped-up bits of the language, thus making a schoolboy error in logic.

The rest of your paper says absolutely nothing to rescue a PPP approach from the fundamental criticism that students don’t learn an L2 in the way it assumes they do. The paper consists of a series of non-sequiturs and unsupported assertions which attempt to argue that the way the majority of institutions go about ELT is necessarily the best way.

To say that the PPP approach is popular with students, that coursebooks are consumer-driven, and that PPP is attractive to low-income countries, and that this is evidence to support a “PPP paradigm”, is patently ridiculous. The remarks about low-income countries are also patronising and arrogant. You make a naive appeal to an “apples and pears” group of factors that need to be carefully examined and distinguished. I won’t go into any proper analysis now, but, just for example, the multi-billion-dollar ELT coursebook industry is driven not so much by the opinions of the end users as by the language teaching institutions, both public and private, that deliver foreign language courses to them. For these institutions, the coursebook is convenient – it packages the otherwise “messy” thing that is language learning. Which is not to say that it wouldn’t be cheaper, better, more efficient, and more rewarding for everybody if the coursebook were abandoned in favour of locally-produced materials used in a more learner-centred approach.

Likewise, to say in reply to Neill that

the notion of ‘linear progress’ is a reflection of a much wider tendency in curricula and syllabi design. Given that the vast majority of English language teaching in the world today is happening in state-sponsored primary and secondary education, where national curricula perform precisely this role, we can predict to a large extent that top down approaches to language instruction are going to dominate for the foreseeable future

is to give absolutely no justification for such top down approaches to language instruction. Yes, as a matter of fact, they dominate ELT today, but that’s no argument in their favour, now is it?

You fail to address the arguments for a learner-centred approach, or any version of the process syllabus suggested by Breen. Those of us who oppose PPP do so not only because it contradicts what we know about SLA, but also because it adopts a pedagogy where students are given no say in the decisions that affect their learning, where the commodification of education goes unchallenged, and where Freire’s “banking” view of education rules. To oppose the way ELT is currently organised is not unrealistic, any more than opposing the privatisation of education in the UK is; but it is difficult. Whatever one’s views, the kind of faux-academic baloney present in your paper really doesn’t help.

Finally, your long quote from Ur in reply to Neill is just one more example of argument by assertion. She’s good at this kind of stuff, and I’m not surprised that you like it, but it’s pure rhetoric. She says that “such features as students’ socio-cultural background, relationships, personalities; motivation”, and so on, “often actually have more influence on how grammar is taught, and whether it is successfully learnt, than any of those dealt with in research”. This ignores all the research that has been done into those very features, and provides no evidence or arguments to challenge SLA research findings with regard to the development of interlanguages.


Ortega (2009) “Sequences and processes in language learning”. In Long and Doughty (2009) Handbook of Language Teaching. Wiley

Shifting sands and bendy bedrock


Chomsky offers a theory of language and of language learning. The theory claims that all human languages share an underlying grammar and that human beings are born with a knowledge of this grammar, which partly explains how they learn their first language(s) as young children. Criticism of Chomsky’s theory is mounting, as evidenced by a recent article in Scientific American which claims that “evidence rebuts Chomsky’s theory of language learning”. Here, I question that claim.

First, the Scientific American article doesn’t give any evidence to “rebut” Chomsky’s theory. The article talks about counter-evidence, but it doesn’t actually give any. The real thrust of the current popular arguments against Chomsky’s theory has nothing to do with its ability to stand up to empirical challenges. Arguments against Chomsky’s theory are based on

  1. the weaknesses in Chomsky’s theory in terms of its reasoning and its falsifiability,
  2. the claim that no recourse to innate knowledge, specifically to a Language Acquisition Device, is necessary, because language learning can be explained by a general learning theory.

As to the first point, I refer you to Sampson and Bates, the latter making a particularly eloquent case. You might also look at my discussion of Chomsky’s theory itself. There are, I think, serious weaknesses in Chomsky’s theory. To summarise: it moves the goalposts and it uses ad hoc hypotheses to deflect criticism.

As to the second point, no theory to date has provided an answer to the poverty of the stimulus argument which informs Chomsky’s theory. No attempt to show that usage can explain what children know about language has so far succeeded – none. Theories range from what I personally see as the daft (e.g. Larsen-Freeman and Cameron) through the unlikely (e.g. Bates and MacWhinney) to the attractive (e.g. O’Grady and Rastelli).

As Gregg (1993) makes clear, a theory of language learning has to give a description of what is learned and an explanation of how it’s learned. UG theory acts in a deliberately limited domain. It’s a “property theory” about a set of constraints on possible grammars, which has a causal relation to L1 acquisition through a “transition theory” connecting UG with an acquisition mechanism that acts on the input in such a way as to lead to the formation of a grammar. Describing that grammar is the real goal of Chomsky’s work. In successive attempts at such a description, those working within a Chomskyan framework have made enormous progress in understanding language and in helping those in various fields, IT, for example. Chomsky roots his work in a realist, rational epistemology and in a scientific method which relies on logic and on empirical research.

Any rival theory of language learning must state its domain, give its own property theory (its own account of what language is), and its own transition theory to explain how the language described is learned. You can take Halliday’s or Hoey’s description of language, or anybody’s you choose, and you can then look at the transition theories that go with them. When you do so, you should not, I suggest, be persuaded by silly appeals to chaos theory, or by appeals to the sort of emergentism peddled by Larsen-Freeman, or by circular appeals to “priming”. And you should look closely at the claim that children detect absolute frequencies, probabilistic patterns, and co-occurrences of items in the linguistic environment, and use the resulting information to bootstrap their way into their L1. It’s a strong claim, and there’s interesting work going on around it, but to date, there’s very little reason to think that it explains what children know about language or how they got that knowledge.

To say that Chomsky’s theory is dead and that a new “paradigm” has emerged is what one might expect from a journalist. To accept it as fact is to believe what you read in the press.


The post on Scientific American’s article on Chomsky prompted suggestions for further reading. Here’s a summary.


Kevin Gregg recommends Evans, N. and Levinson, S.C. (2009) The myth of language universals: language diversity and its importance for cognitive science. Behavioral and Brain Sciences 32, 429-492.

This is an excellent article. The target article makes an argument that I don’t think stacks up (more importantly, neither does Gregg), but it’s well presented and it’s followed by “Open Peer Commentary”, in which a very wide selection of scholars, including Baker, Berent, Christiansen and Chater, Croft, Adele Goldberg, Harbour, Nevins, and Pinker & Jackendoff, respond. Very highly recommended.


Scott Thornbury recommends Christiansen and Chater (2016) Creating Language: Integrating Evolution, Acquisition and Processing. MIT. Scott makes a refreshing confession that he didn’t finish reading Everett’s awful book Don’t Sleep, There Are Snakes, which inspired his post “P is for Poverty of the Stimulus”. The post sparked a lively discussion, where Scott showed signs of a less than complete grasp of UG theory, so it’s good to see him recant here on his previous enthusiastic endorsement of Everett. Among other daft stuff, Everett claims that the Pirahã language refutes Chomsky’s claim that recursion is a universal characteristic of natural languages, which it doesn’t. Anyway, the book Scott recommends looks interesting, and, judging from reviews, follows what we’ve come to expect from Christiansen and his colleagues. At the risk of sounding condescending, it’s good to see Scott moving on from Everett and from the equally unscholarly nonsense found in Larsen-Freeman and Cameron’s attempts to promote emergentism, to a more sophisticated view.


Talk of Everett brings us nicely to Ruslana Westerlund, who urges those “with an open mind and more importantly, critical mind” (sic) to read an article in Harper’s magazine which reports on Tom Wolfe’s book on Chomsky, The Origins of Speech. The article is what one might expect from something in Harper’s – it’s rubbish, and it only confirms one’s suspicion that Wolfe has nothing much to contribute to any critical debate about Chomsky’s UG theory. Wolfe apparently says that Chomsky is a nerd, and a nasty person to boot, while his hero Everett is a macho man, i.e., in Wolfe’s scheme of things, a good and proper man. Wolfe thinks that Everett’s ability to pose for a photo up to his neck in dangerous waters, while one of the Pirahã tribe looks on from his boat, is evidence to support Everett’s theory of language learning. I don’t really get Westerlund’s insistence that only those with open and critical minds will appreciate the Harper’s piece; I reckon that only those lacking both will be impressed.


Phil Chappell, a valued contributor to this blog, suggests we look at a blog post by a mother with a Ph.D. in linguistics who says that her relationship with her baby proves Chomsky wrong. More rubbish. The Ph.D.-enriched mum confuses Chomsky’s treatment of linguistic competence, a carefully defined construct in a deliberately restricted domain, with a baby’s need to interact lovingly with its mother.

Phil also suggests that we read Lee, N., Mikesell, L., Joaquin, A. D. L., Mates, A. W., & Schumann, J. H. (2009). The interactional instinct: The evolution and acquisition of language. Oxford University Press. I’ve read this, well, sort of, and I think it’s terrible. To quote the promotional blurb: “Language acquisition is seen as an emotionally driven process relying on innately specified “interactional instinct.” This genetically-based tendency provides neural structures that entrain children acquiring their native language to the faces, voices, and body movements of conspecific caregivers”.  I don’t know if Phil goes along with this mumbo jumbo, and I hope he’ll comment.

Robert Taylor says “Here’s some interesting research about sounds for common ideas being the same across languages (roughly 2/3rds)”. I’m not sure what to make of it, but maybe it’s grist for the mill.

Finally, I recommend an article from the Stanford Encyclopedia of Philosophy, a website that I love and that I visit almost as often as I visit VinoOnLine. The article is called Innateness and Language, and I think it gives a good review of the stuff we’re talking about. I particularly like its discussion of the Popperian view versus the “inference to the best explanation” view (best articulated, I think, by the ever-so-wonderful Ian Hacking).

Gregg, K. R. (1993) Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics 14, 3, 276-294.

IATEFL 2016 Plenary. Scott Thornbury: The Entertainer


So, without more ado, ladies and gentlemen, please put your hands most forcefully together and give it up for the one, the only, the inimitable, the ever-so wonderful ……………… Scott Thornbury!!

And on he walks.

He looks good; he looks fit, well turned out, up for it. Rather than hide behind the lectern and read from a script, he roams the whole expanse of the colossal stage with practised ease, expertly addressing different sections of the huge auditorium, bringing everybody into the warm glow. He starts brilliantly. He puts the years of important milestones in his life on the screen:

  • 1950
  • 1975
  • 1997
  • 2004

and asks for suggestions as to what happened to him in those years.

“Uh oh! There’s ‘an element’ in here today,” he says in response to a group on the right of the hall that’s having fun calling out the wrong answers to his elicitations.

His voice is warm, fruity, well-modulated, and it comes across perfectly, helped by a good PA system and by the fact that the enormous hall is packed with people. Of the IATEFL conference talks I saw online, there was something near gender equality as far as quality of presentation is concerned, but nobody else reached Scott’s standard. John Fanselow used to be able to put him in the shade, and Michael Hoey on a good day came close, but these days Scott’s unrivalled: he’s The Entertainer.

And it’s not just the way he performs, of course – the best stand-up artist depends on his or her material, right? Scott’s plenary had some very good material, and, what’s more, the content was both coherent and cohesive. Scott led us through 50 years of ELT history, pointing out that really there’s nothing new under the sun; that we made lots of mistakes; that some “methods” look really weird today, while others that we think of as new were already there in the 60s; and so on.

Having arrived in his history of ELT at 1975, Scott highlighted the publication of the Strategies series of coursebooks, which he described as “revolutionary”, since they were the first pedagogical materials to be based not on grammatical structures but on functions; the first to be based not on what the language is, but on what you do with it. At this point in the history, Scott came to the main part of his argument.

Two Kinds of Discourse

He suggests that two “intertwining but not interconnecting” discourses can be detected. On the one hand, there’s the “old view” that informs the various methodologies associated with grammar-based teaching. On the other, there’s the “new discourse”, which comes from a functional approach to language and a more sociolinguistic view of language learning.

In the figure below, the “old” view is on the left, and the “new” view is on the right. From the top, the categories are:

  • the nature of language
  • units of language acquisition
  • the nature of learning
  • learning path
  • goals.


Scott suggests that the “Strategies” series of coursebooks resolves the argument between these two views in favour of the view on the right. Obviously, Scott likes the “new” view, so he was excited when the Strategies series was published – he felt he was at the dawn of a new age of ELT. But, Scott goes on to say, the matter wasn’t in fact resolved: current ELT practice has reverted to reflect the old view. Today, a grammar-based syllabus is used extensively in the global ELT industry.

So, what happened? Why didn’t things change? Why did the old discourse win out? A particularly important question is: Why does the grammar-based syllabus still reign despite clear findings from SLA research? Scott pointed out that SLA research suggests that teachers can’t affect the route of L2 development in any significant way: the inbuilt syllabus triumphs. Grammatical syllabuses fly in the face of the results of SLA research.

Scott showed results from a survey he did of more than 1,000 teachers: most say they use a grammar-based syllabus because students want it. In a way, they blame the students for an approach they say they’re not entirely happy with.

Despairing of finding a solution inside the ELT world, Scott thought maybe he should look at general education. But, when he took a look, he discovered that things in general education are “terrible”. Everywhere, knowledge is being broken down into tiny little bits which can then be tested. He comments: “There’s something really unhealthy in mainstream education and it’s exacerbated by a discourse that’s all about McNuggets again.”

Scott then quoted Lin (2013)

“Language teaching is increasingly prepackaged and delivered as if it were a standardised, marketable product…”

“This commodifying ideology of language teaching and learning has gradually penetrated into school practices, turning teachers into ‘service providers’.”

So what’s the solution, then? Determined not to end on such a pessimistic note, Scott suggested three endings:

  1. The pragmatic route
  2. The dogmatic route
  3. The dialectic route

The Pragmatic Route says: Accept things the way they are and get on with it.

The Dogmatic (or Dogmetic!) Route says: Get rid of the coursebook, use communicative activities, and shape the language which emerges from genuine attempts at communication. Unfortunately, Scott said, this will never be really popular; at most it will be a footnote in Richards and Rodgers. A more extreme route says get rid of the teacher. This isn’t an entirely silly suggestion, but again, it’s unlikely to be widely adopted.

The dialectic route tries, as in the Hegelian model, to overcome the limitations of the thesis and its antithesis by meshing the best from both. Here Scott gave two examples:

  • Language in The Wild. Used in Scandinavia. Students do classes but they’re sent out into the real world to do things like shopping.
  • The Hands Up Project.  Children who can’t get out of the classroom, such as children trapped in Gaza, are taught English by using technology to drive a communicative language learning approach.

The video of Nick in the UK interacting with some lovely kids in Gaza made a very uplifting ending to the talk.


I have two criticisms of Scott’s argument, one minor, one more important:

  1. The presentation of the two “intertwining but not interconnecting discourses” doesn’t do a good job of summarising the differences between grammar-based ELT and a version of communicative language teaching that emphasises interaction, student-centred learning, task-based activities, locally-produced materials, and communication for meaningful purposes.
  2. Scott’s framing of, and solution to, the problem of the grammar-based syllabus is a cop-out.

As to the first problem, Scott’s summary of the old and new, intertwined but not interconnected discourses has its limitations. The first three categories are not well labelled, in my opinion. Language is not either cognitive or social: the differences between grammatical and functional descriptions of language, or between cognitive and sociolinguistic approaches to SLA, are hardly well captured in this diagram.

Then, what are “units of acquisition”? How does the contrast between grammar McNuggets and communicative routines explain different conceptualisations of these “units”? What does “the nature of learning” refer to? What do “atomistic” and “holistic” mean here? And while the fourth and fifth labels are clear enough, they’re false dichotomies: grammar-based teaching was and is concerned with promoting fluency and communicative competence.

I think it would have been better to have used a framework like Breen’s (1984) to compare and contrast the syllabus types under scrutiny, asking of each one

  1. What knowledge does it focus on and prioritise?
  2. What capabilities does it focus on and prioritise?
  3. On what basis does it divide and sub-divide what is to be learned?
  4. How does it sequence what is to be learned?
  5. What is its rationale?

That way, Scott could have looked at a grammar-based, or structural, syllabus; a functional syllabus, like the one implemented in Strategies; and a CLT syllabus as enacted in Dogme. He could then have dealt with the serious limitations of the Strategies approach, and he could have dealt properly with his own approach. Which brings me to the more important criticism.

Face The Problem

The problem ELT faces is not “How do we resolve the tensions between two different discourses?”; rather it’s the problem which Scott clearly stated and then adroitly side-stepped on his way to a typically more anodyne, less controversial, resolution. The real problem is:

How can we combat the commodifying ideology of language teaching and learning which has turned teachers into ‘service providers’ who use coursebooks to deliver language instruction as if it were a standardised, marketable product?  

And the solution, of course, is radical change.

Decentralise. Organise teaching locally. Get rid of the coursebook. Reform the big testing authorities. Reform CELTA. Etc., etc.

Why did Scott side-step all these issues? Why, having clearly endorsed the findings of SLA research which show up the futility of a grammar-based syllabus, and having shown how “really unhealthy” current ELT practice is, did Scott not argue the case for Dogme, or for Long’s version of TBLT, or for a learner-centred approach? Why did he not argue for reform of the current tests that dominate ELT, or of CELTA? Why did Scott dismiss his own approach, Dogme, as deserving no more than a footnote in Richards and Rodgers, instead of promoting it as a viable alternative to the syllabus type that he so roundly, and rightly, criticised?

Maybe, as he said, it was the end of the conference and he didn’t want to be gloomy. Or maybe it’s because he’s The Entertainer and that part of him got the better of the critical thinker and the reformer in him. If so, it’s a darn shame, however much fun it was to watch the performance.


Breen, M.P. (1984) Process syllabuses for the language classroom. In C. J. Brumfit (Ed.), General English Syllabus Design. ELT Documents No. 118. London: Pergamon Press & The British Council, 47-60.

Lin, A. (2013) Toward paradigmatic change in TESOL methodologies: building plurilingual pedagogies from the ground up. TESOL Quarterly, 47, 3.

Larsen-Freeman’s IATEFL 2016 Plenary


In her plenary talk, Larsen-Freeman argued that it’s time to replace “input-output metaphors” with “affordances”. The metaphors of input and output belong to a positivist, reductionist approach to SLA which needs to be replaced by “a new way of understanding” language learning based on Complexity Theory.

Before we look at Larsen-Freeman’s new way of understanding, let’s take a quick look at what she objects to by reviewing one current approach to understanding the process of SLA.

Interlanguage and related constructs 

There’s no single, complete and generally agreed-upon theory of SLA, but there’s a widespread view that second language learning is a process whereby learners gradually develop their own autonomous grammatical system with its own internal organising principles. This system is referred to as “interlanguage”. Note that “interlanguage” is a theoretical construct (not a fact and not a metaphor) which has proved useful in developing a theory of some of the phenomena associated with SLA; the construct itself needs further study, and the theory which it’s part of is incomplete, and possibly false.

Support for the hypothesis of interlanguages comes from observations of U-shaped behaviour in SLA, which indicate that learners’ interlanguage development is not linear. An example of U-shaped behaviour is this:


The example here is from a study in the 70s. Another example comes from morphological development, specifically, the development of English irregular past forms, such as came, went, broke, which are supplanted by rule-governed, but deviant past forms: comed, goed, breaked. In time, these new forms are themselves replaced by the irregular forms that appeared in the initial stage.

This U-shaped learning curve is observed in learning the lexicon, too, as Long (2011) explains. Learners have to master the idiosyncratic nature of words, not just their canonical meaning. When learners encounter a word in a correct context, the word is not simply added to a static cognitive pile of vocabulary items. Instead, they experiment with the word, sometimes using it incorrectly, thus establishing where it works and where it doesn’t. The suggestion is that only by passing through a period of incorrectness, in which the word is used in a variety of ways, can they climb back up the U-shaped curve. To add to the example of feet above, there’s the example of the noun shop. Learners may first encounter the word in a sentence such as “I bought a pastry at the coffee shop yesterday.” Then, they experiment with deviant utterances such as “I am going to the supermarket shop,” correctly associating the word ‘shop’ with a place where they can purchase goods, but getting it wrong. By making these incorrect utterances, the learner distinguishes between what is appropriate and what is not, because “at each stage of the learning process, the learner outputs a corresponding hypothesis based on the evidence available so far” (Carlucci and Case, 2011).


The re-organisation of new information as learners move along the U-shaped curve is a characteristic of interlanguage development. Associated with this restructuring is the construct of automaticity. Language acquisition can be seen as a complex cognitive skill where, as your skill level in a domain increases, the amount of attention you need to perform generally decreases. The basis of processing approaches to SLA is that we have limited resources when it comes to processing information, and so the more we can make the process automatic, the more processing capacity we free up for other work. Active attention requires more mental work, and thus developing the skill of fluent language use involves making more and more of it automatic, so that no active attention is required. McLaughlin (1987) compares learning a language to learning to drive a car. Through practice, language skills go from a ‘controlled process’, in which great attention and conscious effort is needed, to an ‘automatic process’.

Automaticity can be said to occur when associative connections between a certain kind of input and a certain output pattern are established. For instance, in this exchange:

  • Speaker 1: Morning.
  • Speaker 2: Morning. How are you?
  • Speaker 1: Fine, and you?
  • Speaker 2: Fine.

the speakers, in most situations, don’t actively think about what they’re saying. In the same way, second language learners learn new language through the use of controlled processes, which become automatic, and in turn free up controlled processes which can then be directed to new forms.


There is a further hypothesis that is generally accepted among those working on processing models of SLA, namely that L2 learners pass through developmental sequences on their way to some degree of communicative competence, exhibiting common patterns and features across differences in learners’ age and L1, acquisition context, and instructional approach. Examples of such sequences are found in the well known series of morpheme studies; the four-stage sequence for ESL negation; the six-stage sequence for English relative clauses; and the sequence of question formation in German (see Long, 2015 for a full discussion).

Development of the L2 exhibits plateaus, occasional movement away from, not toward, the L2, and U-shaped or zigzag trajectories rather than smooth, linear contours. No matter what the learners’ L1 might be, no matter what the order or manner in which target-language structures are presented to them by teachers, learners analyze the input and come up with their own interim grammars, and they master the structures in roughly the same manner and order whether learning in classrooms, on the street, or both. This led Pienemann to formulate his learnability hypothesis and teachability hypothesis: what is processable by students at any time determines what is learnable, and, thereby, what is teachable (Pienemann, 1984, 1989).

All these bits and pieces of an incomplete theory of L2 learning suggest that learners themselves, not their teachers, have most control over their language development. As Long (2011) says:

Students do not – in fact, cannot – learn (as opposed to learn about) target forms and structures on demand, when and how a teacher or a coursebook decree that they should, but only when they are developmentally ready to do so. Instruction can facilitate development, but needs to be provided with respect for, and in harmony with, the learner’s powerful cognitive contribution to the acquisition process.

Let me emphasise that the aim of this psycholinguistic research is to understand how learners deal psychologically with linguistic data from the environment (input) in order to understand and transform the data into competence in the L2. Constructs such as input, intake, noticing, short- and long-term memory, implicit and explicit learning, interlanguage, output, and so on are used to facilitate the explanation, which takes the form of a number of hypotheses. No “black box” is used as an ad hoc device to rescue the hypotheses. Those who make use of Chomsky’s theoretical construct of an innate Language Acquisition Device in their theories of SLA do so in such a way that their hypotheses can be tested. In any case, it’s how learners interact psychologically with their linguistic environment that interests those involved in interlanguage studies. Other researchers look at how learners interact socially with their linguistic environment, and many theories contain both sociolinguistic and psycholinguistic components.

So there you are. There’s a quick summary of how some scholars try to explain the process of SLA from a psychological perspective. But before we go on, we have to look at the difference between metaphors and theoretical constructs.

Metaphors and Constructs

A metaphor is a figure of speech in which a word or phrase denoting one kind of object or idea is used in place of another to suggest a likeness or analogy between them. She’s a tiger. He died in a sea of grief. To say that “input” is a metaphor is to say that it represents something else, and so it does. To say that we should be careful not to mistake “input” for the real thing is well advised. But to say that “input” as used in the way I used it above is a metaphor is quite simply wrong. No scientific theory of anything uses metaphors because, as Gregg (2010) points out

There is no point in conducting the discussion at the level of metaphor; metaphors simply are not the sort of thing one argues over. Indeed, as Fodor and Pylyshyn (1988: 62, footnote 35) say, ‘metaphors … tend to be a license to take one’s claims as something less than serious hypotheses.’ Larsen-Freeman (2006: 590) reflects the same confusion of metaphor and hypothesis: ‘[M]ost researchers in [SLA] have operated with a “developmental ladder” metaphor (Fischer et al., 2003) and under certain assumptions and postulates that follow from it …’ But of course assumptions and postulates do not follow from metaphors; nothing does.

In contrast, theoretical constructs such as input, intake, noticing, automaticity, and so on define what they stand for, and each of them is used in the service of exploring a hypothesis or a more general theory. All of the theoretical constructs named above, including “input”, are theory-laden: they’re terms used in a special way in the service of the hypothesis or theory they are part of, and their validity or truth value can be tested by appeals to logic and empirical evidence. Some constructs, for example those used in Krashen’s theory, are found wanting because they’re so poorly defined as to be circular. Other constructs, for example noticing, are the subject of both logical and empirical scrutiny. None of these constructs is correctly described as a metaphor, and Larsen-Freeman’s inability to distinguish between a theoretical construct and a metaphor plagues her incoherent argument. In short: metaphors are no grounds on which to build any theory, and dealing in metaphors ensures that no good theory will result.

Get it? If you do, you’re a step ahead of Larsen-Freeman, who seems to have taken several steps backwards since, in 1991, she co-authored, with Mike Long, the splendid An Introduction to Second Language Acquisition Research.

Let’s now look at what Larsen-Freeman said in her plenary address.

The Plenary

Larsen-Freeman read this out:


Then, with this slide showing:


she said this:

Do we want to see our students as black boxes, as passive recipients of customised input, where they just sit passively and receive? Is that what we want?

Or is it better to see our learners as actively engaged in their own process of learning and discovering the world finding excitement in learning and working in a collaborative fashion with their classmates and teachers?

It’s time to shift metaphors. Let’s sanitise the language. Join with me; make a pledge never to use “input” and “output”.

You’d be hard put to come up with a more absurd straw-man argument, or a more trivial treatment of a serious issue. Nevertheless, that’s all Larsen-Freeman had to say about it.


With input and output safely consigned to the dustbin of history, Larsen-Freeman moved on to her own new way of understanding. She has a “theoretical commitment” to complexity theory, but, she said:

If you don’t want to take my word for it that ecology is a metaphor for now… or that complexity theory is a theory in keeping with ecology, I refer you to your own Stephen Hawking, who calls this century “the century of complexity.”

Well, if the great Stephen Hawking calls this century “the century of complexity”, then complexity theory must be right, right?

With Hawking’s impressive endorsement in the bag, and with a video clip of a flock of birds avoiding a predator displayed on her presentation slide, Larsen-Freeman began her account of the theory that she’s now so committed to.


She said:

Instead of thinking about reifying and classifying and reducing, let’s turn to the concept of emergence – a central theme in complexity theory. Emergence is the idea that in a complex system different components interact and give rise to another pattern at another level of complexity.

A flock of birds part when approached by a predator and then they re-group. A new level of complexity arises, emerges, out of the interaction of the parts.

All birds take off and land together. They stay together as a kind of superorganism. They take off, they separate, they land, as if one.

You see how that pattern emerges from the interaction of the parts?

Notice there’s no central authority: no bird says “Follow me I’ll lead you to safety”; they self organise into a new level of complexity.

What are the levels of complexity here? What is the new level of complexity that emerges out of the interaction of the parts? Where does the parting and reformation of the flock fit in to these levels of complexity? How is “all birds take off and land together” evidence of a new level of complexity?

What on earth is she talking about? Larsen-Freeman constantly gives the impression that she thinks what she’s saying is really, really important, but what is she saying? It’s not that it’s too complicated, or too complex; it’s that it just doesn’t make much sense. “Beyond our ken”, perhaps.


The next bit of Larsen-Freeman’s talk that addressed complexity theory was introduced by her reading aloud this text:


After which she said:

Natural themes help to ground these concepts. …

I invite you to think with me and make some connections. Think about the connection between an open system and language. Language is changing all the time; it’s flowing, but it’s also changing. …

Notice in this eddy, in this stream, that pattern exists in the flux, but all the particles that are passing through it are constantly changing. It’s not the same water, but it’s the same pattern. …

So this world (the stream in the picture) exists because last winter there was snow in the mountains. And the snow pattern accumulated such that now, when the snow melts, the water feeds into many streams, this one being one of them. And unless the stream is dammed, or the water ceases, the source ceases, the snow melts, this world will continue. English goes on, even though it’s not… the English of Shakespeare, and yet it still has the identity we know and call English. So these systems are interconnected both spatially and temporally, in time.

Again, what is she talking about? What systems is she talking about? What does it all mean? The key seems to be “patterns in the flux”, but then, what’s so new about that?

At some point Larsen-Freeman returned to this “patterns in the flux” issue. She showed a graph of the average performance of a group of students which indicated that the group, seen as a whole, had made progress. Then she showed the graphs of the individuals who made up the group, and it became clear that one or two individuals hadn’t made any progress. What do we learn from this? I thought she was going to say something about a reverse level of complexity, or granularity, or patterns disappearing from the flux for lack of interaction of the parts, or something. But no. The point was:

When you look at group average and individual performance, they’re different.

Just in case that’s too much for you to take in, Larsen-Freeman explained:

Variability is ignored by statistical averages. You can make generalisations about the group but don’t assume they apply to individuals. Individual variability is the essence of adaptive behaviour. We have to look at patterns in the flux. That’s what we know from a complexity theory ecological perspective.


Returning to the exposition of complexity theory, there’s one more bit to add: adaptiveness. Larsen-Freeman read aloud the text from this slide:


The example is the adaptive immune system, not the innate immune system, the adaptive one. Larsen-Freeman invited the audience to watch the video and see how the good microbe got the bad one, but I don’t know why. Anyway, the adaptive immune system is an example of a system that is nimble, dynamic, and has no centralised control, which is a key part of complexity theory.

And that’s all, folks! That’s all Larsen-Freeman had to say about complexity theory: it’s complex, open and adaptive. I’ve rarely witnessed such a poor attempt to explain anything.


Then Larsen-Freeman talked about affordances. This, just to remind you, is her alternative to input.

There are two types of affordances:

  1. Property affordances. These are in the environment. You can design an affordance. New affordances for classroom learning include providing opportunities for engagement, instruction and materials that make sure everybody learns, and using technology.
  2. Second-order affordances. These refer to the learner’s perception of, and relation with, property affordances. Students are not passive receivers of input. Second-order affordances include the agent, the perceiver, in the system. They are dynamic and adaptive; they emerge when aspects of the environment are in interaction with the agent. The agent’s relational stance to the property affordances is key. A learner’s perception of, and interaction with, the environment is what creates a second-order affordance.

To help clarify things, Larsen-Freeman read this to the audience:


(Note here that their students “operate between languages”, unlike mine and yours (unless you’ve already taken the pledge and signed up), who learn a second or foreign language. Note also that Thoms calls “affordance” a construct.)

If I’ve got it right, “affordances” refer first to anything in the environment that might help learners learn, and second to the learner’s relational stance towards those things. The important part is the relational stance: the learner’s perception of, and interaction with, the environment. Crucially, the learner’s perception of the affordance opportunities has to be taken into account. “Really?” you might say. “That’s what we do in the old world of input too – we try to take into account the learner’s perception of the input!”

Implications for teaching

Finally, Larsen-Freeman addressed the implications of her radical new way of understanding for teaching.

Here’s an example. In the old world which Larsen-Freeman is so eager to leave behind, where people still understand SLA in terms of input and output, teachers use recasts. In the shiny new world of complexity theory and emergentism, recasts become access-creating affordances.


Larsen-Freeman explains that rather than just recast, you can “build on the mistake” and thus “manage the affordance created by it.”

And then there’s adaptation.


Larsen-Freeman refers to the “Inert Knowledge Problem”: students can’t use knowledge learned in class when they try to operate in the real world. How, Larsen-Freeman asks, can they adapt their language resources to this new environment? Here’s what she says:

So there’s a sense in which a system like that is not externally controlled through inputs and outputs but creates itself. It holds together in a self-organising manner – like the bird flock – that makes it have its individuality and directiveness in relation to the environment. Learning is not the taking in of existing forms but a continuing dynamic adaptation to context, which is always changing. In order to use language patterns beyond a given occasion, students need experience in adapting to multiple and variable contexts.

“A system like that”?? What system is she talking about? Well, it doesn’t really matter, does it, because the whole thing is, once again, beyond our ken; well beyond mine, anyway.

Larsen-Freeman gives a few practical suggestions to enhance our students’ ability to adapt, “to take their present system and mold (sic) it to a new context for a present purpose.”

  • You can do the same task in less time.
  • Don’t just repeat the task; change it a little bit.
  • Or make it easier.
  • Or give them a text to read.
  • Or slow down the recording.
  • Or use a Think Aloud technique in order to freeze the action, “so that you explain the choices that exist”. For example:

If I say “Can I help you?”, the student says:

“I want a book.”

and that might be an opportunity to pause and say:

“You can say that. That’s OK; I understand your meaning.”

But another way to say it is to say

“I would like a book.”

Right? To give information. Importantly, adaptation does not mean sameness, but we are trying to give information so that students can make informed choices about how they wish to be, um,… seemed.

And that was about it. I don’t think I’ve left any major content out.


This is the brave new world that two of the other plenary speakers – Richardson and Thornbury – want to be part of. Both of them join in Larsen-Freeman’s rejection of the explanation of the process of SLA that I sketched at the start of this post, and both are enthusiastic supporters of Larsen-Freeman’s version of complexity theory and emergentism.

Judge for yourself.



Carlucci, L. and Case, J. (2013) On the Necessity of U-Shaped Learning. Topics in Cognitive Science, 5(1), 56-88.

Gregg, K. R. (2010) Shallow draughts: Larsen-Freeman and Cameron on complexity. Second Language Research, 26(4), 549-56.

McLaughlin, B. (1987) Theories of Second Language Learning. London: Edward Arnold.

Pienemann, M. (1987) Determining the influence of instruction on L2 speech processing. Australian Review of Applied Linguistics, 10, 83-113.

Pienemann, M. (1989) Is language teachable? Psycholinguistic experiments and hypotheses. Applied Linguistics, 10, 52-79.

Can we get a pineapple?


Lost and Unfounded

Leo Selivan’s and Hugh Dellar’s recent contributions to EFL Magazine give further evidence that their strident, confidently expressed ideas lack any proper theoretical foundations.

We can compare the cumulative attempts of Selivan and Dellar to articulate their versions of the lexical approach with the more successful attempts made by Richards and Long to articulate their approaches to ELT.  Richards (2006) describes what he calls “the current phase” of communicative language teaching as

a set of principles about the goals of language teaching, how learners learn a language, the kinds of classroom activities that best facilitate learning, and the roles of teachers and learners in the classroom (Richards, 2006: 2)

Note that Richards says this on page 2 of his book: he rightly starts out with the assumption that “a set of principles” is required.

Long (2015) offers his own version of task based language teaching and he goes to great lengths to explain the underpinnings of his approach. His book is, in my opinion, the best example in the literature of a well-founded, well-explained approach to ELT. It’s based on a splendidly lucid account of a cognitive-interactionist theory of instructed SLA, on careful definitions of task and needs analysis, and on 10 crystal clear methodological principles. Long’s book is to be recommended for its scholarship, its thoroughness, and, not least, for its commitment to a progressive approach to ELT.

So what do Selivan and Dellar offer?

In his “Beginners’ Guide To Teaching Lexically”, Selivan makes a number of exaggerated generalisations about English and then outlines “the main principles of the lexical approach”. These turn out to be

  1. Ban Single Words
  2. English word ≠ L1 word
  3. Explain less – explore more
  4. Pay attention to what students (think they) know.

To explain how such “principles” adequately capture the essence of the lexical approach, Selivan offers “A bit of theory” for each one. For example, Selivan says “A new theory of language, known as Lexical Priming, lends further support to the Lexical Approach. … By drawing students’ attention to collocations and common word patterns we can accelerate their priming”. Says he. But what reasons does he have for such confident assertions? Selivan fails to give his reasons, and fails to give any proper rationale for the claims he makes about language and teaching.

In his podcast, Dellar agrees that collocation is the driving force of English. He claims that the best way to conduct ELT is to concentrate on presenting and practising the lexical chunks needed for different communicative events. Teachers should get students to do things with these chunks, such as “fill in gaps, discuss them, order them, say them, write them out themselves, etc.”, with the goal of getting students to memorise them. Again, Dellar doesn’t explain why we should concentrate on these chunks, or why teachers should get students to memorise them. Maybe he thinks “It stands to reason, yeah?”

At one point in his podcast Dellar says that, while those just starting to learn English will go into a shop and say “I want, um, coffee, um sandwich”,

… as your language becomes more sophisticated, more developed, you learn to kind of grammar the basic content words that you’re adding there. So you learn “Hi. Can I get a cup of coffee and a sandwich, please.” So you add the grammar to the words that drive the communication, yeah? Or you just learn that as a whole chunk. You just learn “Hi. Can I get a cup of coffee? Can I get a sandwich, please?” Or you learn “Can I get…” and you drop in a variety of different things.

This is classic “Dellarspeak”: a badly expressed misrepresentation of someone else’s erroneous theory. Dellar doesn’t tell us how we teach learners “to grammar” content words, or when it’s better to teach “the whole chunk” – or what informs his use of nouns as verbs, for that matter. As for the “Can I get…?” example, what’s wrong with just politely naming what we want: “Good morning. A coffee and a sandwich, please.”? What is gained by teaching learners to use the redundant “Can I get…” phrase?

But enough of Dellar’s hapless attempts to express other people’s ideas; let’s cut to the chase, if you get my drift. The question I want to discuss briefly is this:

Are Selivan’s and Dellar’s claims based on coherent theories of language and language learning, or are they mere opinions?


Models of English 

Crystal (2003) says: “an essential step in the study of a language is to model it”. Here are two models:

  1. A classic grammar model of the English language attempts to capture its structure, described in terms of grammar, the lexicon and phonology (see Quirk et al., 1985, and Swan, 2001, for examples of descriptive and pedagogical grammars). This grammar model, widely used in ELT today, is rejected by Hoey.
  2. Hoey (2005) says that the best model of language structure is the word, along with its collocational and colligational properties. Collocation and “nesting” (words joining with other primed words to form sequences) are linked to contexts and co-texts. So grammar is replaced by a network of chunks of words. There are no rules of grammar; there’s no English outside a description of the patterns we observe among those who use it. There is no right or wrong in language. It makes little sense to talk of something being ungrammatical (Hoey, 2005).

Selivan and Dellar uncritically accept Hoey’s radical new theory of language, but is it really better than the model suggested by grammarians?

Surely we need to describe language not just in terms of the performed but also in terms of the possible. Hoey’s argument that we should look only at attested behaviour and abandon descriptions of syntax strikes most of us as a step too far. And I think Selivan and Dellar agree, since they both routinely refer to the grammatical aspects of language. The problem is that Selivan and Dellar fail to give their own model of language; they fail to clearly indicate the limits of their adherence to Hoey’s model; they fail to say what place syntax has in their view of language. In brief, they have no coherent theory of language.

Hoey’s Lexical Priming Theory

Hoey (2005) claims that we learn languages by subconsciously noticing everything (sic) that we have ever heard or read about words, and storing it all in a massively repetitious way.

The process of subconsciously noticing is referred to as lexical priming. … Without realizing what we are doing, we all reproduce in our own speech and writing the language we have heard or read before. We use the words and phrases in the contexts in which we have heard them used, with the meanings we have subconsciously identified as belonging to them and employing the same grammar. The things we say are subconsciously influenced by what everyone has previously said to us.

This theory hinges on the construct of “subconscious noticing”, but instead of explaining it, Hoey simply asserts that language learning is the result of repeated exposure to patterns of text (the more the repetition, the better the knowledge), thus adopting a crude version of behaviourism. Actually, several ongoing quasi-behaviourist theories of SLA try to explain the SLA process (see, for example, MacWhinney, 2002; O’Grady, 2005; Ellis, 2006; Larsen-Freeman and Cameron, 2008), but Hoey pays them little heed, and neither do Selivan and Dellar, who swallow Hoey’s fishy tale hook, line and sinker, take the problematic construct of priming at face value, and happily use “L1 primings” to explain L1 transfer as if L1 primings were as real as the nose on Hoey’s face.

Hoey rejects cognitive theories of SLA which see second language learning as a process of interlanguage development, involving the successive restructuring of learners’ mental representation of the L2, because syntax plays an important role in them. He also rejects them because, contrary to his own theory, they assume that there are limitations in our ability to store and process information. In cognitive theories of SLA, a lot of research is dedicated to understanding how relatively scarce resources are used. Basically, linguistic skills are posited to slowly become automatic through participation in meaningful communication. While initial learning involves controlled processes requiring a lot of attention and time, with practice the linguistic skill requires less attention and less time, thus freeing up the controlled processes for application to new linguistic skills. To explain this process, the theory uses constructs such as comprehensible input, working and long term memory, implicit and explicit learning, noticing, intake and output.

In contrast, Hoey’s theory concentrates almost exclusively on input, passing quickly over the rest of the issues, and simply asserts that we remember the stuff that we’ve most frequently encountered. So we must ask Selivan and Dellar: What theory of SLA informs your claims? As an example, we may note that Long (2015) explains how his particular task-based approach to ELT is based on a cognitive theory of SLA and on the results of more than 100 studies.

Hoey’s theory doesn’t explain how L2 learners process and retrieve their knowledge of L2 words, or how paying attention to lexical chunks or “L1 primings” affects the SLA process. So what makes Selivan and Dellar think that getting students to consciously notice both lexical chunks and “L1 primings” will speed up primings in the L2? Priming, after all, is a subconscious affair. And what makes Dellar think that memorising lexical chunks is a good way to learn a second language? Common sense? A surface reading of cherry-picked bits of contradictory theories of SLA? Personal experience? Anecdotal evidence? What? There’s no proper theoretical base for any of Dellar’s claims; there’s scarce evidence to support them; and there’s a powerful theory supported by lots of evidence which suggests that they’re mistaken.


All Chunks and no Pineapple

Skehan (1998) says:

Phrasebook-type learning without the acquisition of syntax is ultimately impoverished: all chunks but no pineapple. It makes sense, then, for learners to keep their options open and to move between the two systems and not to develop one at the expense of the other. The need is to create a balance between rule-based performance and memory-based performance, in such a way that the latter does not predominate over the former and cause fossilization.

If Selivan and Dellar agree that there’s a need for a balance between rule-based performance and memory-based performance, then they have to accept that Hoey is wrong, and confront the contradictions that plague their present position on the lexical approach, especially their reliance on Hoey’s description of language and on the construct of priming. Until Selivan and Dellar sort themselves out, until they tackle basic questions about a model of English and a theory of second language learning so as to offer some principled foundation for their lexical approach, it amounts to little more than an opinion; more precisely, the unappetising opinion that ELT should give priority to helping learners memorise pre-selected lists of lexical chunks.


Crystal, D. (2003) The English Language. Cambridge: Cambridge University Press.

Ellis, N. C. (2006) Language acquisition and rational contingency learning. Applied Linguistics, 27 (1), 1-24.

Hoey, M. (2005) Lexical Priming: A New Theory of Words and Language. Psychology Press.

Krashen, S. (1985) The Input Hypothesis: Issues and Implications. Longman.

Larsen-Freeman, D and Cameron, L. (2008) Complex Systems and Applied Linguistics. Oxford, Oxford University Press.

Lewis, M. (1993) The Lexical Approach. Language Teaching Publications.

Lewis, M. (1996) Implications of a lexical view of language. In Willis, J., & Willis, D. (eds.) Challenge and Change in Language Teaching, pp. 4-9. Heinemann.

Lewis, M. (1997) Implementing the Lexical Approach. Language Teaching Publications.

Long, M. (2015) Second Language Acquisition and Task-Based Language Teaching. Wiley.

MacWhinney, B. (2002) The Competition Model: the Input, the Context, and the Brain. Carnegie Mellon University.

O’Grady, W. (2005) How Children Learn Language. Cambridge: Cambridge University Press.

Richards, J (2006) Communicative Language Teaching Today. Cambridge University Press.

Quirk, R., Greenbaum, S., Leech, G. and Svartvik, J. (1985) A Comprehensive Grammar of the English Language, London: Longman.

Skehan, P. (1998) A Cognitive Approach to Language Learning. Oxford: Oxford University Press.

Swan, M. (2001) Practical English Usage. Oxford: Oxford University Press.

A Final Tilt at the Windmill of Thornbury’s A to Z


They keep coming, like burps after a poorly digested Christmas lunch: comments on Thornbury’s A to Z blog. I’ve read three in the last few days, so let me add my own final swipe at the edifice before 2015 concludes.

Thornbury’s Sunday posts on his A to Z blog only lasted a few months, but during that short season they became part of my Sunday morning routine: late breakfast, read Thornbury, join in the discussions that always followed. The final Sunday post was The Poverty of the Stimulus, and as usual it had enough good stuff in it to spark off an interesting discussion. On this particular Sunday I made a few contributions, and the exchange went something like this:

Initial statement from Scott (I use his first name to emphasise the cosy Sunday morning feel of the discussion, and also as a way of reminding myself to be nice):

  1. The quantity and quality of language input that children get is so great as to call Chomsky’s poverty of the stimulus argument into question.
  2. An alternative to Chomsky’s view of language and language learning is that “language is acquired, stored and used as meaningful constructions (or ‘syntax-semantics mappings’).”
  3. Everett (he of “There is no such thing as UG”) is right to point out that, since no one has proved that the poverty of the stimulus argument is correct, “talk of a universal grammar or language instinct is no more than speculation”.


My first reply is short:

“Everett’s claim is nonsense since it’s logically impossible to prove that a theory is true.”

Scott ignores this comment and prefers to pay attention to a certain Svetlana (I imagine her sitting in a wifi-equipped tent, huddled over an Apple app projecting a 3-D crystal ball) who tells him that he’s right to question the POS claim because tiny babies, only recently emerged (sic) from the womb, form huge numbers, like, well, millions, of neural connections per second, and what’s more, they rapidly develop dendritic spines containing “lifelong memories”. A few unsupported pseudo-scientific, quasi-philosophical assertions which sound as if they’ve been picked up from a hazy weekend seminar at the Sorbonne are thrown in for good measure.

Imagine my surprise when Scott thanks the mystic Svetlana for bringing “new evidence to bear”, and says that this evidence serves to confirm his “initial hunch.”

“WHAT??” I typed furiously. “Are you really going to be hoodwinked by such postmodernist, obscurantist mumbo jumbo?” (Not much is known for sure about the role dendritic spines play in learning and memory; I suspect she thinks that mentioning them here is evidence of deep knowledge of the scientific study of the nervous system; and suggesting that they disprove the POS argument is fanciful nonsense.)

“Give us an example of a lifelong memory stored in a dendritic spine that’s relevant to this discussion, then!” I shout uselessly at the monitor.

Well, Scott’s not just hoodwinked; he actually becomes emboldened. Spurred on by the compelling “new evidence”, he’s now ready to dismiss the POS argument completely.

“Actually”, he says, the stimulus is quite enough to explain everything children know about language. Corpus studies “suggest that everything a child needs is in place”.

Asked how these corpus studies explain what children know about language, Scott (apparently still intoxicated by Svetlana’s absurd revelations) says “the child’s brain is mightily disposed to mine the input”, adding, as if this were the clincher, “a little stimulus goes a long way, especially when the child is so feverishly in need of both communicating and becoming socialized.”

“Cripes! His brain’s gone soft!” I thought. “He’s barking mad!”

“Platitudes and unsupported assertions have now completely replaced any attempt at reasoned argument”, I wrote.

“Anyone who claims that children’s knowledge about an aspect of syntax could not have been acquired from language input has to prove that it couldn’t. Otherwise it remains another empirically-empty assertion” says Scott.

Dear oh dear, here we are back at the start. As with the Everett quote, for purely formal reasons it’s not possible to prove such a thing, and to demand such “proof” demonstrates an ignorance of logic and of how rational argument, science, and theory construction work. Failing to meet the impossible demand for proof doesn’t make the POS argument an empirically-empty assertion.

Then Russ Mayne joins in to have his typically badly-informed little say. Chomsky, he tells us, is “utterly scornful of data.”

“No he’s not”, says I. “Chomsky’s theory of UG has a long and thorough history of empirical research.”

And blow me down if Thornbury doesn’t chime in:

“Chomsky’s theory of UG has a long and thorough history of empirical research”. What!!? Where? When? Who?

So now he’s not just showing a predilection for explanations involving the lifelong memories stored in dendritic spines, he’s showing even worse signs of ignorance.


That the discussion of the POS argument didn’t get satisfactorily resolved is hardly surprising, but I was more than a bit surprised to hear Scott telling us that language learning can be satisfactorily explained by the general learning processes going on inside feverish young brains that are “mightily disposed to mine the input”. (Just in passing, all these references to the child’s brain seem to contradict the part of the current Thornbury canon which deals with “the language body”.) Asked to say a bit more about how language learning can be done through general learning processes and input alone, Thornbury says:

“If we generalize the findings beyond the single word level to constructions…” and then “…generalize from constructions to grammar…”, “hey presto, the grammar emerges on the back of the frequent constructions.”

Hey presto? What grammar? What “findings beyond the single word level”? How do you generalise these findings to “constructions”? And how do you generalise from constructions to “grammar”?

This unwarranted dismissal of the POS argument, coupled with its incoherent account of language learning, is, you might think, excusable in a Sunday morning chat, but we find more evidence of both the ignorance and the incoherence displayed here in more carefully prepared public pronouncements on the same subjects. Thornbury’s very poor attempts to challenge Chomsky and psychological approaches to SLA by offering a particularly lame and simplistic version of emergentism, mostly based on Larsen-Freeman’s recent work, have already been commented on in this blog (see, for example, Thornbury and the Learning Body and Emergentism 2), but let me say just a bit more.

Thornbury and Emergentism

Thornbury keeps telling people about Larsen-Freeman’s latest project. The best criticism I’ve read of it is Kevin Gregg’s 2010 article in Second Language Research, “Shallow draughts: Larsen-Freeman and Cameron on complexity”. There’s no way I can do justice to the article by quickly summarising it, and I urge readers of this post to read it for themselves. As always with Gregg, the argument is not just devastating but delightfully written. Gregg dismantles the pretences of the Larsen-Freeman and Cameron book and shows that all their appeals to complexity theory are so much hogwash; nothing of substance sustains the fanciful opinions of the authors. And likewise, Thornbury.

Thornbury has said nothing to persuade any intelligent reader that his version of emergentism provides a good explanation of SLA. Just a few points:

  • Emergentism rests on empiricism, and empiricism pure and simple is a bankrupt epistemology.
  • Emergentism doesn’t get the support Thornbury claims it gets from the study of corpora – how could it? Thornbury’s claims show an ignorance of both theory construction and scientific method.
  • As Gregg (2010) points out, the claim that language is a complex dynamical system makes no sense. “Simply put, there is no such entity as language such that it could be a system, dynamical or otherwise……. Terms like ‘language’ and ‘English’ are abstractions; abstract terms, like metaphors, are essential for normal communication and expression of ideas, but that does not mean they refer to actual entities. English speakers exist, and (I think) English grammars come to exist in the minds/brains of those speakers, so it remains within the realm of possibility that a set of speakers is a dynamical system, or that the acquisition process is; but not language, and not a language.”
  • Thornbury’s assertion that language learning can be explained as the detection and memorisation of “frequently-occurring sequences in the sensory data we are exposed to” is an opinion masquerading as an explanatory theory. How can general conceptual representations acting on stimuli from the environment explain the representational system of language that children demonstrate?  Thornbury’s  suggestion that we have an innate capacity to “unpack the regularities within lexical chunks, and to use these patterns as templates for the later development of a more systematic grammar” begs more questions than it answers and, anyway, contradicts the empiricist epistemology adopted by most emergentists who say that there aren’t, indeed can’t be, any such things as innate capacities.

NOTE: I’ve added 2 appendices to deal with the 2 questions asked by Patrick Amon.

Appendix 1: Why can’t you prove that a general causal theory is true?

The problem of induction

Hume (1748) started from the premise that only “experience” (by which Hume meant that which we perceive through our senses) can help us to judge the truth or falsity of factual sentences. Thus, if we want to understand something, we must observe the relevant quantitative, measurable data in a dispassionate way. But if knowledge rests entirely on observation, then there is no basis for our belief in natural laws, because such a belief rests on an unwarranted inductive inference. We cannot logically go from the particular to the general: no amount of cumulative instances can justify a generalisation; ergo, no general law or generalised causal explanation can be known to be true. No matter how many times the sun rises in the East, or thunder follows lightning, or swans appear white, we will never know that the sun rises in the East, or that thunder follows lightning, or that all swans are white. This is the famous “logical problem of induction”. “Why, nevertheless, do all reasonable people expect, and believe, that instances of which they have no experience will conform to those of which they have experience?” Hume’s answer is: “Because of custom or habit” (Popper, 1979: 4).

More devastating still was Hume’s answer to Descartes’ original question “How can I know whether my perceptions of the world accurately reflect reality?” Hume’s answer was “You can’t.”

It is a question of fact whether the perceptions of the senses be produced by external objects resembling them: how shall this question be determined? By experience surely; as all questions of a like nature.  But here experience is, and must be, entirely silent.  The mind has never anything present to it but the perceptions, and cannot possibly reach any experience of their connection with objects.  The supposition of such a connection is, therefore, without any foundation in reasoning. (Hume, 1988 [1748]: 253)

Thus, said Hume, Descartes was right to doubt his experiences, but, alas, experiences are all we have.

The asymmetry between truth and falsehood

Popper (1972) offers a way out of Hume’s dilemma. He concedes that Hume is right: there is no logical way of going from the particular to the general, and that is that: however probable a theory might claim to be, it can never be shown to be true.

Popper (1959, 1963, 1972) argued that the root of the problem of induction was the concern with certainty. In Popper’s opinion Descartes’ quest was misguided and had led to three hundred years of skewed debate.  Popper claimed that the debate between the rationalists and the empiricists, with the idealists pitching in on either side, had led everybody on a wild goose chase – the elusive wild goose being “Truth”.  From an interest in the status of human knowledge, philosophers and philosophers of science had asked which, if any, of our beliefs can be justified.  The quest was for certainty, to vanquish doubt, and to impose reason.  Popper suggested that rather than look for certainty, we should look for answers to problems, answers that stand up to rational scrutiny and empirical tests.

Popper insists that in scientific investigation we start with problems, not with empirical observations, and that we then leap to a solution of the problem we have identified – in any way we like. This second anarchic stage is crucial to an understanding of Popper’s epistemology: when we are at the stage of coming up with explanations, with theories or hypotheses, then, in a very real sense, anything goes.  Inspiration can come from lowering yourself into a bath of water, being hit on the head by an apple, or by imbibing narcotics.  It is at the next stage, the stage of the theory-building process, that empirical observation comes in, and, according to Popper, its role is not to provide data that confirm the theory, but rather to find data that test it.

Empirical observations should be carried out in attempts to falsify the theory: we should search high and low for a non-white swan, for an example of the sun rising in the West, etc. The implication is that, at this crucial stage in theory construction, the theory has to be formulated in such a way as to allow for empirical tests to be carried out: there must be, at least in principle, some empirical observation that could clash with the explanations and predictions that the theory offers.  If the theory survives repeated attempts to falsify it, then we can hold on to it tentatively, but we will never know for certain that it is true.  The bolder the theory (i.e. the more it exposes itself to testing, the more wide-ranging its consequences, the riskier it is) the better.  If the theory does not stand up to the tests, if it is falsified, then we need to re-define the problem, come up with an improved solution, a better theory, and then test it again to see if it stands up to empirical tests more successfully.  These successive cycles are an indication of the growth of knowledge.

Popper (1974: 105-106) gives the following diagram to explain his view:

P1 -> TT -> EE -> P2

P = problem; TT = tentative theory; EE = error elimination (empirical experiments to test the theory)

We begin with a problem (P1), which we should articulate as well as possible. We then propose a tentative theory (TT), that tries to explain the problem. We can arrive at this theory in any way we choose, but we must formulate it in such a way that it leaves itself open to empirical tests.  The empirical tests and experiments (EE) that we devise for the theory have the aim of trying to falsify it.  These experiments usually generate further problems (P2) because they contradict other experimental findings, or they clash with the theory’s predictions, or they cause us to widen our questions.  The new problems give rise to a new tentative theory and the need for more empirical testing.

Popper thus gives empirical experiments and observation a completely different role: their job now is to test a theory, not to prove it, and since this is a deductive approach it escapes the problem of induction. Popper takes advantage of the asymmetry between verification and falsification: while no number of empirical observations can ever prove a theory is true, just one such observation can prove that it is false.  All you need is to find one black swan and the theory “All swans are white” is disproved.
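The asymmetry Popper exploits can be put in a few lines of code. This is a purely illustrative toy (the function and data are my own, not anything from Popper or the literature): a universal claim can only ever be left unrefuted by confirming instances, never proven, while a single counterexample falsifies it.

```python
# Toy illustration of the asymmetry between verification and falsification.
# "All swans are white" can never be verified by any finite run of
# observations, but one black swan refutes it outright.

def is_refuted(theory, observations):
    """Return True as soon as any single observation clashes with the theory."""
    return any(not theory(obs) for obs in observations)

all_swans_are_white = lambda swan: swan == "white"

# A million confirming instances leave the theory merely unrefuted, not proven.
print(is_refuted(all_swans_are_white, ["white"] * 1_000_000))  # False

# One black swan is enough to falsify it.
print(is_refuted(all_swans_are_white, ["white"] * 1_000_000 + ["black"]))  # True
```

Note that the `False` result carries no positive weight at all: swan number 1,000,001 may still be black, which is exactly Hume’s point, and why Popper assigns observation the job of testing rather than proving.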

Appendix 2: Empiricism and epistemology

Moving to Patrick’s second question, I meant to say that “pure” or “extreme” forms of empiricism are now generally rejected. Those who adopt a relativist epistemology (e.g. most post-modernists) and those who are ignorant of the philosophy of science (e.g. Thornbury) wrongly label their opponents (rationalists who base their arguments on logic and empirical observation) as “positivists”. In fact, nobody in the scientific community is a positivist these days. The last wave of positivists belonged to the famous Vienna Circle, whose members aimed to continue the work of their predecessors (most importantly Comte and Mach) by giving empiricism a more rigorous formulation through the use of recent developments in mathematics and logic. The Vienna Circle, which comprised Schlick, Carnap, Gödel and others, and had Russell, Whitehead and Wittgenstein as interested parties (see Hacking, 1983: 42-44), developed a programme labelled Logical Positivism, which consisted first of cleaning up language so as to get rid of paradoxes, and then of limiting science to strictly empirical statements: in the grand tradition of positivism they pledged to get rid of all speculation on “pseudo-problems” and concentrate exclusively on empirical data. Ideas were to be seen as “designations”, terms or concepts, formulated in words that needed to be carefully defined in order to be meaningful rather than meaningless. The logical positivists are particularly well known for their attempt to answer Hume’s criticism of induction through probability theory, which, crudely, proposed that while a finite number of confirming instances of a theory could not prove it, the more numerous the confirming instances, the higher the probability that the theory was true. This, like just about all of their work, ended in failure.

Empiricism in Linguistics: Behaviourism  

But empiricism lived on, and in linguistics the division between “empiricist” and “rationalist” camps is noteworthy. The empiricists, who held sway, at least in the USA, until the 1950s, and whose most influential member was Bloomfield, saw their job as field work: armed with tape recorders and notebooks, researchers recorded thousands of hours of actual speech in a variety of situations and collected samples of written text. The data was then analysed in order to identify the linguistic patterns of a particular speech community. The emphasis was very much on description and classification, and on highlighting the differences between languages. We might call this the botanical approach, and its essentially descriptive, static, “naming of parts” methodology depended for its theoretical underpinnings on the “explanation” of how we acquire language provided by the behaviourists.

Behaviourism was first developed in the early twentieth century by the American psychologist John B. Watson, who, influenced by the work of Pavlov and Bekhterev on the conditioning of animals, attempted to make psychological research “scientific” by using only objective procedures, such as laboratory experiments designed to establish statistically significant results. Watson formulated a stimulus-response theory of psychology according to which all complex forms of behaviour are explained in terms of simple muscular and glandular elements that can be observed and measured. No mental “reasoning”, no speculation about the workings of any “mind”, were allowed. Thousands of researchers adopted this methodology, and from the end of the First World War until the 1950s an enormous amount of research on learning in animals and in humans was conducted under this strict empiricist regime. In 1950 behaviourism could justly claim to have achieved paradigm status, and at that moment B.F. Skinner became its new champion. Skinner’s contribution to behaviourism was to challenge the stimulus-response idea at the heart of Watson’s work and replace it with a type of psychological conditioning known as reinforcement. Important as this modification was, it is Skinner’s insistence on a strict empiricist epistemology, and his claim that language is learned in just the same way as any other complex skill, by social interaction, that is important here.

In sharp contrast to the behaviourists and their rejection of “mentalistic” formulations is the rationalist approach to linguistics championed by Chomsky. Chomsky (in 1959 and subsequently) argued that it is the similarities among languages, what they have in common, that is important, not their differences. In order to study these similarities we must allow the existence of unobservable mental structures and propose a theory of the acquisition of a certain type of knowledge.

Well, you know the story: Chomsky’s theory was widely adopted and became the new paradigm. Currently, badly-informed people like Larsen-Freeman and Thornbury (as opposed to serious scholars like O’Grady, MacWhinney and others) are claiming that no appeals to innate, unobservable mental processes or to modules of mind are necessary to explain language learning. What they don’t appreciate is that, unless, like William O’Grady or Brian MacWhinney, they deal properly with epistemological questions about the status of psychological processes, mental states, mind versus brain, and so on, they are either trying to have their cake and eat it or adopting an untenable empiricist epistemology.



Gregg, K.R. (2010) Shallow draughts: Larsen-Freeman and Cameron on complexity. Second Language Research, 26(4), 549 – 560.

Hacking, I. (1983) Representing and Intervening. Cambridge: Cambridge University Press.

Hume, D. (1988) [1748] An Enquiry Concerning Human Understanding. Amherst, NY: Prometheus.

Popper, K. R. (1959) The Logic of Scientific Discovery. London: Hutchinson.

Popper, K. R. (1963) Conjectures and Refutations. London: Hutchinson.

Popper, K. R. (1972) Objective Knowledge. Oxford: Oxford University Press.

Popper, K. R. (1974) Replies to my critics. In P.A. Schilpp (ed.), The Philosophy of Karl Popper. La Salle, IL: Open Court.

Thornbury, S. (2013) The learning body. In Arnold, J. and Murphey, T. (eds.), Meaningful Action: Earl Stevick’s Influence on Language Teaching. Cambridge: Cambridge University Press.

Thornbury, S. (2012?) Language as an emergent system. British Council, Portugal: In English. Available here:



Dellar and Lexical Priming


In a recent webinar (which I read about in a post by Leo Selivan) Hugh Dellar talked about colligation. I missed the webinar and found Selivan’s report of it confusing, so I took a look at the slides Dellar used. Early on in his presentation, Dellar quotes Hoey (2005, p. 43):

The basic idea of colligation is that just as a lexical item may be primed to co-occur with another lexical item, so also it may be primed to occur in or with a particular grammatical function. Alternatively, it may be primed to avoid appearance in or co-occurrence with a particular grammatical function. 

I don’t know how Dellar explained Hoey’s use of the term “primed” in his webinar, but I understand priming to be based on the idea that each word we learn becomes associated with the contexts in which we repeatedly encounter it, so much so that we subconsciously expect and replicate these contexts when we hear and speak the words. The different types of information that the word is associated with are called its primings.

What does Hoey himself say? Hoey says that we get all our knowledge about words (their collocations, colligations, and so on) by subconsciously noticing everything that we have ever heard or read, and storing it in memory.

The process of subconsciously noticing is referred to as lexical priming. … Without realizing what we are doing, we all reproduce in our own speech and writing the language we have heard or read before. We use the words and phrases in the contexts in which we have heard them used, with the meanings we have subconsciously identified as belonging to them and employing the same grammar. The things we say are subconsciously influenced by what everyone has previously said to us (Hoey, 2009 – Lexical Priming)  

Hoey rejects Chomsky’s view of L1 acquisition and claims that children learn language starting from a blank slate, building knowledge from subconsciously noticed connections between lexical items. All language learning (child L1 and adult SLA alike) is the result of repeated exposure to patterns of text: the more repetition, the more chance of subconscious noticing, and the better our knowledge of the language.

The weaknesses of this theory include the following:

  • Hoey does not explain the key construct of subconscious noticing;
  • he does not explain how the hundreds of thousands of patterns of words acquired through repeatedly encountering and using them are stored and retrieved;
  • he does not acknowledge any limitations in our ability to remember, process or retrieve this massive amount of linguistic information;
  • he does not reply to the argument that we can and do say things that we haven’t been trained to say and that we have never heard anybody else say, which contradicts the claim that what we say is determined by our history of priming;
  • while Hoey endorses Krashen’s explanation of SLA (it’s an unconscious process dependent on comprehensible input), Krashen’s Natural Order Hypothesis contradicts Hoey’s lexical priming theory, since, while the first claims that SLA involves the acquisition of grammatical structures in a predictable sequence, the second claims that grammatical structures are lexical patterns and that there is no order of acquisition.

These limitations in Hoey’s theory get no mention from Dellar, who, having previously modelled his lexical approach on Michael Lewis, now seems to have fully embraced Hoey’s lexical priming theory. Let’s look at how this theory compares to a rival explanation. (I’m making use here of material from previous posts about Dellar and Hoey.)

Interlanguage Grammar versus Lexical Priming

In the last 40 years, great progress has been made in developing a theory of SLA based on a cognitive view of learning. It started in 1972 with the publication of Selinker’s paper, in which he argued that L2 learners have their own autonomous mental grammar, which came to be known as interlanguage grammar: a grammatical system with its own internal organising principles, which may or may not be related to the L1 and the L2.

One of the first developmental sequences of this interlanguage to be identified was that for ESL questions. In a study of six Spanish students over a 10-month period, Cazden, Cancino, Rosansky and Schumann (1975) found that the subjects produced interrogative forms in a predictable sequence:

  1. Rising intonation (e.g., He works today?),
  2. Uninverted WH (e.g., What he (is) saying?),
  3. “Overinversion” (e.g., Do you know where is it?),
  4. Differentiation (e.g., Does she like where she lives?).

A later example is in Larsen-Freeman and Long (1991: 94). They pointed to research which suggested that learners from a variety of different L1 backgrounds go through the same four stages in acquiring English negation:

  1. External (e.g., No this one./No you playing here),
  2. Internal, pre-verbal (e.g., Juana no/don’t have job),
  3. Auxiliary + negative (e.g., I can’t play the guitar),
  4. Analysed don’t (e.g., She doesn’t drink alcohol.)

In developing a cognitive theory of SLA, the construct of interlanguage became central to the view of L2 learning as a process by which linguistic skills become automatic. Initial learning requires controlled processes, which require attention and time; with practice the linguistic skill requires less attention and becomes routinized, thus freeing up the controlled processes for application to new linguistic skills. SLA is thus seen as a process by which attention-demanding controlled processes become more automatic through practice, a process that results in the restructuring of the existing mental representation, the interlanguage.

So there are two rival theories of SLA on offer here: Hoey’s theory of lexical priming (supported by Dellar, Selivan and others) and Selinker’s theory of interlanguage (developed by Long, Robinson, Schmidt, Skehan, Pienemann and others). Dellar should resist giving the impression that Hoey’s theory is the definitive and unchallenged explanation of how we learn languages.

Errors and L1 priming

In his presentation Dellar says “All our students bring L1 primings” and gives these examples from Polish:

On chce zebym studiowal prawo.

Zimno mi.

Jak ona wyglada?

These L1 primings “colour L2”:

He wants that I study Law.

It is cold to me.

How does she look?

Dellar says that these are not grammar errors, but rather “micro-grammatical problems” caused by a lack of awareness of how the words attach themselves to grammar. The solution Dellar offers to these problems is to provide learners with lots of examples of “correct colligation and co-text”:

He wants me to study Law.

My dad’s quite pushy. He wants me to study Business, but I’m not really sure that I want to.

It’s really cold today.  It’s freezing!  I’m freezing!

What does she look like?  Oh, she’s quite tall . . . long hair . . . quite good-looking, actually. Well, I think so anyway.

This kind of correction is, says Dellar, “hard work, but necessary work”. It ensures that “students are made aware of how the way they think the language works differs from how it really works.” Dellar concludes that

 Hoey has shown the real route to proficiency is sufficient exposure. Teachers can shortcut the priming process by providing high-reward input that condenses experience and saves time.

We may note how Hoey, not Krashen, gets the credit for showing that the real route to proficiency is sufficient exposure; how priming now explains learning; and how teaching must now concentrate on providing shortcuts to the priming process.

To return to Dellar’s “micro-grammatical problems”, we are surely entitled to ask whether what SLA researchers have for 50 years referred to as the phenomenon of L1 transfer is better understood as the phenomenon of L1 primings. Recall that Pit Corder argued in 1967 that learner errors were neither random nor best explained in terms of the learner’s L1; errors were indications of learners’ attempts to figure out an underlying rule-governed system. Corder distinguished between errors and mistakes: mistakes are slips of the tongue and not systematic, whereas errors are indications of an as yet non-native-like, but nevertheless systematic, rule-based grammar. Dulay and Burt (1975) then claimed that fewer than 5% of errors were due to native language interference, and that errors were, as Corder suggested, in some sense systematic. Brown’s (1973) morpheme studies in L1 led to studies in L2 which suggested that there was a natural order in the acquisition of English morphemes, regardless of L1. This became known as the L1 = L2 Hypothesis, and further studies all pointed to systematic staged development in SLA. The emerging cognitive paradigm of language learning perhaps received its full expression in Selinker’s (1972) paper, which argued that L2 learners have their own autonomous mental grammar (which came to be known, pace Selinker, as interlanguage (IL) grammar), a grammatical system with its own internal organising principles, which may or may not be related to the L1 and the L2.

All of this is contradicted by Dellar, who insists that L1 priming explains learner errors. 

Language development through L2 priming  versus processing models of SLA

Explaining L2 development as a matter of strengthening L2 primings between words contradicts the work of those using a processing model of SLA, and I’ll give just one example. McLaughlin (1990) uses the twin concepts of “automaticity” and “restructuring” to describe the cognitive processes involved in SLA. Automaticity develops when an associative connection forms between a certain kind of input and some output pattern. Many typical greetings exchanges illustrate this:

Speaker 1: Morning.

Speaker 2: Morning. How are you?

Speaker 1: Fine, and you?

Speaker 2: Fine.

Since humans have a limited capacity for processing information, automatic routines free up more time for such processing. To process information one has to attend to, deal with, and organise new information.  The more information that can be handled routinely, automatically, the more attentional resources are freed up for new information.  Learning takes place by the transfer of information to long-term memory and is regulated by controlled processes which lay down the stepping stones for automatic processing.

The second concept, restructuring, refers to qualitative changes in the learner’s interlanguage as they move from stage to stage, not to the simple addition of new structural elements. These restructuring changes are, according to McLaughlin, often reflected in “U-shaped behaviour”, which refers to three stages of linguistic use:

  • Stage 1: correct utterance,
  • Stage 2: deviant utterance,
  • Stage 3: correct target-like usage.

In a study of French L1 speakers learning English, Lightbown (1983) found that, when acquiring the English “ing” form, her subjects passed through the three stages of U-shaped behaviour.  Lightbown argued that as the learners, who initially were only presented with the present progressive, took on new information – the present simple – they had to adjust their ideas about the “ing” form.  For a while they were confused and the use of “ing” became less frequent and less correct.

According to Dellar (following Hoey) this “restructuring” explanation is wrong: what’s actually happening is that the L2 primings are not getting enough support from “high-reward input”.


There are serious weaknesses in lexical priming theory as a theory of SLA, and few reasons to think that it offers a better explanation of the phenomena studied by SLA scholars, including the phenomenon of L1 transfer, than processing theories which use the construct of interlanguage grammar. Even if there were, Dellar seems not to have grasped that his newly-adopted explanation of language learning and his long-established teaching methods contradict each other. If lexical priming is a subconscious process which explains language learning, then the sufficient condition for learning is exposure to language and opportunities to strengthen and extend lexical primings. All the corrective work that Dellar recommends, all that “hard but necessary work” to ensure that “students are made aware of how the way they think the language works differs from how it really works”, is useless interference in a natural process involving the unconscious acquisition of lexical knowledge.



Cazden, C., Cancino, E., Rosansky, E. and Schumann, J. (1975) Second language acquisition sequences in children, adolescents and adults. Final report submitted to the National Institute of Education, Washington, D.C.

Corder, S. P. (1967) The significance of learners’ errors. International Review of Applied Linguistics 5, 161-9.

Dulay, H. and Burt, M. (1975) Creative construction in second language learning and teaching. In Burt, M and Dulay, H. (eds.), New directions in second language learning, teaching, and bilingual education. Washington, DC: TESOL, 21-32.

Hoey, M. (2005) Lexical Priming: A New Theory of Words and Language. London: Routledge.

Krashen, S. (1981) Second language acquisition and second language learning. Oxford: Pergamon.

Larsen-Freeman, D. and Long, M. H. (1991) An introduction to second language acquisition research. Harlow: Longman.

McLaughlin, B. (1990) “Conscious” versus “unconscious” learning. TESOL Quarterly 24, 617-634.

Selinker, L. (1972) Interlanguage.  International Review of Applied Linguistics 10, 209-231.

Thornbury and The Learning Body


Scott Thornbury has been talking about “The Learning Body” for a while now. You can see one version on YouTube and another at the ELTABB website. You can also read a fuller treatment in Thornbury’s chapter in the tribute to Earl Stevick, Meaningful Action (just BTW, it’s not a great collection). I base this critique on the YouTube talk.

Summary of the Talk

Thornbury starts by asserting that “Descartes got it wrong”. There is no mind/body dualism; rather, “Brains are in bodies, bodies are in the world and meaningful action in these worlds is socially constructed and conducted” (Churchill et al., 2010). This devastating rebuttal of Descartes, which Thornbury (ignoring works by Locke, Hume, Derrida and others) reckons was “finally revealed” in 1994, has been ignored by those responsible for the prevailing orthodoxy in SLA, who insist that “language and language learning are a purely cognitive phenomenon.” Thornbury tells us that this orthodoxy claims that we need look no further than cognition for an explanation of SLA – other factors are not important.

Thornbury then goes on to explain that the modern view sees the brain as part of a larger set, involving the body and the world, leading to a new concept of “embodied cognition”. Without bothering with considerations of how “the mind” as a construct relates to the brain as a physical part of the body, Thornbury proceeds to look at the mind as embodied, embedded and extended.

Embodied Mind

The construct of the “embodied mind” is defined as “rooted in physical experience”. Our mind (see how hard it is, even for Thornbury, to stay away from Cartesian dualism) deals with ideas that are all related to our “physicality” as Thornbury puts it, and this applies to language and language learning. Key points here are:

  • “Language is rooted in human experience of the physical world”(Lee, 2010)
  • We adapt our language to different circumstances and different people.
  • Learning is enhanced by physical involvement.
  • Larsen-Freeman’s latest work argues that language is a dynamic emergent system.
  • Language is noted, applied and adapted in context.
  • Mirror neurons and body language are evidence for the embodied mind construct.

Embedded Mind

No definition of this construct is offered. Thornbury only says that language is embedded in context, which should come as a surprise to nobody. Thornbury refers to “ecolinguistics”, likens the learning of language to the learning of soccer by children, and reminds us that we adapt our language to different circumstances and different people.

Extended Mind

The “extended mind” construct is nowhere even casually defined, but Scott uses the film Memento (a great film which I recommend, but which has little to do with the use Scott makes of it) to make the point that our bodies help us to remember. This is followed by a discussion of gestures, which have a big role in communication.


Not much here. Thornbury refers to the importance of our physical relationship to our students and says “Learning is discovering alignment”. This means group work, gesture, eye contact, “acting out”.


Thornbury gives this summary:

  • I think therefore I am: Wrong. Better:
  • I move therefore I am.
  • I speak therefore I move.
  • I move, therefore I learn.


Thornbury’s talk is interesting, and very well-delivered: he’s the best stand-up act in the business (sic) and his use of video clips is particularly good. But when you tot it all up, there’s almost nothing of substance, and the argument is hollow. Thornbury makes a straw man argument against research in SLA, and says nothing of much interest as to how all this “embodied” stuff might further our understanding of SLA. As to teaching, there’s absolutely no need to even mention “embodied cognition” in order to agree with all the good things he says about gestures and the rest of it. Earl Stevick was indeed concerned with holistic learning and a teaching methodology which reflected it, but I doubt he’d be impressed by this attempt to use fashionable speculations about cognitive science to back it up.

Specific Points

  1. The use of Descartes to promote an argument against current SLA research is simplistic and boringly trite. In the “Discourse on Method” Descartes was concerned with epistemology, with reliable knowledge. His famous conclusion “Cogito ergo sum” has never been falsified – how could it be! – and it’s plain silly to say that “he got it wrong”.
  2. Thornbury says that SLA orthodoxy sees language learning as a purely cognitive phenomenon taking place in the mind. He’s wrong. The most productive research in SLA concentrates on cognitive aspects of SLA, but those involved in such research are quite aware that they’re focusing on just one aspect of the problem. They do so for the very good reason that scientific research gets the best results. The job of those who look at other aspects, such as those covered by sociolinguistics, is to show how their work has academic respectability, and misrepresenting the work of those who adopt a scientific methodology does nothing to enhance that job.
  3. The question of the distinction between the brain and the mind is a fundamental one. Thornbury doesn’t even mention it.


Thornbury, following the muddled and generally incoherent arguments of Larsen-Freeman, wants to say that SLA is best seen as an emerging process where, well, things emerge. And given that it all kind of emerges, ELT should help all these things, well, emerge. This is absolutely hopeless, isn’t it? Any theory of SLA must be sharper than this; any teaching methodology needs a firmer basis. There is, of course, a very interesting strand of SLA research that takes an emergentist approach, but it has little in common with Thornbury’s musings. And there are, of course, teaching methodologies based on helping learners “emerge”, although they don’t put it quite like that. Thornbury has done very little to critique SLA research, or to explain how all his “emerging” bits and pieces might help future research move in a better direction. Furthermore, nothing in his suggestions for teaching practice is new, and none of it depends on his “theoretical basis”.

Finally, let’s just have another look at this:

  • I think therefore I am: Wrong. Better:
  • I move therefore I am.
  • I speak therefore I move.
  • I move, therefore I learn.

Not exactly a syllogism, now is it? I speak therefore I move? Really?

And quite apart from being incoherent, how will it affect your understanding of SLA?  Still, at least the last sorry line might inspire you to get off your butt and revisit Asher – he of Total Physical Response.

Are we on the brink of a paradigm shift in ELT?

Kuhn famously used the term “paradigm shift” to challenge the account given by philosophers of science such as Popper of how scientific theories evolve and progress. Popper said that scientific progress was gradual and cumulative; Kuhn said it was sudden and revolutionary, involving paradigm shifts where one way of thinking was suddenly swept away and replaced by another. A paradigm shift involves a revolution, a transformation, a metamorphosis in the way we see something, and it has profound practical implications. Change begins with a change in awareness and perception. Our perception is heavily influenced by our past and by social conditioning, and most of the time we go along with the paradigm view / normal science / the status quo / the theory taught at MIT / the prevalent narrative. But there are revolutionary moments in history when we prove ourselves capable of transforming and transcending the prevailing paradigms which so affect our lives, and I wonder whether we are currently approaching such a moment in ELT.

The present ELT paradigm has these characteristics:

  • Standard English is the subject taught.
  • Vocabulary and grammar are the subject matter of EFL / ESL.
  • SLA involves learning the grammar and lexicon of the language and practicing the 4 skills.
  • A product syllabus is used. This focuses on what is to be taught, and, to make the “what” manageable, chops language into discrete linguistic items which are presented and practiced separately and step by step in an accumulative way.
  • A coursebook is used. The coursebook is the most important element determining the course. It’s usually grammar-based and presents the chopped up bits of language progressively. Other material and activities aim at practicing the 4 skills.
  • The teacher implements the syllabus, using the coursebook. The teacher makes all day-to-day decisions affecting its implementation.
  • The students are not consulted about the syllabus and have only a small say in its implementation.
  • Assessment is in terms of achievement or mastery, using external tests and exams.

The rival view of ELT has very different characteristics:

  • Standard English is one variety of English; it is not the subject taught.
  • Texts (discourse) are the subject matter of EFL / ESL.
  • SLA involves the socially-mediated development of interlanguage.
  • A process syllabus is used. This focuses on how the language is to be learned. There’s no pre-selection or arrangement of items; objectives are determined by a process of negotiation between teacher and learners as a course evolves. The syllabus is thus internal to the learner, negotiated between learners and teacher as joint decision makers, and emphasises the process of learning rather than the subject matter.
  • No coursebook is used.
  • The teacher implements the evolving syllabus in consultation with the students.
  • The students participate in decision-making about course objectives, content, activities and assessment.
  • Assessment is in terms of low-stakes formative assessment whose purpose is “to act as a way of providing individual learners with feedback that helps them to improve in an ongoing cycle of teaching and learning” (Rea-Dickens, 2001).

If this rival view were to be widely adopted in ELT it would certainly constitute a revolution, a complete paradigm shift. But will it happen? When one looks at the arguments for and against the two views of ELT sketched above, it’s difficult to escape the feeling that the current paradigm is becoming less and less defensible, in the light of increasing knowledge of the SLA process; poor results of classroom-based ELT courses; poor morale among teachers (apart from suffering from bad working conditions and pay, most teachers are denied the freedom to teach as they’d like to); and the increasing viability of alternatives.

Doesn’t the alternative seem so much more appealing? What’s better, that course content grows out of the experiences of the learners and is based on topics which reflect their reality, or that it derives from a coursebook made in London or New York? What’s better, that conversational dialogue is the essential component of the course, or that the teacher talks most of the time, gives presentations about English and leads the learners through prefabricated activities? What’s better, that the teacher follows orders and carries out a plan made by somebody in London or New York, or that the teacher is given permission to build the course as it goes along, involving learners in all the important decisions concerning objectives, content, activities and assessment? From both the learners’ and the teachers’ point of view, which approach is likely to lead to higher levels of interest, motivation, energy, engagement and satisfaction? Which approach is likely to lead to better results?

And don’t the replies to criticism of those who promote the current paradigm add further weight to the alternative argument? I’ve discussed elsewhere how some of the leading lights in ELT respond to criticisms of the current paradigm, and I think it’s fair to say that none of them has offered any proper defence of it. The gist of the argument is that alternatives are “unrealistic” and that ELT practice under the present paradigm is slowly but surely improving. As Harmer puts it, unafraid as always of using a handy cliché, “tests are getting better all the time”.

Another supporter of the present paradigm, Jim Scrivener, shows how little importance he gives to any real examination of alternatives. Scrivener simply assumes that teachers must run the show and that “Made in the UK (or USA)” coursebooks and test materials should determine course objectives and content. Rather than question these two fundamental assumptions, Scrivener takes them as given and thinks exclusively in terms of doing the same thing in a more carefully-considered way. In Scrivener’s scheme of things, everything in the ELT world stays the same, but the cobwebs of complacency are swept away and everybody “demands high” (whatever that means). So teachers are exhorted to up their game: to use coursebooks more cleverly, to check comprehension more comprehensively, to practice grammar more perspicaciously, to re-cycle vocabulary more robustly, and so on, but never to think outside the long-established framework of a teacher-led, coursebook-driven course of English instruction. Recently Scrivener commented that a good coursebook is “a brilliant exploitable all-bound-up-in-one-package resource.” No attempt is made to argue the place of coursebooks in ELT, but Scrivener does take the opportunity to stress the need for teachers to be trained in how to use coursebooks. Some teachers find reading pages of coursebooks (in the sense of appreciating the links between different parts of the page and pages) “baffling” and so they need to be shown how to “swim” in the coursebook, how to take advantage of all that it has to offer. Apart from giving the impression that he thinks he’s very smart and that most teachers are very dumb, Scrivener gives more evidence of the limits of his vision: nowhere does he discuss training teachers how to do without a coursebook, for example. After all, why on earth would anybody want to do that?

In the same discussion of coursebooks on Steve Brown’s blog, Scott Thornbury eloquently summarized the case against them. I cut and pasted his summary on this blog, leading Hugh Dellar to tweet “Shocking disdain for the craft of writers & editors, as well as the vast majority of teachers from @thornburyscott.” This is typical of Dellar’s response to criticism of coursebooks in two respects. First, it is badly written, and second, it takes offence rather than offering any evidence or arguments to the contrary. Dellar has made a number of comments on my criticisms of the dominant role of coursebooks in current ELT, but none of them offers any argument to refute the claim that coursebooks are based on false assumptions and that a process syllabus better respects research findings in SLA, and represents a better model of education. In all the recent discussions of teaching methodology, the use of coursebooks, the design and use of tests, teacher training, and so on, both in the big conferences and in blogs, nobody who defends the current paradigm of ELT has properly addressed the arguments above or the arguments for an alternative offered by Michael Breen, Chris Candlin, John Fanselow, Mike Long, Rose Bard, Graham Crookes, Scott Thornbury, Luke Meddings, and many others. These are met with a barrage of fallacious arguments and very little else.

While I believe that those who fight against the current paradigm have the more persuasive arguments, not to mention the more exciting agenda, I unfortunately don’t believe that we’re on the brink of a paradigm shift in ELT. The status quo is too strong and the business interests that support and sustain this status quo and its institutions are too powerful. The alternative view of ELT described here is essentially a left-wing view which is just too democratic to stand a chance in today’s world. I suppose the best that those of us who believe in an alternative can do is to argue our case and make our voice heard. Whether or not to compromise is another important issue. I was interested to see Luke Meddings propose a 50-50 deal recently: “OK”, he suggested, “just put the book and the tests away for 50% of the time!” I don’t feel comfortable with that, but he might well be on the right track.

Chomsky’s Critics 2: Elizabeth Bates

Elizabeth Bates (1947 – 2003) was a brilliant scholar perhaps best known for her work with Brian MacWhinney on the Competition Model and Connectionism. In her often outspoken work, Bates challenges the modular theory of mind and, more specifically, criticises the nativists’ use of accounts of “language savants” and of people with cognitive or language impairments to support their theory. Specifically, in her review of Smith and Tsimpli’s The Mind of a Savant, Bates (2000) challenges the authors’ conclusions about Christopher, the savant in question, and, along the way, challenges the two main arguments supporting the UG “ideology”, as she calls it: the existence of universal properties of language, and the poverty of the stimulus.

First, the existence of language universals does not provide compelling evidence for the innateness of language, because such universals could arise for a variety of reasons that are not specific to language itself (e.g., universal properties of cognition, memory, perception, and attention).  (Bates, 2000: 5)

Bates, following Halliday, gives the analogy of eating food with one’s hands (with or without tools like a fork or chopsticks), which can be said to be universal. Rather than posit “an innate hand-feeding module, subserved by a hand-feeding gene”, a simpler explanation is that, given the structure of the human hand, the position of the mouth, and the nature of the food we eat, this is the best solution to the problem.

In the same vein, we may view language as the solution (or class of solutions) to a difficult and idiosyncratic problem: how to map a rich high-dimensional meaning space onto a low-dimensional channel under heavy information-processing constraints, guaranteeing that the sender and the receiver of the message will end up with approximately the same high-dimensional meaning state.  Given the size and complexity of this constraint satisfaction problem, the class of solutions may be very small, and (unlike the hand-feeding example) not at all transparent from an a priori examination of the problem itself  (Bates, 2000: 5).

Bates gives other examples to support her argument that solutions to particular problems of perception and cognition often evolve in an ad hoc way, and that there is no need to jump to the convenient conclusion that the problem was solved by nature. As she says, “That which is inevitable does not have to be innate!” (Bates, 2000: 6)

Bates sees language as consisting of a network, or set of networks, and she was one of the first to begin work on a connectionist model, now known as the Competition Model. She’s refreshingly frank in recognising that neural network simulations of learning are still in their infancy, and that it’s still not clear how much of human language learning such systems will be able to capture. Nevertheless, she says, the neural network systems which have already been constructed are able to generalise beyond the data and recover from error. “The point is, simply,” says Bates, “that the case for the unlearnability of language has not been settled one way or the other” (Bates, 2000: 6).

Bates goes on to say that when the nativists point to the “long list of detailed and idiosyncratic properties” described by UG, and ask how these could possibly have been learned, this begs the question of whether UG is a correct description of the human language faculty.  Bates paraphrases their argument as follows:

  1. English has property P.
  2. UG describes this property of English with Construct P’.
  3. Children who are exposed to English eventually display the ability to comprehend and produce English sentences containing property P.
  4. Therefore English children can be said to know Construct P’.

Bates comments:

There is, of course, another possibility: Children derive Property P from the input, and Construct P’ has nothing to do with it. (Bates, 2000: 6)

An important criticism raised by many, and taken up by Bates, against Chomsky’s theory is that it is difficult to test. In principle, one of the strong points of UG is precisely its empirical testability – find a natural language where the description does not fit, or find a mature language user of a natural language who judges an ill-formed sentence to be grammatical, and you have counter-evidence. However, Bates argues that the introduction of parameters and parameter settings “serves to insulate UG from a rigorous empirical test.” In the case of binary universals (e.g., the Null Subject Parameter), any language either will or will not display them; they “exhaust the set of logical possibilities and cannot be disproven.” Other universals are allowed to be silent or unexpressed if a language does not offer the features to which these universals apply. For example, universal constraints on inflectional morphology cannot be applied in Chinese, since Chinese has no inflectional morphology. Rather than allow Chinese to serve as a counter-example to the universal, the apparent anomaly is resolved by saying that the universal is present but silent. Bates comments: “It is difficult to disprove a theory that permits invisible entities with no causal consequences.”


1. Poverty of the Stimulus

Many of the criticisms made by Sampson and Bates do not seem to me to be well-founded.  While Bates is obviously correct to say that language universals could arise for a variety of reasons that are not specific to language itself, Bates provides no evidence against Chomsky’s claims. To say that “the case for the unlearnability of language has not been settled” amounts to the admission that no damning evidence has yet been found against the poverty of the stimulus argument, and, of course, such an argument can never be “proved”.

In general, to suggest that learning a language is just one more problem-solving task that the general learning machinery of the brain takes care of ignores all the empirical evidence of those adults who attempt and fail to learn a second language, and the evidence of atypical populations who successfully learn their L1.  Despite Bates’ careful and convincing unpicking of the more strident claims made by nativists in their accounts of atypical populations, it’s hard to explain the cases of those with impaired general intelligence who have exceptional linguistic ability (see Smith, 1999: 24), or the cases of those with normal intelligence who, after a stroke, lose their language ability while retaining other intellectual functions (see Smith 1999: 24-29), if language learning is not in fact localised.

Turning to Sampson: when he challenges Chomsky’s poverty of the stimulus argument by saying that many children have in fact been exposed to input like Blake’s Tyger poem, he ignores the obvious fact that many children have not. And when he says that children need input of yes/no questions in order to learn how to form them, nobody would disagree; the question remains of how the child also learns about aspects of the grammar that are not present in the input. In my recent discussion with Scott about the poverty of the stimulus argument, he claimed, as does Sampson, that “everything the child needs” is, in fact, present in the input, and thus no resort to nativist arguments of modular mind, innate knowledge, the LAD, or any of that, is necessary. While Sampson attempts, bizarrely and without success, to use Popper’s arguments for progress in science through conjectures and refutations as a model for language acquisition, I think Scott was relying more on the kind of emergentist theory of learning that Bates has promoted. But, in my opinion, only Bates shows any appreciation for just how hard it is to do without any appeal to innateness. Let’s take a quick look.

Nativism vs. Emergentism

Gregg (2003) highlights the differences between the two approaches. On the one hand, he says, we have Chomsky’s theory which posits a rich, innate representational system specific to the language faculty, and non-associative mechanisms, as well as associative ones, for bringing that system to bear on input to create a grammar. On the other hand, we have the emergentist position, which denies both the innateness of linguistic representations  and the domain-specificity of language learning mechanisms.

Starting from the premise that items in the mind get there through experience, emergentists adopt a form of associationism and argue that items that go together in experience will go together in thought. If two items are paired with sufficient frequency in the environment, they will go together in the mind. In this way we learn that milk is white, that -ed is the past tense marker for English verbs, and so on. Associationism shares the general empiricist view that complex ideas are constructed from simple “ideas”, which in turn are derived from sensations caused by interaction with the outside world. Gregg (2003) acknowledges that these days one certainly can model associative learning processes with connectionist networks, but he highlights the severe limitations of connectionist models by examining the Ellis and Schmidt model (see Gregg, 2003: 58–66) in order to emphasise just how little the model has learned and how much is left unexplained. Re-reading the 2003 article makes me wonder if Scott and others who dismiss innateness as an explanation appreciate the sheer implausibility of a project which does without it. How can emergentists seriously propose that the complexity of language emerges from simple cognitive processes being exposed to frequently co-occurring items in the environment?

And so we return to the root of the problem of any empiricist account: the poverty of the stimulus argument.  Emergentists, by adopting an associative learning model and an empiricist epistemology, where some kind of innate architecture is allowed, but not innate knowledge, and certainly not innate linguistic representations, have a very difficult job explaining how children come to have the linguistic knowledge they do. They haven’t managed to explain how general conceptual representations acting on stimuli from the environment produce the representational system of language that children demonstrate, or to explain how, as Eubank and Gregg put it “children know which form-function pairings are possible in human-language grammars and which are not, regardless of exposure” (Eubank and Gregg, 2002: 238). Neither have emergentists so far dealt with “knowledge that comes about in the absence of exposure (i.e., a frequency of zero) including knowledge of what is not possible” (Eubank and Gregg, 2002: 238).

I gave Vivian Cook’s version of the PoS argument in Part 1, but let me give here Gregg’s summary of Laurence and Margolis’ (2001: 221) “lucid formulation”:

  1. An indefinite number of alternative sets of principles are consistent with the regularities found in the primary linguistic data.
  2. The correct set of principles need not be (and typically is not) in any pretheoretic sense simpler or more natural than the alternatives.
  3. The data that would be needed for choosing among those sets of principles are in many cases not the sort of data that are available to an empiricist learner.
  4. So if children were empiricist learners they could not reliably arrive at the correct grammar for their language.
  5. Children do reliably arrive at the correct grammar for their language.
  6. Therefore children are not empiricist learners (Gregg, 2003: 48).

To the extent that the emergentists insist on a strict empiricist epistemology, they’ll find it extremely difficult to provide any causal explanation of language acquisition, or, more relevant to us, of SLA. Combining observed frequency effects with the power law of practice, for example, and thus explaining acquisition order by appealing to frequency in the input doesn’t go far in explaining the acquisition process itself. What role do frequency effects have, and how do they interact with other aspects of the SLA process? In other words, we need to know how frequency effects fit into a theory of SLA, because frequency and the power law of practice don’t provide a sufficient theoretical framework in themselves. Neither does connectionism; as Gregg points out, “connectionism itself is not a theory… It is a method, and one that in principle is neutral as to the kind of theory to which it is applied” (Gregg, 2003: 55).

2. Idealisation

There is also the question of idealisation, stressed by Sampson in his criticisms, and probably the most frequently-expressed objection made to UG. The assumption Chomsky makes of instantaneous acquisition, like the idealisation of the “ideal speaker-listener in a completely homogenous speech-community”, is a perfectly respectable tool used in theory construction: it amounts to no more than the “ceteris paribus” argument that allows “all other things to be equal” so that we can isolate and thus better examine the phenomenon in question. Idealisations are warranted because they help us focus on the important issues and get rid of distractions, which does not mean that this step is immune to criticism, of course. It’s up to Chomsky to make sure that any theories based on idealisations are open to empirical tests, and it is then up to those who disagree with Chomsky to come up with some counter-evidence and/or to show that the idealisation in question has protected the theory from the influence of an important factor. Thus, if Sampson wants to challenge Chomsky’s instantaneous acquisition assumption, he will have to show that there are differences in the stages of people’s language acquisition which result in significant differences in the end state of their linguistic knowledge.

While on the subject of idealisations, we may deal with the criticism of sociolinguists who challenge Chomsky’s idealisation to a homogenous speech community by saying that Chomsky is ruling out of court any discussion of variations within a community.  Chomsky would reply that he’s doing no such thing, and that if anybody is interested in studying such variations they are welcome to do so.  Chomsky’s opinion of the scant possibility of progress in such an investigation is well-known, but he of course admits that it’s  only an opinion. What Chomsky is interested in, however, is the language faculty, and the acquisition of a certain type of well-defined knowledge. In order to better investigate this domain, Chomsky idealises the speech community.  Sociolinguists can either produce arguments and data which show that such an idealization is illegitimate (i.e. that it isolates part of the theory from the influence of a significant factor), or say that they are interested in a completely different domain.  It seems to be often the case that criticisms of Chomsky arise from misunderstandings about the role of idealisations in theory construction, or about the domain of a theory.

Weaknesses of UG theory

Chomsky’s theory runs into difficulties in confronting the question of how UG evolves, and how the principles and parameters arrive at a stable state in a normal child’s development. Furthermore, there’s no doubt that the constant re-formulation of UG results in “moving the goalposts” and protecting the theory from disconfirming empirical evidence by the use of ad hoc hypotheses.

And we shouldn’t forget that when we discuss UG we have the “principles and parameters” theory in mind, and not the “Minimalist” programme, let alone Internalism. Internalism sees Chomsky insisting that the domain of his theory is not grammar but “I-language”, where “I” is “Internal” and where “Internal” means in the mind. While exposure to external stimuli is necessary for language acquisition, Chomsky maintains that, as Smith puts it “the resulting system is one which has no direct connection with the external world” (Smith, 1999: 138). This highly counter-intuitive claim takes us into the technicalities of a philosophical debate about semantics in general and “reference” in particular, where Chomsky holds the controversial view that semantic relations “are nothing to do with things in the world, but are relations between mental representations: they are entirely inside the head”  (Smith, 1999: 167).  Perhaps the most well-known example of this view is Chomsky’s assertion that while we may use the word “London” to refer to the capital city of the UK, it’s unjustified to claim that the word itself refers to some real entity in the world.  Go figure, as they say.

But the most important criticism I personally have of UG is that it is too strict and too narrow to be of much use to those trying to build a theory of SLA. I think it’s important to challenge Chomsky’s claim that questions about language use “lie beyond the reach of our minds”, and that they “will never be incorporated within explanatory theories intelligible to humans” (Chomsky, 1978).  Despite Chomsky’s assertion, I think we may assume that the L2 acquisition process is capable of being rationally and thoroughly examined.  Further, I suggest that it need not be, indeed should not be, idealised as an instantaneous event, which is to say, I assume that we can ask rational questions about the stages of development of interlanguages, that we can study the real-time processing required to understand and produce utterances in the L2, that we can talk about not just the acquisition of abstract principles but of skills, and even that we can study how different social environments affect SLA.

By insisting on a “scientific” status for his theory, Chomsky severely limits its domain, and to appreciate just how limited the domain of UG is, let us remind ourselves of Chomsky’s position on modularity.  Chomsky argues that in the human mind there is a language faculty, or grammar module, which is responsible for grammatical knowledge, and that other modules handle other kinds of knowledge. Not all of what is commonly referred to as “language” is the domain of the language module; certain parts of peripheral grammatical knowledge, and all pragmatic knowledge, are excluded. To put it another way, the domain of Chomsky’s theory is restricted by his distinction between I-language and E-language; Chomsky is concerned with the individual human capacity for language, and with the universal similarities between languages – his domain deliberately excludes the community. No justification needs to be offered for deciding to focus on a particular phenomenon or a particular hypothesis, but it is essential to grasp the domain of Chomsky’s theory.  Cook (1994) puts it this way:

Chomskian theory claims that, strictly speaking, the mind does not know languages but grammars; ‘the notion “language” itself is derivative and relatively unimportant’ (Chomsky, 1980, p. 126).  “The English Language” or “the French language” means language as a social phenomenon – a collection of utterances.  What the individual mind knows is not a language in this sense, but a grammar with the parameters set to particular values.  Language is another epiphenomenon: the psychological reality is the grammar that a speaker knows, not a language (Cook, 1994: 480).

Gregg (1996) has this to say:

… “language” does not refer to a natural kind, and hence does not constitute an object for scientific investigation.  The scientific study of language or language acquisition requires the narrowing down of the domain of investigation, a carving of nature at its joints, as Plato put it. From such a perspective, modularity makes eminent sense (Gregg, 1996: 1).

Chomsky himself says that what he seeks to describe and explain is

The cognitive state that encompasses all those aspects of form and meaning and their relation, including underlying structures that enter into that relation, which are properly assigned to the specific subsystem of the human mind that relates representations of form and meaning. A bit misleadingly perhaps, I will continue to call this subsystem ‘the language faculty’ (Chomsky 1980).

Pragmatic competence, on the other hand, is left out because

there is no promising approach to the normal creative use of language, or to other rule-governed acts that are freely undertaken… the creative use of language is a mystery that eludes our intellectual grasp (Chomsky, 1980).

Chomsky would obviously agree that syntax provides no more than clues about the content of any particular message that someone might try to communicate, and that pragmatics takes these clues and interprets them according to their context.  If one is interested in communication, then pragmatics is vital, but if one is interested in language as a code linking representations of sound and meaning, then it is not.  Chomsky’s strict demarcation between science and non-science effectively rules out the study of E-Language, and consequently his theory neither describes nor explains many of the phenomena that interest linguists. Far less does UG describe or explain the phenomena of SLA. By denying the usefulness of attempts to explain aspects of language use and usage that fall outside the domain of I-Language, UG  can’t be taken as the only valid frame of reference for SLA research and theory construction, or even as a good model.


Bates, E. (2000) Language Savants and the Structure of the Mind. International Journal of Bilingualism.

Bates, E.; Elman, J.; Johnson, M.; Karmiloff-Smith, A.; Parisi, D.; and Plunkett, K. (1998) Innateness and Emergentism.  In Bechtel, W., and Graham, G., (eds) A Companion to Cognitive Science. 590-601. Oxford: Basil Blackwell.

Bates, E. and Goodman, J. (1997) On the inseparability of grammar and the lexicon: evidence from aphasia, acquisition and real-time processing. Language and Cognitive Processes, 12, 507-584.

Chomsky, N. (1980) Rules and representations. Oxford: Basil Blackwell.

Cook, V. J. (1994) The Metaphor of Access to Universal Grammar in L2 Learning.  In Ellis, N. (ed.)  Implicit and Explicit Learning of Languages.  London: Academic Press.

Gregg, K. R. (1996) The logical and developmental problems of second language acquisition.  In Ritchie, W.C. and Bhatia, T.K. (eds.) Handbook of second language acquisition.  San Diego: Academic Press.

Gregg, K. R. (2000) A theory for every occasion: postmodernism and SLA.  Second Language Research 16, 4, 34-59.

Gregg, K. R. (2003) The state of emergentism in second language acquisition.  Second Language Research, 19, 2, 42-75.

Laurence, S. and Margolis, E. (2001) The Poverty of the Stimulus Argument. British Journal for the Philosophy of Science, 52, 2, 217-276.

Smith, N. (1999) Chomsky: Ideas and Ideals.  Cambridge: Cambridge University Press.

Smith, N., & Tsimpli, I-M. (1995). The mind of a savant: Language learning and modularity. Oxford: Basil Blackwell.


Chomsky’s Critics 1. Sampson

Scott Thornbury’s latest Sunday post gave what I thought was a very poor account of the poverty of the stimulus argument and of objections to it.  While Scott was quite measured in his original remarks, his post showed a spectacular disregard for logic, and the wave of enthusiastic messages of support which flooded in from a frightening array of dimwits and cranks seemed to unhinge our normally restrained hero, provoking him to ever more outrageous and fanciful claims. I and a couple of other sensitive souls did our modest best to keep him on the rails, but we failed, the wheels came off, and last time I looked, the whole crazy bunch of them were swapping quotes from Derrida, counting backwards from 666, trying to communicate with each other without switching their brains on, and using impoverished input devices like the Microsoft keyboard. Since they’ve all shown themselves to be useless at marshalling a case against Chomsky for themselves, I thought I’d offer a helping hand. I’m all heart, really.  So here’s the case against Chomsky as argued by two of his leading critics: Geoffrey Sampson and Elizabeth Bates.

Before we start on Sampson, let’s quickly state the poverty of the stimulus argument. It says: since children know things about language that they could not have learned from the input they’ve been exposed to, that knowledge must be innate. Vivian Cook puts it like this:

Step A. A native speaker of a particular language knows a particular aspect of syntax.

Step B. This aspect of syntax could not have been acquired from language input. This involves considering all possible sources of evidence in the language the child hears and in the processes of interaction with parents.

Step C. This aspect of syntax is not learnt from outside. If all the types of evidence considered in Step B can be eliminated, the logical inference is that the source of this knowledge is not outside the child’s mind.

Step D. This aspect of syntax is built-in to the mind (Cook, 1991).

The UG argument is that all natural languages share the same underlying structure, and the knowledge of this structure is innate.

Sampson says that Chomsky’s claims about the linguistic data available to the child  are “untrue”, and he takes Chomsky’s example (used at the famous 1975 conference at Royaumont, where Piaget, Chomsky, Fodor, and others gathered to discuss the limitations of the genetic contribution to culture) of two different hypotheses about the grammar of yes/no questions in English. Turning an English statement into the corresponding yes/no question involves operating on a finite verb in the statement. Either the verb itself is moved to the left (if the verb is a form of be, do, have, or a modal verb such as will) – thus ‘The man is tall’ becomes ‘Is the man tall?’; or, in all other cases the verb is put into the infinitive and an inflected form of do is placed to the left – thus ‘The man swims well’ becomes ‘Does the man swim well?’  (Sampson, 1997: 40).

Chomsky says there are two hypotheses that the child learning English might try:  1. operate on the first finite verb;  2. operate on the finite verb of the main clause.  Hypothesis 1 violates the structure dependence universal and is false (applied to the sentence “The man who is tall is sad.”, it would give: “Is the man who tall is sad?”).  Hypothesis 2 is correct. Yet both hypotheses work in all questions except those formed from statements containing a subordinate clause which precedes the main verb.  The child cannot decide by observation whether one or the other hypothesis is true, because cases of statements containing a subordinate clause which precedes the main verb are extremely rare. Therefore, the child decides on the basis of innate knowledge. In reply to this Sampson says that many examples actually exist, including the well-known line from Blake’s The Tyger “Did he who made the Lamb make thee?”  Sampson goes on to give a number of other examples from a children’s corpus, and concludes:

Since Chomsky has never backed up his arguments from poverty of the child’s data with detailed empirical studies, we are entitled to reject them on the ground that the data available to the child are far richer than Chomsky supposes.  (Sampson, 1997: 42)

Sampson then attacks Chomsky’s “question-begging idealizations”.  Chomsky distinguishes between competence (a certain type of knowledge which is the phenomenon that he wants to explain), and performance (data, much of which he judges to be irrelevant). To examine competence, Chomsky argues that it’s necessary to make various simplifying assumptions, but Sampson claims that Chomsky’s use of simplifications distorts the substantial point at issue.  Each of the counterfactual simplifying assumptions about human language which Chomsky makes “eliminates a plausible alternative from consideration through what is presented as a harmless, uncontroversial assumption” (Sampson, 1997: 51).  Sampson gives the example of the assumption that language acquisition is an instantaneous process. This, says Chomsky, is “a harmless assumption, for if it mattered then we would expect to find substantial differences in the result of language learning depending on such factors as order of presentation of data, time of presentation, and so on.  But we do not find this” (Chomsky, cited in Sampson, 1997: 51-52). Sampson replies that language acquisition is not an instantaneous process (as Chomsky elsewhere admits), and it is not a harmless simplification to say that it is. As Sampson says:

To claim that it is harmless to pretend that language acquisition is instantaneous is, in effect, to assume that language acquisition does not work in a Popperian fashion, without going to the trouble of arguing the point.  (Sampson, 1997: 52)

Chomsky acknowledges that children do not move from ignorance to mastery of language instantaneously, but he insists that “fairly early in life” a child’s linguistic competence reaches a “steady state”, after which there are no significant changes.  Sampson points out, however, that this “steady state” idea was contested by Bloomfield and Whitney (both of whom saw language learning as a lifelong process), and is also completely at odds with the Popperian approach to learning, which brings us to Sampson’s alternative explanation of language acquisition.

Sampson argues that the essential feature of languages is their hierarchical structure.  Children start with relatively crude systems of verbal communication, and gradually extend their syntactic structures in a pragmatic way, so as to express more ideas with greater sophistication.  They build up the syntax piecemeal: they concentrate on assembling a particular part of the system from individual components, and then put together the subassemblies. This gives them low-level structures which are then combined, with modifications on the basis of input, into higher-level structures, and so on.

Sampson uses the Watchmaker parable, first told by Herbert Simon (see Sampson, 1997: 111-113), to explain linguistic development.  I won’t go into it here, but Sampson says that Simon’s parable shows that “complex entities produced by any process of unplanned evolution, such as the Darwinian process of biological evolution, will have tree-structuring as a matter of statistical necessity” (Sampson, 1997: 113). Furthermore, in Sampson’s view, “the development of knowledge, as Popper describes it, is a clear case of the type of evolutionary process to which Simon’s argument applies, and can be applied to syntactic structures”.  Sampson describes how the communication system of our ancestors gradually became more complex as language learners made longer sentences, which would enter the language if they made a significant enough contribution to transmitting information more economically, or if they were semantically innovative.  Similarly, a child acquires language by composing sub-assemblies from individual components, and then putting together the sub-assemblies.


Only a general learning theory is involved in Sampson’s explanation, which adopts a decidedly Popperian approach. The child tests various hypotheses about grammaticality against input, and slowly builds up the right hierarchically structured language by following a Popperian programme of conjectures and refutations. This supposes, of course, that the child is exposed to adequate input.  Sampson’s argument has two main strands: first, following Simon, gradual evolutionary processes have a strong tendency to produce tree structures; and second, following Popper, knowledge develops in a conjectures-and-refutations evolutionary way.  Sampson claims that these two strands are enough to explain language acquisition.

Perhaps Sampson’s criticism of one of Chomsky’s most central assumptions can serve to highlight the differences between them.  Chomsky says that

Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogenous speech community, who knows its language perfectly (Chomsky, cited in Sampson, 1997: 53).

This assumption, which Chomsky describes as being of “critical importance” for his theory, excludes Sampson’s Popperian approach without even considering it.  For Sampson, learning is a “non-terminating process”, and language has no independent existence over and above the representations of the language in the minds of the various individuals belonging to the speech community that uses it.

What the language learner is trying to bring his tacit linguistic theory into correspondence with is not some simple, consistent grammar inhering in a collective national psyche…. Rather, he is trying to reconstruct a system underlying the usage of the various speakers to whom he is exposed; and these speakers will almost certainly be working at any given time with non-identical tacit theories of their own – so that there will not be any wholly coherent and unrefutable grammar available to be formulated.  The notion of a speaker-listener knowing the language of his community “perfectly” is doubly inapplicable – both because there is no particular grammar, achievement of which would count as “perfect” mastery of the language, and because even if there were such a grammar, there is no procedure by which a learner could discover it.  (Sampson, 1997: 53-54)

From Sampson’s Popperian perspective, even if language learners were “ideal” they would not attain “perfect” mastery of the language of the community.  As Sampson says:

Popperian learning is not an algorithm which, if followed without deviation, leads to a successful conclusion.  Therefore, to assume that it makes sense to describe an “ideal” speaker-listener as inhabiting a perfectly homogenous speech community and as knowing its language perfectly amounts, once again, to surreptitiously ruling the Popperian view of acquisition out of consideration. (Sampson, 1997: 55)

I personally don’t find Sampson’s arguments persuasive, and I’ll explain why after I’ve presented Bates’s case against Chomsky in the next post.


Cook, V. J. (1991) The poverty-of-the-stimulus argument and multi-competence.  Second Language Research, 7, 2, 103-117.

Sampson, G. (1997) Educating Eve: The ‘Language Instinct’ Debate. London: Cassell.

Challenging the Coursebook 3

Understanding interlanguage development helps in evaluating different approaches to ELT.  I’ve already touched on this issue in a post on TBLT, and in Challenging the Coursebook 2, and here’s a bit more, intended as further support for my criticisms of coursebooks, and as preparation for a syllabus proposal. This is mostly a cut-and-paste paraphrasing of Long, 2011.

We must start by recognizing that learners, not teachers, have most control over their language development.  As Long (2011) says:

Students do not – in fact, cannot – learn (as opposed to learn about) target forms and structures on demand, when and how a teacher or a coursebook decree that they should, but only when they are developmentally ready to do so. Instruction can facilitate development, but needs to be provided with respect for, and in harmony with, the learner’s powerful cognitive contribution to the acquisition process.

A major source of evidence for the strength of the learner’s role in SLA, and simultaneously, about the limits of instruction, is the work that’s been done on processes in interlanguage development. Interlanguages (the construct was introduced by Selinker in 1972) are individual learners’ transitional versions of the L2, and studies show that they exhibit common patterns and features across differences in learners’ age and L1, acquisition context, and instructional approach. Independent of those and other factors, learners pass through well-attested developmental sequences on their way to mastery of target-language structures, or, as is often the case, to an end-state short of mastery. Examples of such sequences are found in the well known morpheme studies; the four-stage sequence for ESL negation; the six-stage sequence for English relative clauses; and the sequence of question formation in German (see Long, 2015 for a full discussion).

Long (2011) insists that SLA is not a process of forming new habits to override the effects of L1 transfer. Even when presented with, and drilled in, target-language forms and structures, and even when errors are routinely corrected, learners’ acquisition of newly-presented forms and structures is very rarely either categorical or complete, as is assumed by most coursebooks. On the contrary, acquisition of grammatical structures and sub-systems like negation or relative clause formation is typically gradual, incremental and slow, sometimes taking years to accomplish. Development of the L2 exhibits plateaus, occasional movement away from, not toward, the L2, and  U-shaped or zigzag trajectories rather than smooth, linear contours. No matter what the learners’ L1 might be, and no matter what the order or manner in which target-language structures are presented to them by teachers or by coursebook  writers, learners analyze the input and come up with their own interim grammars, the product broadly conforming to developmental sequences observed in naturalistic settings. They master the structures in roughly the same manner and order whether learning in classrooms, on the street, or both. This led Pienemann to formulate his learnability hypothesis and teachability hypothesis: what is processable by students at any time determines what is learnable, and, thereby, what is teachable (Pienemann, 1984, 1989). The effectiveness of negative feedback on error has been shown to be constrained in the same way (see, e.g., Mackey, 1999).

The five most studied processes of interlanguage development are:

  • simplification (using “la” for “the” and “un” for “a” in Spanish, regardless of gender);
  • overgeneralization (using “ed” for irregular verbs);
  • restructuring (often involving back-sliding: going from “went” to “goed”, but often making adjustments which “improve” the IL);
  • U-shaped behaviour (went –> goed –> went);
  • and fossilization (“premature cessation of development in defiance of optimal learning conditions” Selinker, 1972).

While knowledge about the sequences and processes of interlanguage development should act mostly to warn us against any simple view of teaching and learning an L2, it can, Long says, inform good teaching by helping teachers (and their students) cultivate a different attitude towards errors, and more enlightened expectations for progress. “It can help them recognize that many so-called errors are a healthy sign of learning, that timing is hugely important in language teaching, and that not all that can be logically taught can be learned if learners are not developmentally ready. Knowledge about sequences and processes can also help counter the deficit view that interlanguages are defective surrogates of the target language by making it clear that interlanguages are shaped by the same systematicity and variability that shape all other forms of human language” (Long, 2011). It should also be remembered that if teachers respect the constraints of their learners’ trajectories, and especially if they teach according to the principles referred to below, they can have a dramatic positive effect on their learners’ rate of learning.

The question remains: Why don’t language teachers teach to the sequences and processes which have been identified in interlanguage studies?  First, because we don’t know how different sequences relate to each other in the grammar of individual learners, so we don’t know how to sequence grammatical targets according to developmental learner readiness principles. More importantly, language learning isn’t just learning grammar: vocabulary, pragmatics, phonology, and so on are also involved. But the most fundamental objection is that learning an L2 isn’t about focusing on bits and pieces of language.  Rather than trying to organize instruction around grammar (or lexical chunks, for that matter) in a product syllabus, implemented by using a General English coursebook, we have a wide range of options which are more attuned to what we know about psycholinguistic, cognitive, and socioeducational principles for good language teaching.  These include Dogme, Task-Based Language Teaching, various forms of ESP, and various process syllabuses. All of them share the principles that I’ve outlined in previous posts on TBLT and Principles and Practice, and I’ll propose one such syllabus shortly.

Long, M. (2011) Language Teaching. In Doughty, C. and Long, M. (eds.) Handbook of Language Teaching. New York: Routledge.

Long, M. (2015) SLA and TBLT. New York: Routledge.

All other references can be found at the end of Long’s 2011 Chapter.


Since my presentation Challenging the Coursebook, there have been various responses.  With the one exception of Andrew Schmidt’s comments, none has dealt with the points I raised.

My argument against the coursebook is in two parts.  First, most coursebooks assume that presenting and practicing discrete formal aspects of the language in a pre-determined sequence will lead to declarative knowledge becoming procedural, and that the synthetic bits of language presented and practiced in the coursebook will be accumulated by learners in such a way as to result in the progressive re-structuring of their interlanguages.  Both these assumptions are false. So is the assumption that learners will learn what they’re taught when they’re taught it.

Second, coursebooks impose a product (synthetic) syllabus on users,  but a process (analytic) syllabus  caters better to learners’ needs and is likely to lead to faster learning and higher levels of attainment.

In reply, these comments have been made:

Not all coursebooks are the same: they differ in content and design.  Of course, and there are bound to be exceptions to my generalised assertion.  But apart from Anthony, who named some exceptional coursebooks, nobody (and in particular, not Dellar) has given any coherent argument against the claim that most coursebooks are based on the false assumptions I attribute to them.

Teachers use coursebooks in very different ways.  Again: of course. But unless teachers use their coursebooks very sparingly, or in ways entirely different from those the authors intend, the coursebook remains the most important factor in determining what happens in the lessons comprising the course.

Coursebooks help busy, overworked teachers who don’t have time to prepare their own lesson plans and materials. Quite so. But if that’s the only reason teachers use them, then it follows that ELT would be better if we organised things in such a way that we didn’t rely on coursebooks.

Coursebooks help new teachers who need obvious structure and guidance. Ditto.

Expecting teachers to make their own materials without paying them is worse than asking them to use a coursebook.  Ditto.

Despite all their flaws, I use coursebooks, so there.  I know this is supposed to be funny, or witty, or something, but it’s a bit too near the truth to make me laugh.

I find it depressing that so little importance seems to be given to the underlying principles which inform our teaching practice. Why are most teachers not more concerned about these principles?  Why is there so little serious attempt to confront the argument that SLA is a predominantly implicit process, in which declarative knowledge and explicit instruction are known to play only a minor role in facilitating language learning?  Likewise, why are so few people in ELT ready to take seriously the various proposals that have been made for a process syllabus?  Rather than make an attempt to critically appraise the arguments against coursebooks, or to put forward a coherent, principled counter-argument, all we get are excuses. And very lame excuses at that.

Challenging the Coursebook 2

My presentation in Part 1 argued that ELT should break the bad habit of relying on coursebooks. In response to comments, I offer here a bit more about interlanguage. My thanks to Alessandro Grimaldi for his paper which I’ve used to help me write this.

U-shaped learning behaviour is one of the patterns observed in the development of interlanguages. I should say at once that “interlanguage” is a theoretical construct, not a fact. While interlanguage as a construct has proved useful in developing a cognitive theory of SLA, the construct itself needs further development, and the theory of which it is part is incomplete, and possibly false. Any good theory must allow that empirical observations can be made which would falsify it, and, so far, interlanguage theory has stood up to a number of such challenges quite well. Part of the support for the theory comes from observations of U-shaped behaviour in SLA, which indicate that learners’ interlanguage development is not linear. The same data can be used to show that the approach taken by coursebooks which present and practice a sequence of discrete, formal aspects of the English language, on the assumption that these will be learned in the linear order in which they’re presented, is wrong.

An example of U-shaped behaviour, from a study in the 1970s, is the English plural “feet”: learners first produce the irregular “feet”, then the regularised “foots”, before returning to “feet”. Another example comes from morphological development, specifically the development of English irregular past forms, such as came, went, broke, which are supplanted by rule-governed, but deviant, past forms: comed, goed, breaked. In time, these new forms are themselves replaced by the irregular forms that appeared in the initial stage.

This U-shaped learning curve is observed in learning the lexicon, too. Learners have to master the idiosyncratic nature of words, not just their canonical meaning. When learners encounter a word in a correct context, the word is not simply added to a static cognitive pile of vocabulary items. Instead, they experiment with the word, sometimes using it incorrectly, thus establishing where it works and where it doesn’t. The suggestion of cognitive theories of SLA is that only by passing through a period of incorrectness, in which the word is used in a variety of ways, can learners climb back up the U-shaped curve. To add to the example of “feet” above, there’s the example of the noun “shop”. Learners may first encounter the word in a sentence such as “I bought a pastry at the coffee shop yesterday.” Then they experiment with deviant utterances such as “I am going to the supermarket shop,” correctly associating the word “shop” with a place where they can purchase goods, but getting it wrong. By making these incorrect utterances, the learner distinguishes what is appropriate from what is not, because “at each stage of the learning process, the learner outputs a corresponding hypothesis based on the evidence available so far” (Carlucci and Case, 2013).

The re-organisation of new information as learners move along the U-shaped curve is a characteristic of interlanguage development. Associated with this restructuring is the construct of automaticity. Language acquisition can be seen as a complex cognitive skill in which, as your skill level in a domain increases, the amount of attention you need to perform generally decreases. The basis of processing theories of SLA is that we have limited resources for processing information, so the more of the process we can make automatic, the more processing capacity we free up for other work.  Active attention requires more mental work, and thus developing the skill of fluent language use involves making more and more of it automatic, so that no active attention is required. This is what I was referring to in my presentation when I compared learning a language to learning to drive a car. Through practice, language skills go from a ‘controlled process’, in which great attention and conscious effort are needed, to an ‘automatic process’. This process is mediated through constant restructuring of the interlanguage along the U-shaped development curve.

Automaticity can be said to occur when associative connections form between a certain kind of input and some output pattern.  For instance, in this exchange:

Speaker 1: Morning.

Speaker 2: Morning. How are you?

Speaker 1: Fine, and you?

Speaker 2: Fine.

the speakers, in most situations, don’t actively think about what they’re saying. In the same way, second language learners learn new language through use of controlled processes, which become automatic, and in turn free up controlled processes which can then be directed to new forms. Segalowitz applies this idea to a wide variety of skills when he says:

“Automatizing certain aspects of performance in order to free up attentional resources is fundamental to skilled performance in a number of areas because it allows performers to allocate their limited capacities to where they are most needed. That is, to a large extent, fluent performance in such areas as music or reading (e.g. performing particular runs or arpeggios on the piano; word recognition) involves being able to carry out certain activities with little or no investment of psychological resources (memory capacity, limited attentional capacity).”

We must now add to the hypothesis that learners are constantly restructuring their language as they move through the stages of the U-shaped learning curve the hypothesis of a fixed order of acquisition of parts of English, which I referred to in my presentation. I won’t repeat all that again, but I should make it clear that we don’t know very much about this fixed order – and even if we did, this wouldn’t mean that we were in a position to prescribe an order of presentation of structures or lexis in a syllabus. The research into SLA so far has only scratched the surface: most of the work remains to be done. But at least we know enough to say that learners don’t learn English in the way assumed by a coursebook series such as Headway. My argument is that coursebooks demonstrate a common underlying view of how language should be presented and practiced. This view rests on four false assumptions about proceduralisation, accumulation, teachability, and the product syllabus.

Needless to say, but say it I must, my arguments are no more than that, and I invite rational discussion of them. In his most recent response to these arguments Dellar says coursebooks are different from each other and teachers use coursebooks in different ways. This fails to address the issues. Much more interesting is Laura’s question “What can realistically be done as an alternative to using textbooks?” I’ll try to answer that very soon.

Carlucci, L. and Case, J. (2013) On the Necessity of U-Shaped Learning. Topics in Cognitive Science, 5, 1, 56-88.

Challenging The Coursebook

Here’s a version of the presentation I gave at the InnovateELT conference. Click here.

I’m sorry to have missed a lot of it, but I was there long enough to appreciate the energy and warmth of the event. It was fresh, buzzing, and exactly the right scale. Great – innovative! – idea to have the “speed dating” session at the end where 6 presenters zoomed round groups who quizzed them. I tip my hat to the organisers and to the perfect support staff. Now THAT’s the way conferences should be run!

Motivation, again


I’ve made two attempts to talk about motivation (see the list of pages on the right) and here’s another. This time I offer a brief story of theory development which you might find interesting. I rely on Sam Croft’s (2014) excellent dissertation throughout; much of the text is his, although I’ve butchered it and beg his pardon.


The Force

Research into motivation took a big step forward in 1972 when Canadian researchers Gardner and Lambert published their seminal work on the motivation of French and English speaking language learners in Canada (Gardner & Lambert, 1972). The authors suggested that understanding the relationship between the two communities was crucial in understanding their motivation to learn each other’s languages. This approach departed from previous conceptions of motivation, which focussed on the individual, and further drew a distinction between language learning and other subjects. Williams (1994) highlights the significance of this research, suggesting that it led to the conceptualisation of language learning as a process involving a fundamental alteration of self-image that was not part of the learning experience of other subjects in the curriculum.

Gardner introduced two terms which endure. The first is instrumentality, which refers to the pursuit of language study as a means to an end, for example to improve job prospects or to pass exams; the second is integrativeness, which Gardner (1972: 135) defines as the willingness to “identify with members of another ethno-linguistic group.” Gardner and Lambert’s work advanced the theory that integrative motivation was the key to success in second language learning.

The Counterforce


A series of longitudinal studies conducted in Hungary by Dörnyei and Csizér between 1993 and 2004 (Dörnyei & Csizér, 2002; Csizér & Dörnyei, 2005) aimed to empirically test integrativeness as a predictor of motivation. Hungary, a country with a small enough English speaking population to make integration with a target L2 community a practical impossibility, provided an excellent context in which to test their hypothesis that integrativeness was not,  in fact, the best predictor of motivation. And here comes the kicker. The results of the study showed integrativeness to perform extremely well in predicting motivated behaviour, leading Dörnyei, (2005) to call it the “integrativeness enigma.” As Ryan (2009) explains, the fact that this finding was obtained in a context where the possibility of integration didn’t exist made no sense, and thus the need for a reinterpretation of the concept of integrativeness became clear.

So Dörnyei and his associates developed the L2 motivational self system, which refocused motivation theory away from the integrative paradigm towards a more internal approach directed at individuals’ hopes and aspirations for the future. Based on the psychological desire to “reduce the discrepancy between our current and future possible selves” (Ushioda and Dörnyei, 2009: 4), the new framework consisted of three central elements.

First, the ideal L2 self. This variable represents learners’ “ideal self image expressing the wish to become a competent L2 speaker” (Csizér and Kormos, 2005: 99). This variable, according to Ryan (2008), is the theoretical pivot of the entire framework, intended to replace integrativeness as the main variable in understanding motivated behaviour among language learners. The second component is the ought-to self, representing what learners believe their obligations and responsibilities as language learners to be. The third component is the L2 learning situation, which refers to a collection of variables such as the teacher, learner group and methods of instruction and the influence that these can have upon motivation.



It spoils the narrative a bit, but we must note, as Sam Croft faithfully does, that alongside the integrativeness enigma, one of the challenges tackled by the L2 self motivational system was distinguishing integrativeness from instrumentality (Csizér & Kormos, 2005). As Lamb (2004) points out, variables such as ‘desire to meet with westerners’ or ‘desire to use English websites’ are increasingly difficult to categorise as driven by either instrumentality or integrativeness, and are in fact clearly linked to both. With the L2 self motivational system, there is a temptation to divide the ideal and ought-to L2 selves along similar lines. This is a temptation that must be resisted, however, as recognising the interconnectedness of all the strands is necessary to avoid the L2 self motivational system becoming “yet another dichotomous, reductionist model of language learning motivation” (Ryan, 2009: 121).



So the outcome of a study which set out to falsify a theory (integrativeness is the dominant variable in predicting motivated behaviour) actually lent support to it. Surely, following Popper, we should expect this failed attempt at falsification to have the positive effect of adding to the theory’s strength, no? Well, no. As we’ve seen, what actually happened (and this is pretty typical of what happens in theory construction) was that the authors of the study decided that Gardner and Lambert had got it all (well, mostly) wrong anyway. While the original theory based itself on external factors such as target language speakers and communities, the L2 self motivational system turned the theory on its head (another favourite term in the history of theory development!) and focused instead on learners’ internal hopes and aspirations. Job done! Dörnyei and associates, notably Ushioda, have gone on to develop their theory and are now considered the leading lights in explaining how motivation affects SLA.

Poor old Gardner, eh? Theory construction in SLA is a tough world, make no mistake, and justice has a small part to play in its rough and tumble. It leads many, particularly those of a relativist bent, to confuse the sociology of science with progress in understanding the matters under investigation. But in this case, I wonder how much progress has been made. Dörnyei has certainly made some progress in pinning down the notoriously difficult construct of motivation, and perhaps his work can be incorporated into an eventual overarching theory of SLA, though I doubt it. But in rejecting Gardner and Lambert's attempts to see motivation from a social psychological perspective, much has been lost. There are, after all, limits to the realms of science, and maybe a more political perspective can throw more light on things. If there is evidence of integrative motivation to learn English in places like Hungary and Japan, maybe it can be better explained by globalisation. To quote a post from Torn Horns: "the mania for English" in these countries "is not due to the fact that English is the language of things like the internet, academia and air traffic control; rather, it is due to the political decision to open up to the world market. The demand for English is a product of the demand for wealth."


Csizér, K. & Dörnyei, Z. (2005) Language Learners' Motivational Profiles and their Motivated Learning Behaviour. Language Learning, 55(4): 613-659.

Csizér, K. & Kormos, J. (2009) Learning Experiences, Selves and Motivated Learning Behaviour: A Comparative Analysis of Structural Models for Hungarian Secondary and University Learners of English. In Dörnyei, Z. & Ushioda, E. (eds.) Motivation, Language Identity and the L2 Self, 67-98. Bristol: Multilingual Matters.

Croft, S. (2014) Fostering Communicative Incompetence: A look at the role of entrance exams in reducing motivation to develop communicative competence at the pre-tertiary level in Japan. Unpublished MA dissertation, University of Leicester.

Dörnyei, Z. (2005) The Psychology of the Language Learner: Individual Differences in Second Language Acquisition. Mahwah, NJ: Lawrence Erlbaum.

Dörnyei, Z. & Csizér, K. (2002) Some Dynamics of Language Aptitudes and Motivation: Results of a Longitudinal Nationwide Survey. Applied Linguistics, 23: 421-462.

Gardner, R. C. & Lambert, W. E. (1972) Attitudes and Motivation in Second Language Learning. Rowley, MA: Newbury House.

Lamb, M. (2004) Integrative Motivation in a Globalizing World. System, 32: 3-19.

Ryan, S. (2009) The Ideal L2 Selves of Japanese Learners of English. Ph.D. thesis, University of Nottingham. Retrieved on May 29, 2011 from http://etheses.

Ushioda, E. & Dörnyei, Z. (2009) Motivation, Language Identities and the Ideal L2 Self: A Theoretical Overview. In Dörnyei, Z. & Ushioda, E. (eds.) Motivation, Language Identity and the L2 Self, 1-9. Bristol: Multilingual Matters.


The “L” in SLA

Attempts to explain SLA focus very much on how people learn a second language, and theories of SLA thus offer a classic causal explanation of the process, in terms of factors in the environment, or social interaction, or mental processes, for example. But many SLA scholars, notably Kevin Gregg, insist that before attempting to explain how people learn a second language, we must establish what is learned; and in order to answer this question, a linguistic theory is required. As White (1996: 85) puts it:

a theory of language acquisition depends on a theory of language.  We cannot decide how something is acquired without having an idea of what that something is.

White is a committed nativist, so she thinks a theory of SLA should explain not the behaviour of speakers but rather the mental system of knowledge underlying that behaviour; after all, she says, people don’t acquire utterances, they acquire knowledge. So, to explain what this knowledge is (how, that is, L2 competence is instantiated in the mind), we need a “property theory” which describes what the knowledge consists of.  Various answers have been suggested

in the form of connectionist nodes, or in the form of general knowledge representations, or in the form of rules of discourse, Gricean maxims, or in the form of UG (Gregg, 1993: 279).

To explain how L2 competence is acquired, on the other hand, we need a "transition theory" which narrates how the mind changes from a state of not knowing X to a state of knowing X (where X can be any part of what is necessary for L2 competence). Thus, a satisfactory theory of SLA must describe the L2-related interlanguages (IL grammars) and other aspects of the L2 competence finally attained by learners, and also explain how learners acquired them.

As I mentioned above, not much attention is paid to the question of what is acquired, especially by those working on what can be termed processing approaches to SLA, and I’ve tended to go along with the view that there are more important matters to worry about. In my book (Jordan, 2004) I wrote:

To the extent that we have no clear answer to the question of what L2 competence is, we might be said to be working in the dark. Of course, it would be good to have “more light”, but, unlike Gregg, I do not consider the lack of it to be in any way a fatal weakness in SLA theory construction to date.  There are …. good reasons why SLA should concentrate on the process of SLA….. We should not ignore the question of L2 competence,  but we should not be blinded by it, or persuaded by Gregg that both the methods and the focus of SLA research should faithfully follow the UG approach.

In the history of science there are many examples of theories that started off without any adequate description of what is being explained, although sooner or later, this limitation must be addressed.  An example that comes to mind is Darwin’s theory of evolution by natural selection, according to which the young born to any species compete for survival, and those young that survive to reproduce tend to embody favourable natural variations which are passed on by heredity. Darwin’s concept of variation lacked any formal description of variations, or any explanation of the origin of variations or how they were passed between generations. While he recognised that his description and explanation of heredity were limited, Darwin insisted that as long as inherited variation does occur, his theory would work. It was not until Mendel’s theories and the birth of modern genetics in the early 20th century that this deficiency started to be dealt with.

I’m now not so sure that the Darwin analogy is a good one (the theory of natural selection didn’t have rival theories which contradicted it), nor am I so sure that we can just park the question of what the “L” in SLA refers to. So here, I’m just unpacking the cupboard that has all this unaired stuff in it to see what’s there.

Components of Language Competence

Gregg and White, among others, think that UG is the best candidate to provide the framework for describing the IL grammar, but I suggest that while there is no serious rival to UG and the Language Acquisition Device as an explanation for how children acquire their knowledge of the L1 grammar, UG is of little use in describing the knowledge and skills involved in SLA. Let’s take a look.

Chomsky’s model of language distinguishes between competence and performance, between the description of underlying knowledge, and the use of language, influenced as the latter is by limits in the availability of computational resources, stress, tiredness, alcohol, etc.  Chomsky says he’s concerned with “the rules that specify the well-formed strings of minimal syntactically functioning units” and with

an ideal speaker-listener, in a completely homogenous speech-community, who knows his language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance  (Chomsky, 1965: 3).

The underlying knowledge of language that’s acquired is referred to as “I-Language” as distinct from “E-Language”, which is performance data of the sort you get from a corpus of real texts. “I-Language” obeys rules of Universal Grammar, among which are structure dependency; subjacency (which constrains the movement of categories); C-command and government theory (which constrain a number of the subsystems, such as case theory); and binding theory (which constrains the formation of NPs). So,

UG consists of a highly structured and restrictive system of principles with certain open parameters to be fixed by experience.  As these parameters are fixed, a grammar is determined, what we may call a `core grammar’  (Chomsky 1980, cited in Epstein, Flynn and Martohardjono, 1996: 678).

The principles are universal properties of syntax which constrain learners’ grammars, while parameters account for cross-linguistic syntactic variation, and parameter setting leads to the construction of a core grammar where all relevant UG principles are instantiated. Chomsky’s attempts to pin down the essential rules of language require this key distinction between competence and performance, and it’s important to be clear that performance refers to the actual utterances, spoken and written, of language users in their day to day communication. Such data, while doubtless of great interest to those investigating other areas of linguistics, is irrelevant to the development of UG theory, which is a property theory: it attempts to describe the essential rules of syntax governing all languages. It also provides an elegant, hugely persuasive explanation of how children acquire their L1,  but that’s a different story. Let’s continue now with competence.

Hymes (1972) criticised the Chomskian account of competence as too limited and argued that knowledge of the appropriacy of language use was also important. Canale and Swain (1980) described communicative competence in terms of three components, and Canale (1983) proposed four: linguistic, sociolinguistic, discourse, and strategic. Bachman (1990) offered his own organisation of the components but excluded strategic competence because, he argued, language competence consists of knowledge of and about the language, while strategic competence (the general cognitive skills involved in language use) is better understood as an ability, or capacity, rather than knowledge. For the moment, we may simply note the kinds of knowledge that are regarded as part of language competence, taken from Bachman and Palmer's 1996 book. It's interesting that Bachman's 1990 book has the same diagram as the one below, but in the 1990 version, all components are labelled "Competence" (Organisational Competence, Grammatical Competence, etc.).



The model reflects the growing opinion that Chomskian competence is not the best bedrock for a framework for examining SLA. A description of what constitutes competence in an L2 is very different from a description of the modular knowledge that Chomsky gives for L1 acquisition, and so the question "What is acquired in SLA?", while arguably requiring a property theory, certainly can't make much use of Chomsky's narrowly-defined linguistic competence. But what do we make of Bachman's model? After all, Bachman's objective in identifying the various types of knowledge or competencies outlined here is to construct adequate tests of L2 learners' proficiency; his re-organising and re-defining of the terms used previously by Hymes, and Canale and Swain, is motivated by a desire to make the terms more testable. But in so doing, Bachman is, at least implicitly, saying that, pace Chomsky, measures of a learner's performance, far from being irrelevant, are a good reflection of his or her competencies.

Which brings us back to strategic competence. While Canale saw it as  performing a compensatory role (to be used to repair gaps in knowledge) Bachman gives it a central role, namely: mediating between meaning intentions, underlying competencies, background knowledge, and the context of the situation. It does this by determining communicative goals, assessing communicative resources, planning communication, and then executing the plan.

This, as Skehan (1995) argues, is a model of performance. Skehan argues that by considering strategic competence as not just compensatory but central to all communication

the nature of the relationship between competence and performance is being redefined, since Bachman is proposing a dynamic for communication.  He sees this relationship as being mediated through the operation of a pervasive strategic competence (Skehan, 1995: 93-94).

Skehan concludes that in SLA it's misconceived to see competence as underlying performance in any straightforward way: psychological mechanisms are key (but are they parts of competence?); formulaic language, everybody seems to agree, is not really a competence (but why not?); and planning (not a competence) helps draw on form (knowledge of which is most certainly a competence). So what Skehan is obviously challenging here is both the competence/performance dichotomy and the knowledge/skill dichotomy too. Skehan concentrates on the question "Is strategic competence part of L2 competence?" He points out that although awareness of how to cope can be seen as competence, behaviour during communication is clearly in the realm of performance. The answer to this problem, Skehan proposes, is to see strategic competence as the operation of processes which constitute "ability for use".

Ability for use, in other words, is what goes beyond Bachman’s (1990) assessment, goal-setting, planning, and execution and is what accounts for the balance between analysability and accessibility as the processing dimension of actual communication  (Skehan, 1995: 106).

Well, I give this quote, but it's hopelessly out of context. The reference to analysability and accessibility is to Widdowson, and you need to know that Skehan's article appeared in a festschrift for Widdowson, and that when Widdowson talks about these two terms he too is questioning the competence versus performance distinction. But anyway, we may take from all this that Skehan's "ability for use" construct is one alternative to the usual distinction made between competence and performance.


Another way to look at the problem of competence is to return to the question of language proficiency, as Bialystok (2001) does.

What is the norm for language competence?  What do we mean by language proficiency?  What are its components and what is the range of acceptable variation?  Although these questions may seem to be prior to any use of language as a research instrument or conclusion about language ability in individuals, they rarely if ever are explicitly addressed  (Bialystok, 2001: 11).

Bialystok doesn’t underestimate the difficulties of measuring language proficiency, and she does no more than “point to approaches that may eventually provide a fruitful resolution”, but her book serves to once again call into question the competence / performance dichotomy. Bachman’s work does the same thing since he’s talking exclusively about performance.

I think this is the way to go. If we can get a handle on proficiency, go beyond the very limited frameworks  offered so far by various English (e.g. Cambridge) and international (e.g. the increasingly questioned European Common Framework) bodies, we may have a really useful construct to work with. I’m rather surprised at the lack of research done on Bachman and Palmer’s model.

Formal versus Functional Grammars

In trying to sort out the confusion caused by different takes on the competence/performance issue, we can also consider the arguments among those who adopt formal and functional approaches to linguistic theory. As Bialystok (2001: 14) says:

We need to establish fixed criteria that supersede the theoretical squabbles and point to critical landmarks in language mastery.  These are lofty goals, but without some framework for evaluating progress it is impossible to produce meaningful descriptions of the acquisition of language.

Bialystok points out that functionalists limit themselves to the claim that language is in the environment, and cite computer simulations, such as connectionist modelling, as evidence of the sufficiency of their explanation.

But what is language, why is it structured as it is, and why are all languages so similar? The functionalist approach treats language as though it were like yogurt: once some exists, it is fairly straightforward to reproduce it, but where did the first yogurt come from? And why does yogurt from different places always come out more or less the same? To make yogurt, one must start with yogurt. There is something essential about its nature. So too with language: once it is in the environment, there are a number of ways one can explain how individual children obtain their own copy, but how did languages develop the predictable regularities they did, especially when the same regularities are observed across highly disparate languages? And why does the path to acquisition always look so similar? The functionalist response is to deny they are dealing with yogurt: the idea of linguistic universals is a fiction and each language is as different from all others as is each child who learns it (Bialystok, 2001: 51).

This eloquent defence of formal grammar should not be interpreted as unconditional support: Bialystok not only berates the functionalists for their refusal to accept the idea of linguistic universals; she also admonishes the formalists, whose theories she describes as "equally parochial". While in 2004 I considered that the "theoretical squabbles" were of minor importance, and that the sensible thing was to agree that both the underlying rules of grammar and the descriptions provided by functionalists of how we use language for different purposes are important elements of our knowledge of language, I'm now a lot more concerned about the worrying issues bubbling under the surface here, if you'll forgive the expression.

Just one last element in the growing conundrum needs mentioning: connectionism, known these days as emergentism.


The "emergentist" approach to SLA is becoming very fashionable these days, notwithstanding (or maybe partly due to) the hopelessly mangled attempts by Larsen-Freeman to promote it. Ellis (2002) explains that emergentists "believe that the complexity of language emerges from relatively simple developmental processes being exposed to a massive and complex environment." Emergentists reject the UG account of language, and the nativist assumption that human beings are born with linguistic knowledge and a special language learning mechanism. Ellis shows how language processing is "intimately tuned to input frequency", and expounds a "usage-based" theory which holds that "acquisition of language is exemplar based" (Ellis, 2002: 143). The power law of practice is taken by Ellis as the underpinning for his frequency-based account, which argues that "a huge collection of memories of previously experienced utterances", rather than knowledge of abstract rules, is what underlies the fluent use of language. In short, emergentists take most language learning to be "the gradual strengthening of associations between co-occurring elements of the language", and they see fluent language performance as "the exploitation of this probabilistic knowledge" (Ellis, 2002: 173).
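To see what "strengthening of associations between co-occurring elements" and the power law of practice amount to in the simplest possible terms, here is a toy sketch. To be clear, nothing here comes from Ellis (2002): the miniature "corpus" of exemplars, the bigram counting, and the power-law parameter values are all invented purely for illustration.

```python
from collections import Counter

# A toy collection of "previously experienced utterances" (exemplars).
exemplars = [
    ("how", "do", "I"),
    ("how", "do", "you"),
    ("how", "do", "I"),
    ("what", "do", "you"),
    ("how", "can", "I"),
]

# "Gradual strengthening of associations between co-occurring elements":
# count bigram co-occurrences across the exemplars.
bigrams = Counter()
firsts = Counter()
for utt in exemplars:
    for a, b in zip(utt, utt[1:]):
        bigrams[(a, b)] += 1
        firsts[a] += 1

def association(a, b):
    """Strength of the a->b association: the probability, estimated from
    exemplar frequency, that b follows a."""
    return bigrams[(a, b)] / firsts[a] if firsts[a] else 0.0

# The power law of practice: performance time falls as a power function
# of the number of practice trials N, i.e. RT = a * N**(-b).
def reaction_time(n_trials, a=1000.0, b=0.5):
    return a * n_trials ** (-b)

print(association("how", "do"))  # -> 0.75: "how do" is strongly associated
print(reaction_time(1), reaction_time(100))  # large speed-up with practice
```

The point of the sketch is only that probabilistic knowledge of this kind can be accumulated from exemplars alone, with no abstract rules anywhere in the system.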

Seidenberg and MacDonald (1999) suggest that connectionism provides an alternative framework to "the generative paradigm". In place of equating knowing a language with knowing a grammar, the probabilistic constraints approach adopts the functionalist assumption that language knowledge is "something that develops in the course of learning how to perform the primary communicative tasks of comprehension and production" (Seidenberg and MacDonald, 1999: 571). This knowledge is viewed as a neural network that maps between forms and meanings, and further levels of linguistic representation, such as syntax and morphology, are said to emerge in the course of learning tasks. An alternative to "Competence" is also offered by Seidenberg and MacDonald, who argue that the competence-performance distinction excludes information about statistical and probabilistic aspects of language, and that these aspects play an important role in acquisition. The alternative is to characterize a performance system that handles all and only those structures that people can. Performance constraints are embodied in the system responsible for producing and comprehending utterances, not extrinsic to it. This approach obviates the paradox created by a characterization of linguistic knowledge that generates sentences that people neither produce nor comprehend (Seidenberg and MacDonald, 1999: 573). I've written about all this in my book (Jordan, 2004) and there's an extract from it on the blog under the title Emergentism.

What seems to have happened is that Chomsky’s “competence” construct got mixed up in subsequent attempts to talk about the “L” in SLA; and, slowly but not at all surely, a distinction is made between knowledge and skills, so that language knowledge is seen to be interacting with the other non-linguistic factors. In particular, strategic competence (a non-linguistic general “ability” that enables an individual to use available resources by regulating online cognitive processes in accomplishing a communicative goal)  is separated from language competence. But this hasn’t, alas, resulted in all those working on a theory of SLA having a clear picture of what is acquired. Bachman’s description of language competence, while designed to measure language use, indicates the kinds of knowledge and skills that are involved in developing IL systems, and Bialystok has suggested how the search for proficiency measurements might contribute to clarifying the matter. But it’s the differences between formalists and functionalists, with the added ingredient of emergentism weighing in on the functionalist side, that now strike me as demanding closer attention and evaluation. Maybe for the practical purposes of ELT we can get along quite well using a combination of grammar rules and exemplars, particularly lexical chunks, but I think that for those interested in theory construction, major concerns involving such weighty matters as mind versus brain and rationalism versus empiricism need to be resolved. Both nativist (UG) and empiricist (connectionist) theories of SLA actually provide both a property and a transition theory of SLA. I think they’re both wrong, but I recognise that my own view – based on a rather bolted together and incomplete cognitive transition theory – offers nothing very substantial to put in their place. A satisfactory description of the “L” in SLA would certainly help.


Bachman, L. (1990) Fundamental Considerations in Language Testing.  Oxford: Oxford University Press.

Bachman, L. and Palmer, A. S. (1996) Language Testing in Practice. Oxford: OUP.

Bialystok, E. (2001) Bilingualism in Development.  Cambridge: CUP.

Canale, M. (1983) On some dimensions of language proficiency.  In Oller, J. (ed.): Issues in Language Testing Research.  Rowley, M.A.: Newbury House.

Canale, M. and Swain, M. (1980) Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics 1: 1-47.

Chomsky, N.  (1965) Aspects of the theory of syntax. Cambridge, Mass.: MIT Press.

Cook, V. J. (1993) Linguistics and Second Language Acquisition. Basingstoke: Macmillan.

Ellis, N. (2002) Frequency Effects in Language Processing and Acquisition. Studies in Second Language Acquisition, 24(2): 143-187.

Epstein, S., Flynn, S., and Martohardjono, G. (1996) Second Language Acquisition: Theoretical and experimental issues in contemporary research.  Behavioural and Brain Sciences.  Vol. 19, 4, 677-758.

Hymes, D. (1972) On Communicative Competence. In Pride, J. And Holmes, J. (eds.) Sociolinguistics.  Harmondsworth: Penguin.

Jordan, G. (2004) Theory Construction in SLA. Amsterdam, Benjamins.

Seidenberg, M.S. and MacDonald, M.C. (1999) A probabilistic constraints approach to language acquisition and processing.  Cognitive Science, 23, 569-588.

Skehan, P. (1995) Analysability, accessibility, and ability for use. In Cook, G. and Seidlhofer, B. (eds.): Principles and Practice in Applied Linguistics. Oxford: Oxford University Press.

4 Questions about TBLT

In response to my previous posts about TBLT, Russ Mayes asked some interesting questions, and I’ll try to answer them here.

Question 1
To start at the end, Russ says in regard to interlanguage:

If it’s not innate but also consistent regardless of the L1 or target language, what on earth could be the mechanism leading to this?

Good question! No really, it is a good question. Interlanguage as a theoretical construct refers to a systematic, natural language, so learners are constrained in the development of interlanguages by the same principles that constrain the development of any human language. Those researchers who argue that UG is fully accessed in SLA say these constraints are due to a human language faculty (what Chomsky calls the LAD). But others (and I place myself in this other camp) claim that general cognition principles, which explain how we process and learn any other kind of information, can explain how L2 learners develop their interlanguages. The UG view of interlanguage development is powerful, because it has the backing of an excellent theory, but, since I'm not persuaded that UG applies to SLA, I have to reply to Russ' question "What's the mechanism?" by saying: general cognitive processing. "Thin soup", I can hear Russ grumble, so, in a desperate attempt to thicken it a bit, we must go back to the story. What follows is mostly taken from Ortega (2009).

Let me summarise the three important features of interlanguages. First, input can't explain them. Oshita (2000) quotes from an L2 English essay written by an L1 Spanish speaker: "It [a wall] was falled down in order to get a bigger greenhouse". The regular past tense ending -ed has been added to an irregular intransitive verb. The learner didn't pick this up from input.

Second, if we hear “How I do this?” from an L1 Spanish learner or an L1 Punjabi learner, whose languages do not have inversion, we may say that the L1 is inducing the choice. But, as Ortega says, “if we sampled learners from a wide enough range of L1 backgrounds, including languages where inversion does exist (e.g., Dutch and German), we would find that they, too, use un-inverted questions in their English interlanguage at an early stage of L2 development”. The evidence suggests that the L1 cannot be the correct explanation for lack of inversion, and it turns out that “How I do this?” results from what Ortega calls “a universally attested interlanguage solution to the problem of question formation in English”, namely fronting.

Third, many interlanguage solutions are also attested in the production of children acquiring their first language. “How”, Ortega asks, “can we explain interlanguage solutions that are neither directly attributable to the input nor to the L1, and that are shared by first and second language acquirers? The unavoidable conclusion is that these forms are interim systematic innovations that learners independently create when they are trying to figure out the workings of the new language system they are learning. …. Interlanguages develop due to the interaction of multiple forces including input, knowledge of the L1, and the interaction between the universal shape of languages and the conceptual apparatus of the human mind. These include syntactic, semantic-discoursal and statistical, as well as conceptual and sensorimotor, processing influences on the one hand, and communicative pressures and social incentives learners experience as they use the language to make meaning on the other”.

Interlanguage Sequences

In order to illustrate L2 sequences, Ortega examines findings for five interlanguage domains. These are:

1. Morpheme orders
2. Form-function mappings
3. Developmental stages of negation
4. Developmental stages of word order and questions
5. Hierarchical acquisition of relative clauses.

In all 5 domains evidence from a number of studies strongly indicates a sequence of learning which is unaffected by learner age, L1, acquisition context, or instructional approach.


Having spent some considerable time discussing research on L2 sequences, Ortega turns to a discussion of processes, which are “the manifestation of putative mechanisms by which learners develop (or fail to develop) their internal grammars”. She focuses on four processes: simplification; overgeneralization; restructuring and U-shaped behaviour; and fossilization.

Simplification reflects a strategy that is called upon when messages must be conveyed with little language. Simplification is seen during very early stages of L2 development and also later on, when complex syntax and some morphology emerge. So, for example, even though a full range of formal choices is available in the morphology of the target language, a base (invariant) form tends to be chosen by learners at first; and even though multiple form-meaning mappings exist in the target language, a one-meaning-one-form mapping is initially represented in the learner grammar.

Overgeneralization involves the application of a form or rule not only to contexts where it applies in the target language, but also to others where it does not apply. Ortega gives the case of systematic overgeneralization in morphology where an attempt is made to make irregular forms fit regular patterns, as seen in the example from Oshita, 2000, cited above.

Restructuring is the process of self-reorganization of grammar knowledge representations. During periods when restructuring of internal representations is happening, learners may seem to "backslide" and produce "errors" they did not seem to produce earlier, producing a pattern known as U-shaped behaviour. Sharwood Smith and Kellerman (1989) define it as "the appearance of correct, or nativelike, forms at an early stage of development which then undergo a process of attrition, only to be re-established at a later stage".
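The U-shaped pattern just described can be caricatured in a few lines of code. This is a toy sketch, emphatically not a cognitive model: the single irregular exemplar, the strength values, and the winner-takes-all choice are all invented here, only to illustrate how a newly internalised regular rule can temporarily displace a correct rote-learned form before the exception is re-established.

```python
# The one rote-learned irregular exemplar in this toy lexicon.
EXCEPTIONS = {"go": "went"}

def past_tense(verb, rote_strength, rule_strength):
    """Winner-takes-all choice between a rote-learned irregular form and
    the regular -ed rule, whichever is currently stronger."""
    if verb in EXCEPTIONS and rote_strength >= rule_strength:
        return EXCEPTIONS[verb]
    return verb + "ed"

# Three snapshots of (rote_strength, rule_strength): before the regular
# rule is internalised, just after it is (when it overgeneralises), and
# after restructuring re-establishes the exception.
stages = [(0.9, 0.0), (0.9, 1.5), (2.0, 1.5)]
print([past_tense("go", rote, rule) for rote, rule in stages])
# -> ['went', 'goed', 'went']: correct, then "attrition", then re-established
```

Nothing in the sketch dictates when the third stage arrives, which is just the point: a learner can stall at any stage, which is where fossilization, discussed next, comes in.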

Simplification, overgeneralization, restructuring, and other fundamental processes help learners move along the sequences. But there is no guarantee that the outcomes of these processes will keep propelling all learners toward convergence with the target system. Despite apparently favourable conditions for learning, many L2 users may stop anywhere along a given sequence of development, perhaps permanently. The term fossilization was coined by Selinker (1972) to refer to such cases of “premature cessation of development in defiance of optimal learning conditions” (Han, 2004, cited in Ortega, 2009).

I’ve omitted all the examples Ortega gives of sequences and processes, and strongly advise all those interested in interlanguage development to read the chapter in its entirety. I should say here that IMHO, Long and Doughty’s 2009 Handbook of Language Teaching contains a superb collection of papers; it’s the best book in the field of applied linguistics published in the last 10 years.

Question 2
I know Russ well enough by now not to say anything rash like “So, there we have it”; I’ve simply outlined a picture which needs filling in and even then, it’s hardly a shining portrait of the obvious truth. But I think it’s enough to answer Russ’ questions, so let’s deal with the rest of them. Russ asks for clarification about the claim that instruction can’t affect the route of interlanguage development. Russ says:

the claim only applies to L2 grammar development, right? You could learn as many words, phrases etc as you wanted. Also, the scope of what is defined as ‘grammar’ is not things like ‘you should play tennis’, this would be classed as vocabulary.

The claim extends to the five domains listed above, which, of course, include morphology and the development of the accurate use of phrases and lexical chunks. But I should bring to light (because if I don’t, Russ will) studies showing that explicit instruction in a particular structure can produce measurable learning: Long mentions several, but points out that these studies involved devoting far more extensive periods of time to intensive practice of the targeted feature than is usually available, and that “once the teaching focus shifts to new linguistic targets, learners revert to an earlier stage on the normal path of acquisition of the structure which they had supposedly mastered in isolation “ahead of schedule”” (Long, 2015, p. 22).

Question 3
Another issue Russ brings up concerns learning relative clauses. Russ says:

The evidence shows that Arabic speakers and Chinese speakers get roughly equal scores on tests of relative clauses. But closer inspection shows that Chinese students used them far less. Chinese doesn’t have relative clauses and Arabic does, so it could be supposed that when Chinese students used them they were more careful.

Quite right, and this supposition was taken into account in the study. I think Russ’ general point is that in studies of SLA (or anything else for that matter) we have to be very careful in saying what the evidence we gather is evidence of. I’ve said elsewhere that evidence (data) should be in the service of some theory or hypothesis that attempts to explain a phenomenon, and that lots of raw data really doesn’t get us anywhere. Data offered in support of the theory of interlanguage development (a theory which is far from complete or free from controversy) needs to be very carefully scrutinised.

Question 4
A fourth question Russ asks is:

How do these developmental stages occur (a) if the ‘stage’ doesn’t actually exist in the target language; (b) when the L1 and L2 both have the target feature constructed in the same way. An example might be question formation in English versus Japanese. A Chinese student learning Japanese only has to learn that ‘ma’ is ‘ka’ in Japanese, but in English has to learn the entire convoluted ‘do support’ system (which only 2 other languages possess).

If the ‘stage’ doesn’t actually exist in the target language then it is skipped; but, as was briefly mentioned above, whether the L1 and L2 both have the target feature constructed in the same way or not, learners seem to go through the same stages in the development of “the entire convoluted ‘do support’ system”. The emergence of questions in L2 English has been traced by many, including Pienemann, Johnston, & Brindley (1988, cited in Ortega, 2009).

Stage 1: Words and fragments with rising intonation. E.g.: One astronaut outside the space ship? A ball or a shoe?

Stage 2: Canonical word order with rising intonation. E.g.: He have two house in the front? Two children ride a bicycle?

Stage 3: Fronting of a questioning element (a wh-word, do, or some other element). E.g.: Where the little children are? What the boy is throwing?

Stage 4: Inversion in two restricted contexts: (1) in wh-questions with copula, (2) in yes/no questions with auxiliaries other than do. E.g.: Where is the sun? Where is the space ship? The ball is it in the grass or in the sky?

Stage 5: Inversion expands to the full range of target-like contexts. E.g.: How many astronauts do you have? What is the boy throwing?

Stage 6: Negative questions; Question tags; Questions in embedded clauses. E.g.: Doesn’t your wife speak English? You live here, don’t you? Can you tell me where the station is?

Just by the way, the pattern of L2 development shown here is one of gradual approximation to the target system, with learners “outgrowing” each stage as they develop. It is not the only pattern, however, as illustrated by the sequence uncovered by Meisel, Clahsen, and Pienemann (1981, cited in Ortega, 2009) for word order in L2 German. Unlike the emergence of questions (or the negation sequence, which Ortega also describes), where learners gradually outgrow each stage, the word order stages are cumulative: each stage adds an important piece to an increasingly complete repertoire of syntactic options, until the interlanguage system matches the full complexity of the repertoire available in the target grammar.

As a final twist, an unwary novice teacher might jump to the naïve conclusion that the sequences and processes of L2 development discovered by SLA research should form the basis for a syllabus aimed at classroom instruction. Well, no. First, most aspects of the grammar of any target language are not covered by the research. Second, we don’t know how the different sequences that have been uncovered relate to each other in the grammar of individual learners, so textbook writers and curriculum developers have little guidance as to how to sequence grammatical targets according to principles of developmental learner readiness. Third, learning syntax and morphology is only part of the task; learning vocabulary, pragmatics, phonology, and so on is also involved, and although a lot is known about how these areas are learned by L2 users, it’s not obvious how all this knowledge can be used to design a syllabus. Most importantly, as I hope I’ve explained in previous posts, organising classroom teaching around grammar in a product or synthetic syllabus is less effective than options more attuned to what we know about psycholinguistic, cognitive, and socioeducational principles for good language teaching.

Ortega finishes her chapter on an upbeat note. “Nevertheless, knowledge about the sequences and processes of interlanguage development can inform good teaching by helping teachers (and their students) cultivate a different attitude toward “errors,” and more enlightened expectations for “progress.” It can help them recognize that many so-called errors are a healthy sign of learning, that timing is hugely important in language teaching, and that not all that can be logically taught can be learned if learners are not developmentally ready. Knowledge about sequences and processes can also help counter the deficit view that interlanguages are defective surrogates of the target language by making it clear that interlanguages are shaped by the same systematicity and variability that shape all other forms of human language.”

Long, M. and Doughty, C. (2009) Handbook of Language Teaching. Oxford: Wiley.
Ortega, L. (2009) Sequences and Processes in Language Learning. In Long, M. and Doughty, C., Handbook of Language Teaching. Oxford: Wiley.