If I’d actually drunk a bottle of tequila while trying to understand Schmidt’s Noticing Hypothesis last Tuesday, I would have woken up with a hangover, and these days the hangovers are so bad that I just can’t face them. So when I woke up the morning after, all was well; my surroundings were familiar, my wife was with me, there was nothing to make amends for. Reassuring, of course, but I confess to feeling nostalgia for my younger days. There’s nothing quite like the fun you have drinking; the Devil has all the best songs, they say, and I bet Hades had all the best cocktails. Easy to imagine getting the ferry across the Acheron, sitting around the lounge bar waiting to see where you were going (probably not to the Elysian Fields!), banging back dry martinis with funny people like W.C. Fields (“I cook with wine. Sometimes I even add it to the food”) and Tommy Cooper (“I’m on a whisky diet. I’ve lost three days already”), grateful that you’d never been a mere sober mortal.
Downstairs, I made a nice big mug of tea and took it to the study. There on the desk and on the monitor was all this stuff about the Noticing Hypothesis. Not just Schmidt versus Truscott, and Gregg versus Krashen, and all the other SLA feuds, but also the famous Locke versus Leibniz debate and the equally famous Aristotle versus Plato debate about more or less the same thing. Aristotle wasn’t quite an empiricist, but certainly got the better of Plato on epistemology, while Leibniz is generally regarded as coming out on top against Locke. The Leibniz-Locke debate in particular still seems relevant today in the light of the latest challenge to nativist views on language learning, and I think Leibniz might have had some harsh words to say about the blurred lines between awareness, attention and consciousness in Schmidt’s attempts to develop the Noticing Hypothesis.
Just to reassure those who might be unduly swayed by the likes of Penny Ur (and Scott Thornbury on a bad day) into thinking that they shouldn’t worry their heads with all this theoretical stuff (just trust your instincts and polish your presentation skills), my motivation for sniffing around this particular theoretical stuff is to check on the foundations of our teaching. It’s a terrible job, the pay’s lousy, but somebody’s got to do it, right? Somebody’s got to check, that is, to see whether ‘noticing’ justifies all the explicit teaching done in its name. I suspect that the influential teacher trainers who rely on ‘noticing’ to justify their encouragement of everything from teaching a grammar-based syllabus to teaching as many lexical chunks as you can cram into a 90-minute class are talking baloney, and it should be made clear that their advice gets no support from any good research. On the face of it, ‘noticing’ encourages bad teaching practice, and so needs to be carefully examined.
So here we go with Part 2. I left Part 1 face down on the carpet, exhausted by unsuccessful efforts to understand the Noticing Hypothesis. In the comments that followed, one particular problem was highlighted by Kevin Gregg, who said:
You can’t notice what is not in the input; and rules, for instance, or functions, are not in the input.
This prompted Thom to ask:
In what other way can anybody learn grammar if it is not by way of input?
Kevin’s on-going tussle with time (trains to catch, letters to write, shopping to do) prevented him from replying, so I’ll try.
Well, it depends where you’re coming from, as they say. Empiricists, or rather, “‘empiricist’ emergentists”, as Gregg calls them, would say that input is the sufficient condition for learning an L2, and they’d probably caution against listening to any talk of mental grammars. Empiricists like Nick Ellis see all knowledge as coming from the information we get through our senses during our interaction with the environment, and with reference to language learning, the emergentists argue that we aren’t born with linguistic knowledge of any sort because we don’t need it. General learning devices (capable of making generalisations based on exemplars found in the input, for example) are all we need. In Nick Ellis’ words:
massively parallel systems of artificial neurons use simple learning processes to statistically abstract information from masses of input data. What evidence is there in the input stream from which simple learning mechanisms might abstract generalizations? The Saussurean linguistic sign as a set of mappings between phonological forms and conceptual meanings or communicative intentions gives a starting point. Learning to understand a language involves parsing the speech stream into chunks which reliably mark meaning.
… in the first instance, important aspects of language learning must concern the learning of phonological forms and the analysis of phonological sequences: the categorical units of speech perception, their particular sequences in particular words and their general sequential probabilities in the language….
In this view, phonology, lexis and syntax develop hierarchically by repeated cycles of differentiation and integration of chunks of sequences.
On the other hand, nativists like Kevin Gregg, especially those who accept Chomsky’s principles and parameters UG theory, point to the knowledge young children have of language to argue that SLA is the result of an innate representational system specific to the language faculty acting on input in such a way that an L2 grammar is created. We are born with knowledge of various linguistic rules, constraints and principles. In interaction with the environment, which exposes us to ‘primary linguistic data’, we acquire a new, expanded body of linguistic knowledge, namely, knowledge of a specific language like English. This final state of the language faculty constitutes our ‘linguistic competence’, essential, but not sufficient, for our ability to speak and understand a language. Additional knowledge about actual language use is acquired through other general learning mechanisms.
Whatever view we take of the SLA process, the question of how it starts (input) is obviously critical, but re-visiting Schmidt’s Noticing Hypothesis has led me to appreciate that the question of how it ends up is equally important. What finally gets acquired? To answer this question we need what Gregg calls a “property” theory of SLA – a theory of language, or, more precisely, of linguistic knowledge of the L2. What is the knowledge that is acquired when someone learns a second language? O’Grady (2005) notes that while the UG camp talk about problems sorting out categories and structures, the emergentists talk about sorting out words and their meanings, and this leads him to suggest that the disagreement about how we learn an L2 stems from a deeper disagreement about “the nature of language itself”. O’Grady (2005, p. 164) explains:
On the one hand, there are linguists who see language as a highly complex formal system that is best described by abstract rules that have no counterparts in other areas of cognition. …. Not surprisingly, there is a strong tendency for these researchers to favor the view that the acquisition device is designed specifically for language. On the other hand, there are many linguists who think that language has to be understood in terms of its communicative function. According to these researchers, strategies that facilitate communication – not abstract formal rules – determine how language works. Because communication involves many different types of considerations … this perspective tends to be associated with a bias toward a multipurpose acquisition device.
This excellent comment is echoed by Susanne Carroll (2001, p. 47), who distinguishes between
- Classical structural theories of information processing which claim that mental processes are sensitive to structural distinctions encoded in mental representations. Input is a mental representation which has structure.
- Classical connectionist approaches to linguistic cognition which deny the relevance of structural representations to linguistic cognition. For them, linguistic knowledge is encoded as activated neural nets and is only linked to acoustic events by association.
Anyone who is convinced that the last 100 years of linguistic research demonstrate that linguistic cognition is structure dependent — and not merely patterned— cannot adopt a classical connectionist approach to SLA.
O’Grady’s and Carroll’s remarks remind me that the majority of scholars who are currently looking closely at how input ends up as knowledge don’t articulate a coherent answer to the crucial question: “What is the linguistic knowledge that is acquired?”. Many years ago, I myself made some effort to kick this question into the long grass. Gregg’s repeated insistence on the need for a property theory of SLA which describes what is acquired prompted me to say in a book and in an article for Applied Linguistics that researchers could perfectly well get on with developing a theory of SLA without worrying about the damn property theory. In a short reply (I think he had a bus to catch that time), Gregg effortlessly dealt with my bleatings (the bus and, I like to think, our friendship saved me from the full Gregg treatment) and I’m now fully persuaded that he’s right to demand a property theory.
I think it’s the absence of a well-articulated property theory that makes it so difficult for Schmidt and others to explain how information from the environment ends up as linguistic knowledge of the L2. They accept that the knowledge acquired includes linguistic knowledge of, for example, the structure of an English verb phrase, and they insist that learning this knowledge depends on ‘noticing’ things in the input. But how, we must ask again, does ‘noticing’ audio stimuli from the environment lead to the acquisition of the linguistic knowledge demonstrated by proficient L2 users? Let’s take a quick look at the history of SLA research.
The shift from a behaviouristic to a mentalist view of language learning (sparked by Chomsky’s 1959 rebuttal of Skinner) prompted scholars in the field of psycholinguistics to see language learning as a process which goes on inside the brain and involves the workings of some kind of acquisition device. The, as yet unobservable, “black box” that we can refer to as an acquisition device is almost certainly not located in one particular part of the brain, might or might not be dedicated exclusively to language learning, might or might not make use of innate linguistic knowledge, but certainly does (somehow) enable us to receive, organise, store, retrieve and manipulate ‘input’ so as to facilitate learning the L2.
And there it is: ‘input’. The Merriam-Webster dictionary says that the term was first used in 1953, in the context of computer design, to refer to data sent to a computer for processing. In the study of SLA, Corder (1967) was the first to suggest that we acquire the rules of language in a predictable way, and that the order is independent of the order in which rules are taught in language classes. This led Corder to suggest that there was a difference between input and intake.
The simple fact of presenting a certain linguistic form to a learner in the classroom does not necessarily qualify it for the status of input, for the reason that input is ‘what goes in’ not what is available for going in, and we may reasonably suppose that it is the learner who controls this input, or more properly his intake. This may well be determined by the characteristics of his language acquisition mechanism. (p. 165).
Here, input is what’s available, and intake is what the learner decides to take in. It’s not clear to me what either ‘input’ or ‘intake’ refers to, and anyway, as Schmidt (1990) points out, Corder contradicts himself by saying in the first sentence that the learner controls intake, and by then saying in the second sentence that his language acquisition mechanism does. More importantly for our hunt, Schmidt goes on to say that it’s not clear whether intake is the subset of input that makes it into short-term memory, or whether it’s that part of input that has been sufficiently processed to now form part of the learner’s interlanguage system. The way Schmidt expresses this second point is instructive. Schmidt says that Corder’s treatment of intake does not make any clear distinction between that part of input used to comprehend messages and that part used “for the learning of form” (Schmidt, 1990, p. 139). Schmidt also endorses Slobin’s (1985) distinction between processes involved in converting input into stored data for the construction of language, and processes used to organise stored data into linguistic systems. Schmidt is obviously aware (sorry) of the problems not just of clearly identifying the level of conscious attention/awareness involved in noticing, but also of clearly defining what is noticed and what (if any) processing goes on when learners notice whatever it is they notice.
Moving on to Krashen, his input hypothesis draws on the “natural order” of L2 acquisition that Corder drew attention to, and supposes that learners progress along a pre-determined learning trajectory which is impervious to instruction and controlled by a language acquisition device. Acquisition, Krashen says, is triggered when learners receive L2 input that is one step beyond their current stage of linguistic competence. If a learner is at a stage ‘i’, then acquisition takes place when he/she is exposed to ‘Comprehensible Input’ which belongs to level ‘i + 1’. In Krashen’s model, learners only need comprehensible input and a low affective filter to acquire the L2, because once the i + 1 input is received, Chomsky’s LAD does the rest. Almost needless to say, the trouble with Krashen’s input hypothesis is that he nowhere explains what comprehensible input consists of, or tells us how to recognise it.
Unsurprisingly, Schmidt’s not very impressed with Krashen’s badly-defined hypothesis, but it’s not just the lack of definition that Schmidt objects to; crucially, Schmidt insists that SLA is triggered by conscious attention. Krashen’s comprehensible input is, says Schmidt, much better seen as intake, itself defined as that part of the input which is ‘noticed’, because what learners actually do is consciously attend to, notice, certain parts of the input, and the noticed parts become intake. Furthermore, since the parts of the input which aren’t ‘noticed’ are lost, it follows that noticing is the necessary condition for learning an L2. In his 1990 paper, at least, the claim is not, as so many now want to interpret the Noticing Hypothesis, “More noticing leads to more learning”, but rather, the much stronger claim “Learning can’t take place without noticing”.
In the next post, I intend to look at processing models and try to pin down Schmidt’s “technical” definition of ‘noticing’, which he says is “equivalent” to Gass’ ‘apperception’. Hmmm. I’ll also look at Susanne Carroll’s very different view of input. She says:
The view that input is comprehended speech is mistaken and has arisen from an uncritical examination of the implications of Krashen’s (1985) claims to this effect. …… Comprehending speech is something which happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards!
To be continued.
Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.
Corder, S. P. (1967) The significance of learners’ errors. International Review of Applied Linguistics, 5, 161–169.
Ellis, N. (1998) Emergentism, Connectionism and Language Learning. Language Learning, 48(4), 631–664.
O’Grady, W. (2005) How Children Learn Language. Cambridge: Cambridge University Press.