Rastelli’s Discontinuity Hypothesis: a new challenge for SLA researchers


Mike Long’s review of Stefano Rastelli, Discontinuity in Second Language Acquisition: The Switch between Statistical and Grammatical Learning (Multilingual Matters, 2014), recently appeared in the Applied Linguistics journal’s online Advance Access. Here’s a brief summary of the review. I’ve taken gross liberties in cutting Long’s text, but that’s all I’ve done: some of it appears below verbatim, and the rest is as Long wrote it, with big bits lopped off. I share Long’s view that this is an important book which deserves our attention, and I also think it highlights the weaknesses of attempts by Larsen-Freeman, Thornbury, Hoey and others to use a garbled version of emergentism to support their views. Rastelli’s hypothesis represents the beginnings of a research programme that could pose a real challenge to Processability Theory, currently the theory most often adopted in attempts to explain SLA.

Rastelli’s book is part of the growing research interest in the potential of statistical learning and usage-based accounts of SLA by adults. The general idea is that learners can detect absolute frequencies, probabilistic patterns, and co-occurrences of items in the linguistic environment, and use the resulting information to bootstrap their way into the L2. Statistical learning (SL) is a general learning theory which relies on the construct of a domain-general capacity that operates incidentally, results in implicit knowledge, and functions for all linguistic sub-systems, from phonology, through word learning, morphology and syntax, to pragmatics.

Long says that Stefano Rastelli’s book (henceforth, Discontinuity) is remarkable for three things:

  1. Its coherence.
  2. The breadth and depth of Rastelli’s knowledge of current theory and research in linguistics, cognitive psychology, neurolinguistics and SLA, and his ability to synthesize and integrate work from all four.
  3. The originality of his perspective.

Rastelli claims that SL is the initial way learners handle combinatorial grammar, i.e., regular co-occurrence relationships between audible or visible forms that are overt in the input and the meanings and functions of those forms. Because these patterns are audible or visible and regular, they are frequency-driven and countable, which is what SL requires in order to operate. Combinatorial grammar comprises recurrent combinations of adjacent and non-adjacent whole words and morphemes. The form-function pairs can be stored and retrieved first as wholes, and then broken down into their component parts in order to be computed by abstract rules.

Combinatorial grammar is learned twice, Rastelli claims, first by SL, and then by grammatical learning (GL). This is the meaning of ‘discontinuity’ in his hypothesis. SL prepares the ground for GL: “Statistics provides the L2 grammar the ‘environment’ to grow and develop” (2014: 220). SL involves first a computation over transition probabilities and subsequently bottom-up category formation; GL is achieved through computation over symbolic abstract rules and top-down category formation. GL happens when learners recognize (implicitly) not just regularities in the ways certain words co-occur, but why they co-occur. At that point, they can move beyond statistically based patterns and induce productive combinatorial rules. They can abstract away from particular exemplars that contain regular markings for number, tense, case, etc., now understanding (implicitly, again) that these properties can be applied to new exemplars.
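In the statistical-learning literature (e.g., Saffran, Newport and Aslin 1996), a forward transition probability is usually defined as the conditional probability of one item given the item before it: TP(A→B) = freq(AB) / freq(A). The Python sketch below is only a hypothetical illustration of that kind of counting, not Rastelli’s own procedure: the four Italian sentences are invented toy input, transitional_probabilities is a made-up helper name, and freq(A) is taken to be the number of times A occurs with a following word inside the same sentence.

```python
from collections import Counter


def transitional_probabilities(sentences):
    """Forward TP(a -> b) = freq(a followed by b) / freq(a followed by anything).

    Pairs are counted within each sentence only, never across sentence boundaries.
    """
    first = Counter()  # how often each word occurs with a successor
    pairs = Counter()  # how often each adjacent word pair occurs
    for sentence in sentences:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            first[a] += 1
            pairs[(a, b)] += 1
    return {(a, b): n / first[a] for (a, b), n in pairs.items()}


# Hypothetical toy input, invented for illustration (not data from the book).
corpus = [
    "Elena è arrivata",
    "Marco è arrivato",
    "Elena è stanca",
    "Elena ha parlato",
]

tps = transitional_probabilities(corpus)
for (a, b), p in sorted(tps.items()):
    print(f"TP({a} -> {b}) = {p:.2f}")
```

In this toy input TP(elena -> è) comes out at 0.67, while TP(è -> arrivata) is only 0.33 because è is followed by three different words. Real SL experiments typically compute such statistics over syllable streams and far larger inputs; the sketch simply makes the phrase “computation over transition probabilities” concrete.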

The shift to GL is an abrupt, qualitative change: a rupture, not simply the next stage in a single continuous developmental process. This is one of several places where Rastelli departs from received wisdom in the field. He likens the SLA process to learning to swim, ride a bicycle or ski: progress is initially slow, tentative and uneven, with many failures, rather than a gradual succession of gradient states, until suddenly the child (or adult) can swim, ride or ski unaided. This, he claims, is because SLA is quantized. Learners need to encounter a statistically critical number of instances of a form or structure. Once that threshold is crossed, they are able to perceive regularities in the features those instances share and to conceptualize the motivation behind those regularities, so that they can apply a rule to novel instances. The formation of grammatical categories is what triggers discontinuity: sudden quantum leaps from SL to GL.

Crucially, the new grammatical representations do not displace previously acquired statistical rule(s). Rather, the sudden shift to GL is marked by gemination: dual statistical and grammatical representation of an item or structure at two cognitive levels in underlying competence. The two learning processes, SL and GL, and the two mental representations for the same L2 phenomena, statistical rules and grammatical categories, continue to exist side by side.

The continued co-existence of SL and GL has at least two possible neurophysiological explanations. First, implicit and explicit knowledge of the same item coexist, remain independent, and can be accessed independently by speakers (Paradis 2009: 15). Second, although independent, declarative and procedural memory compete and cooperate with one another across a learner’s lifespan. Some parts of the temporal lobe serve as a repository for already proceduralised knowledge, while some areas of the prefrontal cortex are activated when knowledge stored in declarative memory is selected and retrieved. There is also evidence of a direct anatomical connection between the medial temporal lobe and the striatum, that is, the caudate nucleus and putamen in the basal ganglia (Poldrack and Packard 2003: 4), which, says Rastelli, is why Ullman and colleagues believe L2 acquirers can learn the same items by exploiting the resources of both declarative and procedural memory.

The use of ‘quantum’ and ‘quantized’ is deliberate. Rastelli notes that the idea of abrupt discontinuity in SLA parallels the trajectory identified for many phenomena in the natural sciences, and above all in quantum physics and quantum probability theory. A classic example is the finding in quantum physics that electrons do not change their orbit around a nucleus gradually along a continuous gradient-like energy scale with change in proportion to increased energy, but instead ‘jump’ from one energy level to another at the precise moment that the energy supplied is sufficient to reach the threshold required to trigger the change. In just the same way, SLA is quantized; there is no straightforward relationship between increased L2 exposure and L2 development.

So much for combinatorial grammar. Non-combinatorial grammar, in contrast, pertains to invisible features, such as null subjects, filler-gap dependencies and island constraints on wh- extraction, and to phenomena at the discourse-syntax and syntax-semantics interfaces. Here there is nothing overt in the input to combine, so frequency is irrelevant. SL is of no use because learners cannot categorize over absences (empty categories or displaced items); such items are computed and represented only mentally. Thus, non-combinatorial grammar cannot be acquired via SL. Rastelli predicts, for example, that adult learners of Italian will have more trouble with null subjects than with auxiliaries in compound tenses, not because of differences in frequency, but because SL can support the concatenation of co-occurring items (auxiliaries and main verbs) but not the computation of absent items (missing subject pronouns). In the sentence Elena è arrivata (lit. ‘Elena is arrived’), è + arrivata is a chunk that may consolidate in a learner’s memory over time and eventually form the basis for a productive rule of auxiliary selection. Conversely, the absent pronoun in Elena è arrivata ma _ non ha parlato (‘Elena arrived but [she] did not talk’) provides nothing the learner can remember and re-use in similar situations. SL allows the need for some form of the auxiliary ‘to be’ eventually to become predictable every time ‘arrived’ appears (and later, with other verbs of movement), whereas the presence or absence of a subject pronoun cannot be predicted and must be computed each time. Gemination will occur in the former case but not in the latter, where GL alone will be pressed into service. If the combinatorial/non-combinatorial distinction turns out to be valid, Rastelli suggests, it is presumably one of the reasons why missing features are problematic for, and often never acquired by, some adult L2 learners.
Non-combinatorial grammar must instead be handled by GL, and the capacity for GL varies between individuals and is more subject to age effects than SL is.
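The auxiliary/null-subject contrast can be framed in the same counting terms. The Python sketch below is a hypothetical illustration, not a model taken from the book: the three sentences are invented, backward_tp is a made-up helper, and “_” is an ad hoc marker for the unpronounced subject. A backward transition probability, P(previous word | current word), estimates how predictable the auxiliary is from the participle that follows it; the null-subject marker is stripped before counting, since learners never hear it, so a frequency counter has nothing to register.

```python
from collections import Counter


def backward_tp(sentences, target):
    """Estimate P(previous word | target) from adjacent word pairs within each sentence."""
    before = Counter()  # words observed immediately before `target`
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, word in zip(words, words[1:]):
            if word == target:
                before[prev] += 1
    total = sum(before.values())
    return {prev: n / total for prev, n in before.items()}


# Hypothetical toy input; "_" marks the unpronounced (null) subject.
corpus = [
    "Elena è arrivata",
    "Anna è arrivata presto",
    "Elena è arrivata ma _ non ha parlato",
]
# What the learner actually hears: the null subject leaves no token behind.
heard = [s.replace(" _", "") for s in corpus]

print(backward_tp(heard, "arrivata"))  # {'è': 1.0}
print(sum(s.split().count("_") for s in heard))  # 0
```

In this toy input the auxiliary è is fully predictable from arrivata, whereas the number of null-subject tokens available to count is zero. That asymmetry is the intuition behind Rastelli’s claim that the auxiliary can be learned statistically while the null subject has to be computed by GL.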

After a discussion of Rastelli’s position on age effects, Long moves to the differences between Rastelli’s hypothesis and other theories of SLA. Rastelli notes how ‘discontinuity’ differentiates his position from that of ‘continuity’ theories, such as Processability Theory, the norm in most SLA theorizing. As should be clear by now, he rejects the notion that L2 development is continuous, a series of incremental shifts (developmental stages) as a result of increased exposure to L2 input, without fractures or leaps:

The core idea of discontinuity is that the process of adult acquisition of L2 grammar is not uniform and incremental but differentiated and redundant. To learn a second language, adults apply two different procedures to the same linguistic materials: redundancy means that the same language items may happen to be learned twice.  (2014: 5)

The SL/GL distinction is qualitative (neurophysiological) in nature. It is not a matter of converting explicit into implicit knowledge (for Rastelli, implicit learning takes precedence, after all), so it is not a question of the automatization of what started life as declarative knowledge, as in Skill Acquisition Theory, and it is therefore not amenable to measures such as processing speed or reaction times. Discontinuity differs from restructuring in that the qualitative shift is not from non-productive to productive use of chunks via practice, but between two neurophysiologically distinct ways of learning that target two different parts of grammar. It shares ground with Ullman’s Declarative/Procedural Model (DPM) but, as Rastelli shows through a detailed comparison, differs from it in important ways: the discontinuity hypothesis focuses on two kinds of learning processes rather than two kinds of learning products, the lexicon and the grammar (which are in any case far from easy to tell apart), and it holds that some L2 grammatical items are learned statistically before they are learned grammatically. Rastelli also discusses the relevance of work in theoretical linguistics by Berwick, Yang, Roeper, O’Grady, Chomsky, Pinker, Grodzinsky, Hawkins, Tsimpli and others, showing the discontinuity hypothesis to be a ‘semi-modular’ position in which categorical grammar relies on innate principles, while probabilistic grammars can be learned from positive evidence alone. The SLA scholars whose work he considers include Bley-Vroman, N. Ellis, Wray, Sorace, Pienemann, Sharwood Smith, Paradis, Slabakova, White, Ullman, Montrul, Robinson, Newport and Williams.

Despite the book’s broad scope and its evident interest in the similarities and differences between Rastelli’s position and those of other theorists, he denies that Discontinuity offers a new theory of SLA:

Crucially, the word ‘theory’ is avoided purposely in this book . . . Basically, there cannot be a theory of discontinuity yet because the evidence provided so far can be interpreted in different ways . . . An expression such as ‘discontinuity hypothesis’ better conveys the image of the embryonic stage of a prospective theory of discontinuity. (2014: 6)

Nevertheless, the hypothesis he proposes is unquestionably innovative, and likely to motivate several new lines of empirical work. It will probably be regarded as (healthily) controversial in some quarters, but is without doubt an exceptionally interesting and intellectually refreshing contribution to the current SLA literature.



Abrahamsson, N. and K. Hyltenstam. 2009. ‘Age of onset and nativelikeness in a second language: Listener perception versus linguistic scrutiny,’ Language Learning 59: 249-306.

Aslin, R. N. and E. L. Newport. 2012. ‘Statistical learning: From acquiring specific items to forming general rules,’ Current Directions in Psychological Science 21/3: 170-76.

Aslin, R. N. and E. L. Newport. 2014. ‘Distributional language learning: Mechanisms and models of category formation,’ Language Learning 64/1: 86-105.

Berwick, R. C. 1997. ‘Syntax facit saltum: Computation and the genotype and phenotype of language,’ Journal of Neurolinguistics 10/2-3: 231-49.

DeKeyser, R. M. 2000. ‘The robustness of critical period effects in second language acquisition,’ Studies in Second Language Acquisition 22/4: 499-533.

Ellis, N. C. 2002. ‘Frequency effects in language acquisition: A review with implications for theories of implicit and explicit language acquisition,’ Studies in Second Language Acquisition 24/1: 143-88.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1-24.

Ellis, N. C. 2009. ‘Optimizing the input: Frequency and sampling in usage-based and form-focused learning’ in M. H. Long and C. J. Doughty (eds): The Handbook of Language Teaching. Blackwell, pp. 139-58.

Ellis, N. C. and S. Wulff. 2015. ‘Usage-based approaches to SLA’ in B. VanPatten and J. Williams (eds): Theories in Second Language Acquisition: An Introduction. 2nd edition. Lawrence Erlbaum, pp. 75-93.

Granena, G. and M. H. Long. 2013. ‘Age of onset, length of residence, language aptitude, and ultimate L2 attainment in three linguistic domains,’ Second Language Research 29/3: 311-43.

Hamrick, P. 2014. ‘A role for chunk formation in statistical learning of second language syntax,’ Language Learning 64/2: 247-78.

Hilles, S. 1986. ‘Interlanguage and the pro-drop parameter,’ Second Language Research 2/1: 33-51.

Janacsek, K., J. Fiser, and D. Nemeth. 2012. ‘The best time to acquire new skills: Age-related differences in implicit sequence learning across the human lifespan,’ Developmental Science 15/4: 496-505.

Munnich, E. and B. Landau. 2010. ‘Developmental decline in the acquisition of spatial language,’ Language Learning and Development 6/1: 32-59.

Nemeth, D., K. Janacsek, and J. Fiser. 2013. ‘Age-dependent and coordinated shift in performance between implicit and explicit skill learning,’ Frontiers in Computational Neuroscience 7/147: 1-13.

Osterhout, L., A. Poliakov, K. Inoue, J. McLaughlin, G. Valentine, L. Pitkanen, C. Frenck-Mestre, and J. Hirschensohn. 2008. ‘Second-language learning and changes in the brain,’ Journal of Neurolinguistics 21: 509-21.

Paradis, M. 2009. Declarative and procedural determinants of second languages. John Benjamins.

Poldrack, R. A. and M. G. Packard. 2003. ‘Competition among multiple memory systems: Converging evidence from animal and human brain studies,’ Neuropsychologia 1497: 1-7.

Rebuschat, P. (ed). 2015. Implicit and Explicit Learning of Languages. John Benjamins.

Rebuschat, P. and J. N. Williams (eds). 2012. Statistical Learning and Language Acquisition. Walter de Gruyter.

Robinson, P. and N. C. Ellis (eds). 2008. Handbook of Cognitive Linguistics and Second Language Acquisition. Routledge.

Saffran, J. R. 2003. ‘Statistical language learning: Mechanisms and constraints,’ Current Directions in Psychological Science 12: 110-14.

Saffran, J. R., E. L. Newport, and R. N. Aslin. 1996. ‘Word segmentation: The role of distributional cues,’ Journal of Memory and Language 35: 606-21.

Spadaro, K. 2013. ‘Maturational constraints on lexical acquisition in a second language’ in G. Granena and M. H. Long (eds): Sensitive Periods, Language Aptitudes, and Ultimate L2 Attainment. John Benjamins, pp. 43-68.

Tanner, D., K. Inoue, and L. Osterhout. 2014. ‘Brain-based individual differences in on-line L2 grammatical comprehension,’ Bilingualism: Language and Cognition 17: 277-93.

Williams, J. N. 2009. ‘Implicit learning’ in W. C. Ritchie and T. K. Bhatia (eds): The New Handbook of Second Language Acquisition. Emerald Group Publishing, pp. 319-53.



6 thoughts on “Rastelli’s Discontinuity Hypothesis: a new challenge for SLA researchers”

  1. Hi Patrick,

    Yes, I think you’re right to worry. It needs careful handling and is likely to get less than that from the usual suspects 🙂 Rastelli’s book deals with the matter very carefully, and an important part of his case rests on work in neurolinguistics, covered in Long’s review but largely omitted from my summary.

    Rastelli says that the move beyond statistically based patterns to the induction of productive combinatorial rules is reflected in the much-discussed electrophysiological shift from N400 to P600 ERP components (see, e.g., Osterhout et al. 2008). This shift does not indicate a transition from a lower developmental stage to a higher one, but rather a change from one kind of processing, indexed by the N400, to another, indexed by the P600. As Long comments, the revision of the biphasic N400-P600 model proposed by Tanner (e.g., Tanner, Inoue and Osterhout 2014) could be in line with the idea of discontinuity that Rastelli proposes: L2 learners, like native speakers, become able to shift in real time between a lexical and a grammatical strategy with the same L2 structure.




    • Prompted by an email from Kevin Gregg, I should say that I wrongly suggested that Processability Theory is the regnant trend in SLA research. It isn’t. I think I must have been distracted by Long’s comment in his review. Anyway, what I meant to say is that Rastelli’s work challenges the view held by many SLA researchers that SLA is a process involving the continuous development of interlanguages.


    • Thanks Geoff. This is fascinating. I know I’m being lazy and I should just read the book, but does this mean that when you say that, according to Rastelli, SLA is quantized ‘in just the same way’ as electrons “‘jump’ from one energy level to another at the precise moment that the energy supplied is sufficient to reach the threshold required to trigger the change”, you really do mean that this happens in ‘just the same way’ and that Rastelli is not being metaphorical? If so, that is genuinely stunning. I really must read it.

      The interlanguage construct still remains useful for practical purposes, though, doesn’t it? Just as classical physics still suffices for most practical purposes.

      Thanks again for being so consistently thought-provoking.


      • Hi Patrick,

        Rastelli’s hypothesis claims what you say, so no, it’s not a metaphor at all. But it is a hypothesis, it needs lots of testing, and it needs development. Another bit I left out of Long’s review is where he explains his quite serious reservations about Rastelli’s treatment of the age factor. I urge you to read Long’s review if you have access to it, and also to read Rastelli’s book, available here: https://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=Discontinuity+in+Second+Language+Acquisition&rh=i%3Aaps%2Ck%3ADiscontinuity+in+Second+Language+Acquisition

        And of course the interlanguage construct is quite capable of surviving Rastelli’s challenge; what’s interesting is the challenge to UG, the modular mind, procedural and declarative knowledge, and our understanding of the relationship between implicit and explicit knowledge. As I indicated, it’s also an interesting contribution to the increasingly voiced “emergentist” views, which until now have either been very badly argued (e.g., Larsen-Freeman; Thornbury) or seriously challenged (e.g., Gregg’s review of Nick Ellis).


