[Journal of the Simplified Spelling Society, 1994-1 pp29-33 later designated J16]
Err Analysis: som reflections on aims, methods, limitations and importnce, with a furthr demnstration. Part 1.

Christopher Upward.

This articl is ritn in Cut Spelng (CS).

0. Introduction.

A numbr of factrs hav motivated this furthr exrcise in anlyzng english spelng errs. One factr is a degree of disatisfaction with som previus atemts. Anothr is th desire to set out som of th limitations and complications that such analysis entails. A third is th hope that fresh evidnce may emerj on th dificltis of english spelng, wich may be useful to spelng reformrs. And a fourth is th wish to explor som of th implications of Valerie Yules ke precept that spelng desyn needs to reflect human needs and abilitis, rathr than necesrly som a priori linguistic principl like one-to-one sound-symbl corespondnce.

1. Som previus analyses.

1.1. Wing & Baddeley (1980). Altho varius exrcises in mispelng analysis had been publishd in previus decades, Wing & Baddeley [1] had th distinction of probbly being th first to include a substantial err-corpus as an appendix to ther analysis (tho they aknolej Bawden, 1900, as a minor precursr). This means that othr reserchrs can reanlyz and use ther orijnl data wich, it wil be sujestd in this articl, represents th lastng valu of ther work. Certnly they presentd ther findngs in th expectation that subsequent analyses wud improve on ther methodolojy. If it seems worth hylytng ther main shortcomngs now, it is partly because ther work is so ofn cited uncriticly in th litratur, and partly in ordr to demnstrate th need for a clear defnition of aims and a clear vew of th overal context of such analyses.

Wing & Baddeley aproach ther task as experimentl sycolojists, but, like al too many reserchrs from that bakground, they pay litl atention to th linguistic dimension of ther material (ther sole refrnce to it, on pp261-62, is th pasng remark that som errs may arise from "difficulties associated with rules for adding suffixes"). Ther study, like many othrs, seems to be based on th asumtion that by anlyzng mispelngs in english it is posbl to arive at conclusions about th sycolojy of litracy in jenrl. It is importnt to emfasize th falacy of this asumtion: mispelngs in english chiefly sho th human mind struglng with a uniqely eratic riting systm, and for that very reasn they canot be used as evidnce for th sycolojy of norml alfabetic litracy. To do so is rathr like basing a sycolojy of mathmatics on how peple wud do arithmeticl calculations using roman numerals.

It is a symtm of ther neglect of th linguistic dimension that Wing & Baddeley divide ther corpus of errs into two categris, wich they cal Typ 1 (consistng of 847 'slips') and Typ 2 (consistng of 229 'convention errs'), and concentrate ther analysis entirely on Typ 1. But if th study has lastng valu, it is surely to be found in th listng of Typ 2 errs, wich ar a classic compendium of th dificltis with wich th english riting systm confronts even hyly educated users. Useful tho it wud be, we canot here anlyz them in ful; but we may at least note that 17 of th first 20 Typ 2 errs listd relate directly or indirectly to th CS redundncy categris (irelevnt letrs, shwa, dubld consnnts), and that two othrs involv confusion over th letr C.

It is th Typ 1 errs that intrest Wing & Baddeley, and they categrize them in terms of 4 mecnistic, non-explanatry criteria, acordng to wethr they involv omission, adition, substitution or inversion of letrs. They confess that it is not always esy to distinguish Typ 1 from Typ 2 errs. Wen one examns th Typ 1 errs, it is imediatly aparent that ther is a lot mor to many of them than can be explaind away as mere 'slips'. Th very first err ilustrates th problm: th riter wantd to spel intellect, but began with th letrs intele...; howevr, because th word was then respelt corectly, th err was classifyd as a Typ 1 'slip' and not as a Typ 2 'convention err'. Yet it is clear that th err arose over precisely that featur of th form intellect wich is hardst to spel from nolej of th pronunciation. In othr words, th err was not, as Wing & Baddeleys discussion implys, a randm slip in th cognitiv procesng of a particulr string of letrs that myt equaly hav ocurd elsewher in th same word, or in a difrnt word, or in a difrnt languaj. On th contry, th riter stumbld (tho without finaly falng, in this instnce) over that classic dificlty of english spelng: th unpredictbility of consnnt dublng.

On chekng th ful Typ 1 list, we find that of th 847 so-cald 'slips', as many as 341 (40%) ar atributebl to that same cause, ie linguistic dificltis. If we ad this figur to th 229 Typ 2 errs, we get 570 'convention errs'; and if we deduct it from th Typ 1 total of 847, we end up with only 506 'slips'. Thus 53%, rathr than th orijnl 21%, of th total corpus cud mor apropriatly be clasd as 'convention errs'. This finding itself implys that th chief purpos of mispelng analysis in english shud be to identify th dificltis of th systm, rathr than particulr patrns of cognitiv procesng.
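Th arithmetic in this paragraf can be re-chekd with a short Python exampl. It is only an ilustration: th names used ar asumtions of th exampl, and only th thre counts (847, 229, 341) ar taken from th text.

```python
# Counts givn in th text for th Wing & Baddeley corpus.
type1_slips = 847        # Typ 1 'slips'
type2_convention = 229   # Typ 2 'convention errs'
reclassified = 341       # Typ 1 errs atributed here to linguistic dificltis

total = type1_slips + type2_convention
convention_revised = type2_convention + reclassified
slips_revised = type1_slips - reclassified


def percent(part, whole):
    """Return part/whole as a whole-number percentage."""
    return round(100 * part / whole)


print(total, convention_revised, slips_revised)
print(percent(reclassified, type1_slips))      # share of Typ 1 moved
print(percent(type2_convention, total))        # orijnl 'convention' share
print(percent(convention_revised, total))      # revised 'convention' share
```

Th rounded figurs agree with th 40%, 21% and 53% givn abov.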

Of th 506 remainng 'slips', it was noticebl that many wer of th typ an for and or the for they; and that many mor cud be atributed to a hypercriticl interpretation by th scrutineer of th riters handriting (to list reccgnise for exampl as a mispelng of recognise seems absurdly harsh, wen th falt cud lie with an intruption in th flo of ink to th riters pen!). Th presnt authr has over th years incresingly inclined to th vew that 'ther is no such thing as a spelng slip' (ie al mispelngs ar somhow or othr linguisticly motivated), and, watevr exeptions may be found, he feels this vew is to a significnt extent confirmd by th Typ 1 listng. Indeed th question inevitbly arises wethr th Typ 1 corpus is substantial enuf to sustain th kind of analysis Wing & Baddeley subject it to at al.

It furthr emerjs that th authrs cognitiv findngs ar exeedngly tentativ and tenuus anyway, and ar partly undrmined by ther own methodolojy. Ther initial hypothesis is that errs ocur mor toward th end of words than erlir on, because riting involvs transferng th imaj of th letr sequence of each word into a memry 'bufr', but as th letrs ar successivly ritn down, th imaj decays rapidly. Thus th recal of letrs that ocur late in th spelng of a word is weaknd, leving them especialy prone to err. Howevr, wen countng errs, th authrs only include th first err in any word, wich has th autmatic consequence of elimnating som errs found toward th end of words. Th authrs wer not surprisingly disapointd that th tendncy to late errs was not very markd, and they respondd by preferng an alternativ hypothesis: that th midl of words is mor prone to err because of 'intrference' between ajacent letrs. A linguistic aproach by contrast wud point out that th ends of english words ar ofn caractrized by certn kinds of fonografic ambiguity, and that errs in that position ar th natrl consequence.

That linguistic factrs, to do with th unpredictbility of sound-symbl corespondnces in english, myt be overwelmngly mor powrful than any such cognitiv processes in determnng err-ocurences, was not considrd. This oversyt is al th mor stranje because th authrs seem to accept in ther introduction (p252) that "writing depends heavily on the word-to-phoneme conversion process"; but ther primary concern, as they then state, was "the involvement of short-term memory in handwriting".

Anothr limitation on th validity of ther findngs, wich they do not aknolej as such, is th fact that al ther 40 riters wer aplyng for places to study sience at Cambrij University; in othr words, they constituted a hyly selectd educationl élite of yung, predomnntly male adults. Elsewher th authrs remark that "error rates in normal people are very low", but they leve unclear wethr they regard ther riters as 'norml peple'.

In short, not merely did th Wing & Baddeley analysis entail inherent methodolojicl defects, but they took no acount of linguistic and socio-educationl factrs wich necesrly hav a fundmentl impact on th significnce of ther data.

It shud be add that th book in wich th Wing & Baddeley study apeard also contains th foloing chaptrs wich impinj on th area of mispelng analysis: Gillian Cohen 'Reading and Searching for Spelling Errors' (pp135-157); Norman Hotopf 'Slips of the Pen' (pp287-307); Hazel E Nelson 'Analysis of Spelling Errors in Normal and Dyslexic Children' (pp475-493). Because these hav not acheved th same reputation in th litratur, they ar not considrd in detail here. Sufice it to say that Cohen is concernd with spotng errs, not with ther causes; Hotopf says his purpos "is to compare slips of the pen with those of the tongue"; and Nelson is intrestd in th diagnostic aplications of mispelng analysis for dyslexics. In this paper, by contrast, we ar primarily intrestd in wat mispelngs tel us about th riting systm rathr than about th riter.

1.2. Th Journal of the Simplified Spelling Society (1987/3) publishd a thre-part analysis entitled 'Can Cut Spelng Cut Mispelng?' [2]

Th thre parts related to
1) a smal corpus of som 50 undrgraduat mispelngs,
2) 444 mispelngs found in 9-year-old Daisy Ashfords late 19th century story The Young Visiters, and
3) 1,377 errs found in riting by 163 15-year-olds.

Th corpus for th presnt study (se belo) paralels that used for that third part. Th purpos of th 1987 study was specificly to establish how far th errs found myt hav been preventd if th riters had used CS. Th report did not atemt to adress wider issus, and neglectd to colect data that myt hav been of wider intrest. It did howevr refer to som othr studis, such as mispelngs made in ritn english by non-nativ speakrs in Uganda [3] and Singapor, [4] and to Roger Mitton's corpra lojd with th Oxford Text Archive [5].

1.3. Th National Foundation for Educational Research (NFER, 1993) [6] analysis was recently revewd in th Journal of the Simplified Spelling Society. [7] Th revew pointd out that altho th data and ther close analysis wer sound and valubl, som importnt overal statisticl conclusions regardng jenrl standrds of spelng acuracy wer less soundly based. In particulr, th text sampls used for th corpus wer standrdized by th numbr of handritn lines - 10 - and not by th numbr of words ritn. This not merely ment that a riter with smal handriting wud be rated as less acurat than an equaly good riter with larj handriting, but it ment that no abslute mesur of acuracy was posbl in terms of th proportion of words corectly and incorectly spelt. Th presnt study, tho its corpus is only about one tenth of th size of th NFER corpus, is desynd to avoid those falts.
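Th weakness of standrdizng by lines can be ilustrated with a short Python exampl. Al th figurs in it ar inventd for ilustration (they ar not NFER data), and th function name is an asumtion of th exampl. It shos two riters ho mispel th same proportion of words, but ho fit difrnt numbrs of words on a line.

```python
def errors_in_sample(words_per_line, errors_per_100_words, lines=10):
    """Errs expectd in a sampl of a fixd numbr of handritn lines."""
    words = words_per_line * lines
    return words * errors_per_100_words / 100


# Inventd figurs: both riters make 4 errs per 100 words.
small_hand = errors_in_sample(words_per_line=12, errors_per_100_words=4)
large_hand = errors_in_sample(words_per_line=6, errors_per_100_words=4)

print(small_hand, large_hand)
```

On these inventd figurs th riter with smal handriting shos twice as many errs per 10 lines, tho both riters ar equaly acurat per word.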

Altho th NFER employd 4 non-explanatry categris of mispelng like Wing & Baddeley (calng them insertion, omission, substitution, transposition, insted of adition, omission, substitution, inversion), it also used wat it cald 'minor error categories' (homofones, real words, effects of pronunciation, doubled letters, silent letters, 'magic' e, schwa vowels, transposition of i and e). These hav th importnt potential to explain wy errs ocurd, tho th NFER did not exploit them for that purpos.

2. Th Presnt Study: jenrl findngs.

Th presnt study represents a smalr-scale but methodolojicly mor rigrus replication of th third analysis in th abov-mentiond 1987 report. Th corpus in both cases was derived from ansrs to questionairs containng 10 unfinishd sentnces wich wer completed by th respondnts. Th material was kindly made availbl for mispelng analysis by Cyril Simmons of Loughborough University, ho desynd th questionair and subsequently aplyd it (variusly translated into french, jermn, arabic, japnese) in a comparativ intrnationl study of yung peples atitudes. [8] Th 10 unfinishd sentnces wer as folos:

1) The sort of person I would most like to be like...,
2) The sort of person I would least like to be like...,
3) The people I am happiest with are...,
4) The people I am unhappiest with are...,
5) When I am by myself I...,
6) What matters to me more than anything else...,
7) The best thing that could happen to me...,
8) The worst thing that could happen to me...,
9) The best thing about life is...,
10) The worst thing about life is....

Th questionairs wer completed anonmusly and th respondnts wer asured that ther replys wud remain confidential and constituted no kind of test. Th respondnts thus did not no that th quality of ther riting was to be examnd in any way, and wer therfor undr no pressur to rite lejbly, gramaticly, or coherently. Th subject matr concernd th students persnl feelngs, ther relations with famly, frends and othrs, ther intrests, and ther hopes and fears. Th vocablry they used therfor typicly covrd a very limitd ranje of discorse, was spontaneusly chosen and ofn coloquial, and hevily repetitius. These conditions may seem ideal for elicitng th students most 'natrl' spelng; but th results may also sho a loer levl of acuracy than th students cud hav produced in mor forml conditions. Furthrmor, if mispelng analysis is to serv as a jenrl tool for th desyn of spelng reform, it wud need to covr much wider areas of discorse, including th languaj of al th main scool subjects, and thus also covr th spelng of sientific and tecnolojicl termnolojy.

Th 1987 analysis drew on 163 questionairs, completed in 1981, by mainly 15-year-olds at a larj-city comprehensiv scool in th english East Midlands rejon. Th presnt analysis, carrid out in 1994, used identicl questionairs completed 10 years later, in 1991, by 73 mainly 15-year-old students (6 had not quite turnd 15, and 1 was 16) at a smal-town comprehensiv scool in th same rejon. In both cases, th questionairs wer completed by a ful year-group, covrng th hole ability ranje representd at th scools in question.

A total of 1,377 errs wer classifyd in th 1987 study, but in th presnt study only 357 wer identifyd. Thus each respondnt in th erlir study avrajd over 8 riting errs, wile in th presnt study th mean was just undr 5. No reasns for th incresed acuracy wer aparent, but factrs may include any or al of th foloing: educationly mor advantajd home bakgrounds; a mor favorabl scool environmnt; superir jenrl educationl experience from improved curicula or betr teachng; gretr emfasis givn to acurat spelng during scoolng; betr visul memry for spelngs; fewr words ritn in th respondnts ansrs. Th betr 1991 scors canot of corse be taken to imply that standrds of teenajers spelng rose jenrly during th previus decade. Th relativ scors of th 1981 and 1991 riters ar howevr only incidentl to this study: it is th natur of th errs, rathr than ther total numbr, that is of prime concern.
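Th two means can be re-chekd as folos; th Python names ar asumtions of this exampl, and only th counts ar taken from th text.

```python
# Errs and respondnts in th two studis, as givn in th text.
errors_1987, respondents_1987 = 1377, 163
errors_present, respondents_present = 357, 73

mean_1987 = errors_1987 / respondents_1987
mean_present = errors_present / respondents_present

print(f"{mean_1987:.2f} {mean_present:.2f}")
```

Th results, about 8.4 and 4.9 errs per respondnt, agree with 'over 8' and 'just undr 5'.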

Nevrthless certn statistics concernng overal acuracy ar worth noting. Male respondnts in th secnd study outnumbrd females by 43 to 30, but since th female respondnts rote a mean of 157 words compared with only 89 ritn by th males, significntly mor words ritn by females wer scand for errs than by males (4700 compared with 3828). Th male respondnts made 169 errs altogethr, and th females 188; but wen related to th numbr of words ritn, this shos a rathr hyr levl of acuracy in females: th males made one err per 23 words ritn, wheras th females made only one per 25 words ritn. (Th NFER study found a much mor markd superiority of female riters per 10 lines of riting, but overlookd th posbility of larjr female handriting afectng th result.) In th presnt study, repeatd errs wer countd each time, and words mispelt in mor than one respect likewise countd for mor than one err (eg sosity for society countd as 2 errs).
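Th err rates per word can be re-chekd in th same way. Th names ar asumtions of this exampl. Note that th mean word counts ar rounded in th text, so that multiplyng them by th numbr of respondnts givs only rufly th totals stated (3827 and 4710, against 3828 and 4700).

```python
# Figurs givn in th text for th presnt study.
groups = {
    "male": {"respondents": 43, "mean_words": 89, "words": 3828, "errors": 169},
    "female": {"respondents": 30, "mean_words": 157, "words": 4700, "errors": 188},
}


def words_per_error(words, errors):
    """Mean numbr of words ritn for each err made."""
    return words / errors


for name, g in groups.items():
    approx_words = g["respondents"] * g["mean_words"]  # from rounded means
    rate = words_per_error(g["words"], g["errors"])
    print(name, approx_words, round(rate))
```

Th rounded rates ar one err per 23 words for th males and one per 25 for th females, as stated abov.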

Not evry err in th presnt study representd a 'mispelng' in th strict sense. Th total included a handful involvng othr riting errs, such as rong word choice, and ther wer 140 orthografic errs wich did not involv th rong aplication of letrs as such (these may be cald 'metaorthografic' errs). Mispelngs in th sense of misused (substituted, insertd, misplaced, or omitd) letrs totald 208. Thre categris of metaorthografic err wer noted. Th larjst numbr (53) concernd capitlization, most using capitl letrs inapropriatly, but a few failng to use them wen required (eg european). Anothr categry of metaorthografic errs involvd unconventionl word divisions (44 errs); many of these wer singl words of th typ someone, everywhere ritn divided as some one, every where, but th noun frase a lot was ritn 21 times as a singl word (alot). Almost as many errs (43) involvd use of th apostrofe, with thre rufly equal categris:

1) omission (peoples, its for people's, it's),
2) with non-posessiv inflections (happen's, injustis's),
3) in -n't contractions, with th apostrofe eithr omitd (arent, wouldnt), or placed befor th N (are'nt, would'nt).

Mispelngs involvng letrs also fel into thre categris. Most numerus wer mispelngs of vowl sounds (110), folod by mispelngs of consnnt sounds (88); mispelngs of silent letrs wer less comn (15).
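Th totals givn in this section can be sumd in a short Python exampl. Th names ar asumtions of th exampl; th counts ar those givn in th text. Th thre letr categris ad up to 213, not to th 208 stated erlir; th text givs no reasn for th difrnce, wich myt arise if som errs wer countd undr mor than one categry, tho that is only an asumtion.

```python
# Counts givn in th text for th presnt study.
total_errors = 357
letter_misspellings = 208
metaorthographic = {"capitalization": 53, "word_division": 44, "apostrophe": 43}
letter_categories = {"vowels": 110, "consonants": 88, "silent_letters": 15}

meta_total = sum(metaorthographic.values())
other_writing_errors = total_errors - meta_total - letter_misspellings
letter_category_total = sum(letter_categories.values())

print(meta_total)              # th 140 metaorthografic errs
print(other_writing_errors)    # th 'handful' of othr riting errs
print(letter_category_total)   # sum of th thre letr categris
```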

3. Mispelt vowls.

About 60 of th 110 mispelt vowl sounds involvd long vowls and/or two vowl letrs, with mostly th rong pair of letrs chosen, or th pair ritn in th rong ordr, but somtimes with one letr ritn for two or vice versa. Mispelngs of th unstresd 'obscure' vowl shwa acountd for over 30 vowl errs.

3.1. Long vowl and two-letr mispelngs can be categrized by sound and spelng patrn as folos:

/ei/ in raisist (=racist), waist (=waste), the (=they), and simlrly with foloing R in billionare, there (4 =their, 1 =they're), unfare.

/i:/ in acheive (2), corea (=career), fellings (=feelings), meat (=meet), peice, resonable, wierd; simlrly /i/ in unstresd, mostly final sylabls, as in babys, bitchey, constantley, enemys, happyness, humanites, marride, showey, stupied, worring (=worrying).

/ai/ in Brain (=Brian), buy (2, =by), deiying/dieying/dieing (2, =dying), kaliedoscope, liabary (=library), me (2, =my), paralized, sosity (=society).

Othr notebl confusions ocurd as in addition (=audition), afull (=awful), aloud (=allowed), babon (=baboon), crewl (=cruel), dosen't (=doesn't), inturperet (=interpret), lonley (=lonely), meny, thoght, wepans, and repeatdly in freind (15) and frend (3), compared with 83 ocurences of th corect form friend. Al these vowl errs wer in varying degrees atributebl to th lak of straitforwrd sound-symbl corespondnces in english. Very few vowl errs apeard unmotivated; but such wer en (=in) and personlity, wile luv myt be explaind or excused as a wilful coloquialism.

3.2. Mispelt or omitd shwa ocurd most ofn in post-accentul position, thus in final unstresd sylabls in acter, closists (=closest), consios (=conscious), favourate, independant, intelligant, listern (2, =listen), politicion, Sharan, sponcerd (=sponsored), wepans (=weapons), and in medial unstresd sylabls in alcaholic/alcholic, crimenals, diffrent, famly, intelegent/intellegent, intrested/intrests, jewellry (contrast americn jewelry, british jewellery), knowladgeable, misrable, orphaniges, prejidice, proberly (=probably). Th mispelng catorgery for category shos this err twice. Shwa cud ocasionly also be mispelt in initial sylabls, as in corea (=career), Farari (=Ferrari), sucure (=secure). Th virtul silence of a vestijl shwa in TO (traditionl orthografy) forms like different, interest may then also sujest to riters that simlr vestijl shwas lurk unsuspectd in othr comprbl environmnts, as between a consnnt and R; this wud explain th intrusiv e in th form inturperet.

4. Mispelt consnnts.

By far th gretst dificlty experienced by riters with consnnts concerns wethr they shud be ritn dubl or singl. Th presnt corpus containd th foloing instnces of faild dublng: F in of (=off), G in drugie (=druggie), L in academicaly, aloud (=allowed), intelegent, equaly, polution, realy, tele (=telly), M in imature, N in anoy, billionare, questionaire, P in apreciate, droping, R in aray, embarassing, Farari, tommrow (=tomorrow), S in posible, proffesional. Conversly, false dublng was seen with: D in saddness, F in off (=of), proffession/proffessional/proffesional/proffetional, L in allone, allready, allways, helpfull, M in tommorrow, tommrow, P in appart.
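Th distinction between faild and false dublng can be made autmaticly, as in th foloing Python exampl. It is only an ilustration and not th method used for th presnt study: th function names and th labls returnd ar asumtions of th exampl. A mispelng is compared with th corect form aftr runs of repeatd letrs hav been countd; if th two forms difr only in th length of consnnt runs, th err is one of dublng.

```python
from itertools import groupby

VOWELS = set("aeiou")


def runs(word):
    """Split a word into (letr, run length) pairs: 'off' -> [('o', 1), ('f', 2)]."""
    return [(ch, len(list(group))) for ch, group in groupby(word.lower())]


def classify_doubling(misspelling, correct):
    """Return 'failed', 'false', 'mixed', 'none' or 'other'."""
    m_runs, c_runs = runs(misspelling), runs(correct)
    # If th letr sequences difr, th err is not purely one of dublng.
    if [ch for ch, _ in m_runs] != [ch for ch, _ in c_runs]:
        return "other"
    failed = false = False
    for (ch, m_len), (_, c_len) in zip(m_runs, c_runs):
        if m_len == c_len:
            continue
        if ch in VOWELS:
            return "other"      # a vowl err, not consnnt dublng
        if m_len < c_len:
            failed = True       # singl ritn for dubl
        else:
            false = True        # dubl ritn for singl
    if failed and false:
        return "mixed"
    if failed:
        return "failed"
    if false:
        return "false"
    return "none"


for wrong, right in [("realy", "really"), ("allways", "always"),
                     ("proffesional", "professional"), ("thik", "thick")]:
    print(wrong, right, classify_doubling(wrong, right))
```

Forms such as intelegent or tommrow, wich also contain a vowl err, ar returnd as 'other' by this exampl, tho they ar listd abov undr dublng; a fulr analysis wud hav to count each err in a word separatly.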

Also quite widespred wer errs asociated with th overlapng uses of th letrs Q, K, C, S, T, X, Z. Th foloing instnces wer found: check (norml americn for british 'cheque'), thik, sponcerd (=sponsored), consios (=conscious), critisise (2), injustis's, sosity (=society), practice (acceptbl americn for british 'practise'), sucess/sucessful/succsessful, raisist/rasiste (=racist), proffetional, sexsist. Probbly asociated with this jenrl area of confusion, tho strictly speakng unmotivated vowl errs, wer th forms Leicster (=Leicester), muscian (=musician). Th repeatd ocurence of th abreviation ect (=etc) may also be seen in th same context, riters being unclear wethr th abreviation shud retain th capitlized letrs in EtCeTera or in ETCetera.

Most othr consnnt errs apeard th result of poor articulation or inadequat fonemic/gramaticl analysis, as in th forms ashma (=asthma), brillant, decen (=+t), involve (=+d), tamp (=tramp), understand (=+s), vanblue (=vandal). Simplification of consnnt strings, as in decen for decent, is a comn featur of non-nativ riting (and speakng) wen th riters mothr tong dos not use such strings, and its ocurence in th presnt corpus may be a syn of non-nativ-speakng bakground. Alternativly, it may reflect orthografic imaturity, as such simplifications also caractrize th spelng of th yungst riters. In a few instnces, th orijn of this group of errs lay clearly or probbly with th vagaris of english spelng, as in coulndn't, talbe (=table), were (=where).

5. Silent letrs.

Silent letrs enjoy particulr notoriety in ritn english, and sure enuf they produced a modest crop of errs in this corpus. Predomnnt among them was silent E, wich was somtimes omitd, as in aloud (=allowed), sponcerd (=sponsored), unfortunatly, your (=you're), and somtimes insertd, as in behinde, moveing, pouche, rasiste (=racist), whose (=who's). Th word else was twice mispelt with a medial E (eles, elese), in a manr stranjely remnisnt of its Midl English form elles. An isolated case, but striking in its own ryt, was th form nowing (=knowing). For non-rotic speakrs (ie english speakrs ho only pronounce R befor a vowl), as our questionair respondnts, like th majority of th english, wil mostly hav been, th letr R is a constnt sorce of uncertnty; thus we se it omitd in corea (=career), but insertd in proberly (=probably).


