[Journal of the Simplified Spelling Society, 1988/1 p25-27 later designated J7]
[See Journal and Newsletter articles, Pamflet 15 and Cut Spelling by Chris Upward.]

Can Cut Speling Cut Misprints?

Christopher Upward

Th Cut Speling used in this articl removes redundnt letrs from t.o. as folos: 1. It cuts letrs irelevnt to pronunciation (det), 2. It uses sylabic <l, m, n, r> in most post-accentul short sylabls (metl, atm, prisn, detr), and regulrizes inflections as <d, s> (hintd, bushs), 3. It simplifys most doubld consonnts (e.g. bigr, agravate). Wher apropriat it also replaces <gh, ph> by <f> (tuf, filosofy), <g> by <j> (e.g. jinjr, juj), and <ig> by <y> (e.g. syn, hy). A refinement used here for th first time is that <u> is only dropd aftr <q> wen silent: e.g. mosqito, tecniqe but question, quite. Readrs ar invited to coment on th spelings used.

The advice of Wendy Berliner, editor of the AMMA jurnl report, and Anne McHardy, news editor with The Guardian, is gratefuly acknolejd in the composition of this articl, but they ar of cors not responsibl for any statemnts here made.


A study of patrns of misprint in th press sujests that Cut Speling cud reduce ther ocurence. Wile a reduction in mispelings wud require th orthografy to be regulrised, a reduction in misprints wud be th product of statisticl and sycolojicl factrs. In th corpus examnd for this study, most misprints involv singl letrs, and ar most likely to ocur in th least obtrusiv position, i.e. in th secnd haf of longr words. A reform like Cut Speling, wich shortns words, therby makes misprints mor obtrusiv and, it is argud, mor likely to be noticed and corectd. Th study is limitd in scope, but if its findings ar valid, th implications need to be taken into acount wen considring reform-stratejy.


1.1 Difrnt causes of mispelings/misprints.
Th Journal of the Simplified Spelling Society 1987/3 (pp. 21-24) containd an analysis of mispelings, and demonstrated that many of them ar in som way conectd with th redundnt letrs found in t.o., a fact that is of considrbl significnce in determning prioritis for a first staje speling reform. Misprints on th othr hand do not imediatly sujest any such obvius lesns for speling reform stratejy, since ther causes ar usuly quite difrnt. Thus handritn mispelings ar ofn made by less skild riters, they typicly arise from failur to mastr an iregulr speling systm, and they can be reduced if th systm is made esir to mastr; but printd and typd text is normly produced by skild riters, most of hose mistakes wil probbly result from lak of care rathr than ignorance. Furthrmor such text is mor likely to be proof-red befor being relesed. Howevr newspapers in particulr ar publishd undr pressur and ar wel-nown for ther frequent misprints, and th question is at least worth asking wethr these exibit specific patrns, wich myt lend themselvs to improvemnt by a speling reform of th Cut Speling typ.

1.2 Text selection.
In compiling a corpus of misprints for this analysis, th press was used as th sole sorce, on th asumtion that it wud provide th gretst density of exampls. Down-market papers wer not selectd because they contain less text (tabloid format) and a mor limitd lexicl ranje. Among 'quality' britsh newspapers The Guardian has a reputation (from wich it derives a certn wry satisfaction rathr than a sense of shame) for its many misprints - hence its nikname, The Grauniad, wich, so an apocryfl story has it, actuly apeard once on th masthed; and it was chosen as th main sorce. As a control som comprbl texts wer scand from The Independent, The Daily Telegraph and The Times. As it hapnd, th period from wich th copis of The Guardian wer chosen for investigation was unusuly fruitful for misprints, not because of that reputation for typograficl inacuracy, but because by chance The Guardian was then in th thros of transition from th traditionl linotyp tecnolojy to th new tecniqes of direct inputing: text was as a result apearing in print with far less careful cheking than norml, or ocasionly even completely unchekd. An aditionl fortuitus factr that probbly contributed to th hy density of misprints found was that th copis used wer first editions, wheras many misprints ar weedd out in later editions. By 1988 howevr The Guardians new tecnolojy is instald, new working practices hav evolvd, repeatd cheking (normly by at least 6 pairs of ys - th inputr, newseditr, subeditrs, proofreadrs, etc) is again th ordr of th day, and far fewr misprints ar expectd to ocur than during th dificlt period of transition. 
In fact fewr misprints ar now likely even than befor th introduction of new tecnolojy, because now that text is corectbl imediatly onscreen, th confusing and messy busness of pencild overiting is a thing of th past: th copy is now clean and fuly lejbl; furthrmor, altho in th old days th linotyp oprator cud remove furthr errs that had not previusly been spotd, new errs myt also creep in at that staje. Th data used in this analysis shud therfor emfaticly not be quoted as evidnce for th typograficl incompetnce of The Guardian; but th larj numbr of errs found undoutdly made th task of analysis much esir.

1.3 Serch-methods and overal findings.
Al th news-items (i.e. al continuus prose text, but not advertismnts, tables, listings, etc) in a singl issu of The Guardian (tuesday 4 august 1987) wer carefuly scand once only, and som 160 misprints found, scatrd very unevenly over 21 pajes of th 28-paje newspaper. Th front paje, with 19 misprints, and p. 8 (foren news) with 18 misprints containd many mor than any othr paje; th next worst wer 3 pajes with 11, and 3 pajes with 10 misprints each. To obtain som idea of th proportion of misprints discovrd in a singl reading, pajes 1 and 8 wer reread, and a total of 5 furthr misprints found (i.e. about anothr 14%). Th pajes on wich th larjst numbr of misprints ocurd wer for th most part those wich ar set undr th gretst pressur and with least oportunity for cheking. That th front and foren news pajes ar particulrly prone to misprints was confirmd by th finding that th two equivlnt pajes in th issus of thursday 6 and satrday 8 august also containd many misprints, 10 & 24, and 12 & 10 respectivly, and that th weekly Education Guardian suplmnt, for wich over 24 ours cheking time is availbl, containd fewst misprints (3 over 2 pajes). As a control of th frequency of misprints found in The Guardian, th front paje and th foren news paje of The Independent (wensday 5 august), The Daily Telegraph (friday 7 august) and The Times (satrday 8 august) wer also serchd, and wer found to contain 8, 16 and 12 errs respectivly. Thus these thre jenrly comprbl newspapers wer found to contain a total of 36 misprints as against a total of 93 for th equivlnt pajes in thre issus of The Guardian. Howevr, as we hav seen ther wer special reasns for The Guardians bad performnce.
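
Th totals quoted abov can be rechekd with a few lines of Python. Th sketch belo uses only th counts givn in th text; th variabl names ar ilustrativ asumtions and not part of th study.

```python
# Misprints on the front page and the foreign news page, as reported in 1.3.
# Each tuple is (front page, foreign news page).
guardian = {
    "tuesday 4 august": (19, 18),
    "thursday 6 august": (10, 24),
    "satrday 8 august": (12, 10),
}
# Combined count for the same two pages in each control newspaper.
controls = {
    "The Independent": 8,
    "The Daily Telegraph": 16,
    "The Times": 12,
}

guardian_total = sum(front + foreign for front, foreign in guardian.values())
control_total = sum(controls.values())

# Extra misprints found on rereading pages 1 and 8 of the 4 august issue,
# as a share of those found on the first pass.
first_pass = sum(guardian["tuesday 4 august"])
reread_extra = 5
reread_share = reread_extra / first_pass

print(guardian_total, control_total, round(reread_share * 100))  # 93 36 14
```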


2.1 Typs of misprint & ther proportions.
In th erlir analysis of mispelings it was necesry first to define wich non-standrd forms shud actuly be countd as mispelings, and in particulr wethr only mistakes involving letrs of th alfabet constituted mispelings in th strict sense. In th presnt analysis of misprints an even wider ranje of unintendd forms was observd than with mispelings. Hardly any of th misprints cud howevr with certnty be diagnosed as mispelings in th sense that th riter probbly did not no how to spel a word 'corectly'; th two most likely mispelings of this kind found in th Guardian corpus wer both hetrografs that had been confused: flare ritn for flair, and discrete for discreet. Nearly al th othr misprints wer redily atributebl to haste, carelessness and inadequat cheking. They broke down as folos:
i. over two-thirds involvd a singl letr in a word being omitd, insertd, reversed or substituted.
ii. about 10% involvd th presnce, misplacemnt or absnce of a hyfn.
iii. 7% involvd an absnt, superfluus or rong word.
iv. 6% involvd th misuse of upr or loer case letrs.
v. 5% involvd an apostrofe.
vi. 4% involvd successiv words apearing with no intrvening space.

2.2 Linguistic frequency v. obtrusivness.
How is this variation in frequency of th difrnt typs of misprint to be interpretd? We may hypothesise two factrs, wich we wil cal linguistic-typograficl frequency and obtrusivness respectivly. At th most superficial levl, it is evidnt that th frequency of a givn err-typ bears som relation to th frequency of th linguistic-typografic form in wich it ocurs: hole-word errs (7%) for instnce ar far less comn than singl-letr errs (over 2/3), for one thing simply because ther ar many times fewr hole words than singl letrs, and th mere oportunity for hole-word errs to ocur is corespondingly less. We may hypothesise that th obtrusivness factr wud oprate as folos: th frequency of givn err-typs shud corespond inversely to visul obtrusivness. In othr words, th mor imediatly obvius an err is, th less frequently it shud tend to ocur, because th most obvius errs ar most likely to be noticed, and corectd. No dout sycolojicl experimnt can provide data on this point.

2.3 Varying effect of th two factrs.
Th two factrs may eithr reinforce or work against each othr. Thus th overwelming prepondrnce of singl-letr errs wud be th product of both factrs working togethr: letrs of th alfabet ar by far th most comnly ocuring caractrs in english prose, and individuly they ar relativly unobtrusiv. On th othr hand, altho words ar only a few times less comn than letrs, and far mor comn than punctuation marks or othr non-alfabetic symbls, ther relativly hy frequency is larjly outweid by th obtrusivness of errs involving hole words, wich only acount for 7% of al th misprints (but here th obtrusivness factr is working both ways: an extra word is contextuly obtrusiv, but its gestalt dos not in itself jar on th readr). Th least comn categry of err, th joining togethr of seprat words (4%) shos th absolute dominnce of th obtrusivness factr: th typografic form involvd here is th space (wich is omitd wen seprat words ar joind), and it has hy frequency, ocuring aftr evry word exept th last in any text; but if two seprat words ar joind togethr th misprint is very obvius because th gestalt is at first glance not usuly recognisebl as a word at al; so in th foloing exampls from th corpus, ofthe, thisludicrous, washowever, inthe, th joind forms at once jar on th readr with ther stranje apearance. Despite th hy levl of oportunity for them to ocur, such word-joining errs in fact ocur less frequently than errs involving th apostrofe, altho th linguistic-typograficl frequency of th latr is far less; but clearly, in th case of th apostrofe, th obtrusivness factr is lo, since in fluent reading we se them scatrd (at first glance, seemingly almost randmly) around th text without ther gretly afecting th familir gestalt of words; indeed that may be one reasn wy lernrs find it so hard to mastr ther corect use.
Confusion of upr and loer case letrs (6%) is, as one wud expect, infrequent on both counts: initial letrs ar th most obtrusiv in words, and upr case letrs ar used chiefly to start sentnces and propr nouns, and so ar relativly uncomn.


3.1 Relativ obtrusivness of err-typs.

Singl-letr misprints constituted over 2/3 of th total, th 113 cases subdividing as folos:
i. 48 involvd omission of a letr, as notfication.
ii. 31 involvd insertion of a letr, as continiue.
iii. 25 involvd substitution of a letr, as comsumer.
iv. 9 involvd reversl of 2 letrs, as govenrment.
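
Th four categris abov can be told apart mecanicly by comparing a misprint with th word intendd. Th foloing Python sketch is one possbl way of doing so, not th method used in th study; th function name classify and th intendd t.o. forms (notification, continue, consumer, government) ar asumtions suplyd here for ilustration, since th text quotes only th misprintd forms.

```python
def classify(misprint: str, intended: str) -> str:
    """Classify a single-letter misprint against the intended word."""
    if len(misprint) == len(intended) - 1:
        # Omission: deleting one letter from the intended word gives the misprint.
        for i in range(len(intended)):
            if intended[:i] + intended[i + 1:] == misprint:
                return "omission"
    elif len(misprint) == len(intended) + 1:
        # Insertion: deleting one letter from the misprint gives the intended word.
        for i in range(len(misprint)):
            if misprint[:i] + misprint[i + 1:] == intended:
                return "insertion"
    elif len(misprint) == len(intended):
        diffs = [i for i, (a, b) in enumerate(zip(misprint, intended)) if a != b]
        if len(diffs) == 1:
            return "substitution"
        # Reversal: two adjacent letters have changed places.
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and misprint[diffs[0]] == intended[diffs[1]]
                and misprint[diffs[1]] == intended[diffs[0]]):
            return "reversal"
    return "other"


# The four examples quoted in 3.1, paired with their assumed intended forms.
examples = [
    ("notfication", "notification"),
    ("continiue", "continue"),
    ("comsumer", "consumer"),
    ("govenrment", "government"),
]
labels = [classify(misprint, intended) for misprint, intended in examples]
print(labels)  # ['omission', 'insertion', 'substitution', 'reversal']

# Counts reported in 3.1; they add up to the 113 single-letter cases.
counts = {"omission": 48, "insertion": 31, "substitution": 25, "reversal": 9}
total_single_letter = sum(counts.values())
```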

Here again, we need to ask wy som kinds of singl-letr misprint ar so much mor comn than othrs. For our purposes here we shal disregard th possbility that som letrs of th alfabet may be mor conduciv to misprints than othr letrs, in particulr that ajacent kes on th kebord may be confused (e.g. in continiue, <i, u> ar ajacent, and in comsumer, <m, n> ar ajacent); th tendncy for such errs to ocur may hav lesns for kebord desyn, but not obviusly for speling reform. It then apears we can perhaps establish an obtrusivness-gradient within this categry of err, just as was implyd in §2.3 between th difrnt categris. We may surmise that letr-reversl is least comn since by afecting 2 letrs it is most obtrusiv, wheras omission, insertion or substitution only afect 1 letr and ar therfor less obtrusiv; and we may surmise that omission is th least obtrusiv form of misprint, since it introduces no unfamilir letrs.

3.2 Lesns for speling reform, especialy CS.
Now one of th fundamentl tenets of Cut Speling (CS) is that a speling reform wich chiefly only omits redundnt letrs wil be visuly far less disruptiv than a reform wich substitutes letrs. Visuly disruptiv is howevr a synonym for obtrusiv, and it therfor coms as no surprise that misprints involving th omission of a singl letr shud be substantialy mor comn than othr kinds: they ar less esily noticed, and so less likely to be corectd. Th fact that th insertion of a letr was th secnd most comn misprint is of som relevnce to th bakwrds compatbility of CS, since it shos that words with extra letrs in them ar not too disruptiv for th readr; in th same way t.o., with its aditionl redundnt letrs, wud hav to apear not too disruptiv to children taut CS. And th fact that substitution and reversl wer least comn among th singl-letr misprints sujests that these kinds of speling-chanje ar th most disruptiv for th readr, because most obtrusiv. It was also noticebl that altho CS usuly cuts out undr 15% of letrs from t.o., over 28% (14) of th 48 singl-letr omissions wer letrs that ar cut in CS, as shown in brakets here: w(o)uld, register(e)d, Pen(n)ine, manag(e)ment, ac(c)omplishment, w(h)inges, non(e), Americ(a)n, W(h)itehead, bound(a)ries, se(e), damn(e)dest, Fol(l)ey, diagram(m)ing. This again suports th notion that CS is a 'natrl' procedure since it implys that careless riting has som tendncy to omit th same (redundnt) letrs as CS, rathr than simply omiting letrs at randm. This observation givs us a first reasn to think that th introduction of CS myt reduce misprints as wel as mispelings.
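
Th braketd list abov lends itself to a simpl chek of th count and proportion quoted. In th sketch belo th helpr name split_cut is hypotheticl; th data ar th 14 items exactly as listd, and th figr of 48 omissions is taken from §3.1.

```python
import re

# The 14 omissions listed in 3.2, with the dropped letter in brackets.
items = [
    "w(o)uld", "register(e)d", "Pen(n)ine", "manag(e)ment",
    "ac(c)omplishment", "w(h)inges", "non(e)", "Americ(a)n",
    "W(h)itehead", "bound(a)ries", "se(e)", "damn(e)dest",
    "Fol(l)ey", "diagram(m)ing",
]


def split_cut(item: str) -> tuple[str, str]:
    """Return (full form, misprinted form) for an item like 'w(o)uld'."""
    match = re.fullmatch(r"(\w*)\((\w)\)(\w*)", item)
    if match is None:
        raise ValueError(f"no bracketed letter in {item!r}")
    before, dropped, after = match.groups()
    return before + dropped + after, before + after


pairs = [split_cut(item) for item in items]
total_omissions = 48  # single-letter omissions in the corpus (3.1)
share = len(pairs) / total_omissions

print(len(pairs), round(share * 100, 1))  # 14 29.2
```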

3.3 Late position of missing letrs.
Our next observation concerns th position of missing letrs in words. Since letrs ocuring toward th end of words ar less promnnt, hence less obtrusiv, than those ocuring nearr th begining, we myt expect th missing letrs to ocur on avraj nearr th end than th begining. Our 48 words printd with a letr missing contain a total of 394 letrs in t.o. If we then count th positions of th missing letrs in al th 48 words (i.e. th missing <a> in Americn is th 7th letr out of 8, and therfor caris a scor of 7) and ad them up, they total 244. Th mean length in t.o. of th 48 words (394 divided by 48) is just over 8.2 letrs, and hence in ther misprintd form only 7.2, but th mean position of th omitd letrs (244 divided by 48) is 5.08; in othr words, th missing letrs tend to ocur in th midl or secnd haf of words mor ofn than in th first haf. We note that CS Rule 2, wich cuts vowl letrs in many post-accentul sylabls, by definition also afects th ends rathr than th beginings of words. Here again, we hav a hint that CS may help reduce a certn categry of misprint.
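
Th arithmetic in this section can be reproduced as folos. Th sketch uses only th thre totals givn abov (48 words, 394 letrs, position-sum 244); th helpr missing_position is a hypotheticl ilustration of how a singl scor myt be obtaind, and for a doubld letr (as in Pennine) it reports th secnd of th two positions, wich is an asumtion and not necesrly th convention used in th study.

```python
def missing_position(intended: str, misprint: str) -> int:
    """1-based position in the intended word of the letter that was omitted."""
    for index, (a, b) in enumerate(zip(intended, misprint)):
        if a != b:
            return index + 1
    # No mismatch in the overlap: the final letter was dropped.
    return len(intended)


# The example given in the text: the missing <a> in "Americn" scores 7.
example_score = missing_position("American", "Americn")

# Totals reported in 3.3.
words = 48
total_letters = 394
position_sum = 244

mean_length = total_letters / words      # mean t.o. word length
mean_position = position_sum / words     # mean position of the omitted letter
relative_position = mean_position / mean_length

print(example_score, round(mean_length, 1), round(mean_position, 2))  # 7 8.2 5.08
print(round(relative_position, 2))  # 0.62
```

A relativ position of about 0.62 places th avraj omitd letr past th midl of th word, in line with th conclusion drawn abov.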

3.4 Mor letrs omitd from long words.
One wud expect longr words to be mor prone than shortr words to misprints by letr-omission, both because longr words hav mor letrs to lose and because a singl-letr err wud be less obtrusiv within a long word. A very ruf avraj word-length in th Guardian was establishd from a 370-word articl in wich th median word-length was found to be 5 letrs (i.e. rufly as many words had mor than 5 letrs as had fewr). But this figr by itself wud produce a biasd result, because if letrs wer omitd at randm, th larjr numbr of letrs in longr words wud inevitbly mean that most omissions ocurd in longr words anyway. To compensate for this bias, a difrnt median word-length was calculated, based on th total numbr of letrs in th 370-word text, wich proved to be 1945. Wen this was divided by 370, it produced a median word-length of 7 letrs (i.e. rufly as many letrs ocurd in words with over 7 letrs as ocurd in words with undr 7). This means that if letr omission was randm, th median length of word in th list of misprints wud hav been 7. In fact it was 9, with th foloing distribution: in th 48 words a letr was lost in

one 3-letr word
five 4-letr words
five 5-letr words
thre 6-letr words
four 7-letr words
four 8-letr words
nine 9-letr words
eit 10-letr words
four 11-letr words
thre 12-letr words
one 13-letr word
one 14-letr word

With th norml caveats about th size of th sampl, this distribution sujests that, presumably for reasns of unobtrusivness, misprints ar mor likely to arise from letr-omission in longr words than in shortr words. If that is corect, we here hav a third indication of a positiv efect of CS in tending to reduce misprints: in CS th avraj word-length is reduced, ther ar fewr letrs that cud be omitd, and any omission wud be mor obtrusiv in th shortr word, mor likely to be noticed, and mor likely to be corectd.
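
Th distribution givn in §3.4 can be chekd against th figrs quoted elswher in th articl, and th idea of a median weitd by letrs rathr than by words can be made explicit. Th Python sketch belo is a reconstruction undr stated asumtions: th function name letter_weighted_median and th smal test list ar hypotheticl, and th 370-word articl itself is not availbl, so its figr of 7 canot be recalculated here. Note that 1945 divided by 370 is about 5.26, wich is a mean; th figr of 7 presumably coms from th letr-weitd calculation described in th parenthesis of §3.4.

```python
from statistics import median

# Word length -> number of words in which a letter was lost (from 3.4).
lost_by_length = {
    3: 1, 4: 5, 5: 5, 6: 3, 7: 4, 8: 4,
    9: 9, 10: 8, 11: 4, 12: 3, 13: 1, 14: 1,
}

# Expand the table into one entry per word.
lengths = [length for length, n in lost_by_length.items() for _ in range(n)]

word_count = len(lengths)          # should match the 48 omissions
letter_count = sum(lengths)        # should match the 394 letters of 3.3
median_length = median(lengths)    # the article reports 9


def letter_weighted_median(word_lengths):
    """Smallest length L such that words of length <= L hold at least
    half of all the letters (a median taken over letters, not words)."""
    total = sum(word_lengths)
    running = 0
    for length in sorted(word_lengths):
        running += length
        if running * 2 >= total:
            return length
    return None  # empty input


# Hypothetical toy data: the word median is 3, but half the letters
# are only reached at length 4, because long words carry more letters.
toy = [1, 2, 3, 4, 10]
toy_word_median = median(toy)
toy_letter_median = letter_weighted_median(toy)

print(word_count, letter_count, median_length)
print(toy_word_median, toy_letter_median)
```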

3.5 Singl-letr insertions.
Th abov observations concerning th prevlnce of letr-omissions in long words ar furthr suportd by th same tendncy in th case of letr insertions. We wud expect a roge aditionl letr to be less obtrusiv in a long word than in a short word - and so it proved. Th mean length of th t.o. form of words in our 31-word sampl containing extra letrs was 7.2, and th misprintd length therfor 8.2, compared with th median of 5 calculated in 3.4. Again, th conclusion must be, longr words ar mor likely to atract extra letrs in misprints than shortr words ar, because th extra letrs ar then betr hidn. And because CS shortns words, it wud militate against such letr-insertions.


Tho having far less statisticl significnce even than th abov smal sampls of singl-letr misprints, it is worth noting that th 4 hole-word omissions in th corpus wer al very short function words: the, a, of, to. It wud seem th readr can skim text without necesrly noticing th absnce of such words. Perhaps this tendncy wud increse if th avraj length of word decresed, but th smal numbr of cases found in th Guardian corpus (a litl over 2% of th total misprints) sujests it wud be an insignificnt problm.


Wile th benefits of CS in reducing mispelings shud be quite dramatic, and wile at first syt misprints apear to ofr a far less promising area for improvemnt by speling reform, th analysis presentd here seems to sujest that CS myt somwat reduce th frequency of misprints too. Th improvemnt canot howevr be precisely identifyd as was possbl with mispelings: one canot say that CS wud abolish this or that typ of misprint. Rathr, it seems ther is a jenrl tendncy for misprints to hide away in th secnd haf of longr words, and if words ar shortnd as in CS, especialy by removing redundnt letrs in final sylabls, then statisticly this shud result in fewr misprints per 100 words of text. Wethr it wud result in fewr misprints per paje is a rathr difrnt question: since CS enables mor words to be printd on each paje, it is quite concevebl that th misprints-per-paje ratio wud scarcely chanje. This howevr rases a far profoundr question that wil one day need exploring too: wat implications dos mor economicl speling hav for th printing industry?

