[Journal of the Simplified Spelling Society, J19, 1995/2 p3-5.]
Roger Mitton reviews

Edward Carney A Survey of English Spelling.

London: Routledge, 1994, ISBN 0-415-09270-1, 535 pages, hardback, contains References and several indexes

1. Purpose of the book.

Benzene or benzine? Bromene or bromine? Well, if it's a hydrocarbon with a double bond, it's -ENE, and if it's an amine, it's -INE. I expect that's cleared it up for you. You'll find this snippet of information on page 432 of Carney's substantial work, under 'Homophonous affixes', along with a number of words where the ending has nothing to do with this distinction, such as kerosene, margarine, vaseline, codeine and gangrene.

Carney has set out to write a description of the English spelling system (reformers may feel there is a large assumption in that word 'system', but let it pass) as it is today. He has not written a history of English spelling, though he mentions history occasionally. Nor is he promoting a reformed orthography, though again he makes a few small suggestions for reform here and there. He strives to be neutral as between singing the praises of traditional orthography and lamenting its failings.

The book can be taken simply as a work of linguistic scholarship, but, if the author has a further motivation, it seems to be his concern about the low standard of educational debate on this topic. Teachers in England and Wales are required by the National Curriculum to teach their pupils that English spelling obeys rules but, while there is no shortage of opinion in this area, there has been a severe shortage of clear thinking and well-founded research. His book is an attempt to elucidate the rules of the English spelling system, though, apart from the occasional suggestion about how this or that aspect of spelling might be brought to the attention of pupils, he leaves it to others to draw out the educational implications of his work.

2. Analytical methods.

He considers briefly the possibility of analysing English spelling purely as a graphic system, ie, considering the patterns of written symbols in their own right with no reference whatever to the spoken language, but, though he judges this to be entirely possible and, for some purposes, advantageous, he moves quickly on to his main enterprise which is the mapping of the correspondences, in both directions, between spelling and pronunciation.

The bulk of the book presents the results of an analysis of a large word list. He combined two well-known word-frequency lists - the Thorndike-Lorge list and the American Heritage Dictionary list - with word lists drawn from three widely used computerized corpora of English text - the Brown corpus, the LOB corpus and the Louvain corpus - so his word-frequency figures were based on about 25 million words of running text in all. He boiled this down to a single list of about 26,000 separate words by lemmatizing the lists, ie retaining only the base forms of words and adding the frequencies of inflected forms to those of the base forms. For example, the final list would contain carry but not carries, carrying or carried; the frequencies of carries etc. were added to that of carry. He also included the pronunciation of each word by taking it from the Longman Pronouncing Dictionary. In a separate exercise, he analysed all the entries in the English Pronouncing Dictionary, many of which were proper names, though these did not have any frequency information.
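The lemmatizing step described above can be pictured with a minimal sketch. The counts and the lookup table below are invented for illustration; nothing here is taken from Carney's data or software, and a real lemmatizer would need far more than a hand-made dictionary.

```python
from collections import Counter

# Invented counts of surface forms in running text (illustration only).
surface_counts = {
    "carry": 40, "carries": 12, "carrying": 9, "carried": 21,
    "dog": 30, "dogs": 15,
}

# Hypothetical lookup from inflected form to base form.
base_form = {
    "carries": "carry", "carrying": "carry", "carried": "carry",
    "dogs": "dog",
}

def lemmatize_counts(counts, base_form):
    """Keep only base forms, adding each inflected form's count to its base."""
    merged = Counter()
    for word, n in counts.items():
        # Words missing from the lookup are treated as their own base form.
        merged[base_form.get(word, word)] += n
    return dict(merged)

lemma_counts = lemmatize_counts(surface_counts, base_form)
print(lemma_counts)
```

The final list keeps carry but not carries, carrying or carried, and the single surviving entry carries the combined frequency.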

He then developed a computer program which analysed each spelling into orthographic units - for example T, TH, E, EA, EAU might be units - and matched up units with phonemes. Letters did not have to be adjacent to constitute a unit; for example, the A ... E of fate could be counted as a single unit. In many cases, this matching of units with phonemes was a straightforward operation - dog, ship, lunch and so on - but a number of words presented problems. How should you split up sign, build, debt and the like? Reformers might be inclined to say at this point that the spelling of many English words simply doesn't correspond to the pronunciation - you simply cannot do what Carney was trying to do, and that is precisely what is wrong with traditional orthography. But he, obviously, did not take this view.

He expounds a set of seven principles which enable him to split these problem words in a reasonably consistent way. The B of debt, for example, is treated as an empty letter since it corresponds to no phoneme in debt or debtor or indebted or any other word with debt in it. The G of sign is counted as inert, rather than empty, since, though it corresponds to nothing in sign, it does correspond to something in signature. The U of build is regarded as part of the consonant, ie, BU is counted as a single orthographic unit, like the BU of buy and buoy and like the GU of guild. And so on, eventually tackling even the really weird ones like aisle and choir.

These orthographic units are invented, of course, purely to make the spelling match the pronunciation. Change the pronunciation and you change the orthographic units; BU, for example, is not an orthographic unit in an American's buoy - Americans pronounce buoy to rhyme with Hughie. If you have an orthography which does not match the pronunciation and you are determined to pretend that it does - and Carney has to in order to get on with the task he has set himself - then you have to resort to such devices. But at least he goes about it in a workmanlike manner.

3. Sound-symbol & symbol-sound correspondences.

Armed now with a complete set of mappings between phonemes and orthographic units, he can proceed to his main task which is, on the one hand, to take each phoneme and to list the orthographic units to which it corresponds and, on the other, to take each orthographic unit and to list the phonemes to which it corresponds. Since he has word-frequency information, he can say not merely that such-and-such a correspondence occurs, but also how often it occurs, taking account both of the number of dictionary words in which it occurs and also of their frequency in running text. He also presents groups of words showing how a particular correspondence occurs in words that share some common feature, such as one-syllable words with short vowels or three-syllable words with stress on the first syllable. At times he seems to play the role of an apologist for English spelling - "Look, traditional orthography is not all that bad; there's a pretty consistent pattern here" - but he is always scrupulous in pointing out exceptions to the patterns.

Taking an example more or less at random, the spellings of the phoneme /f/ are F, FF, PH and GH. The first is easily the most common, accounting for about four fifths of the occurrences. The last is the least common; it occurs in only a few words, but these words are common ones (enough, cough, laugh and the like), a good example of the value of having figures for both lexical frequency (how many dictionary words) and text frequency (how often in running text). The spelling PH is reasonably regular in Greek-style words. He is not suggesting here that you need to know Ancient Greek in order to spell these words, or even to know that they are derived from Greek at all, but just that you could recognize a group of words as having a family resemblance and, having guessed that a word belonged to this family, could prefer PH to F when spelling it. The reason for the family resemblance is, of course, that they are mostly derived ultimately from Ancient Greek; the history of English spelling tends to force itself into his description despite his efforts to keep it out. He admits the PPH of sapphire as irregular but brushes aside the PH of shepherd - what we have here is an ordinary P followed by an inert H, not an irregular PH. There follow some lists of words showing single or double F; words with a final /f/ after an unstressed syllable, for example, are regularly spelt with FF - bailiff, tariff, dandruff etc.
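The difference between lexical frequency and text frequency is easy to show in a small sketch. The word list and the frequency figures below are invented, chosen only to make the arithmetic visible; they are not Carney's figures.

```python
from collections import defaultdict

# (word, spelling of /f/ in that word, invented count in running text)
entries = [
    ("fish", "F", 40),
    ("leaf", "F", 20),
    ("safe", "F", 20),
    ("cliff", "FF", 10),
    ("graph", "PH", 10),
    ("enough", "GH", 100),
]

def frequency_shares(entries):
    """Return (lexical share, text share) for each spelling unit."""
    word_count = defaultdict(int)   # how many dictionary words use the unit
    token_count = defaultdict(int)  # how often the unit turns up in running text
    for _word, unit, freq in entries:
        word_count[unit] += 1
        token_count[unit] += freq
    n_words = len(entries)
    n_tokens = sum(freq for _word, _unit, freq in entries)
    lexical = {unit: c / n_words for unit, c in word_count.items()}
    text = {unit: c / n_tokens for unit, c in token_count.items()}
    return lexical, text

lexical, text = frequency_shares(entries)
print(lexical["GH"], text["GH"])
```

In this toy list GH appears in one word out of six but accounts for half of the occurrences in running text, which is the kind of contrast the two sets of figures are meant to bring out.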

4. Rules for computers or rules for people?

In the 1960s a team at Stanford University led by P R Hanna catalogued all the sound-to-spelling correspondences in one of the Webster dictionaries and put them into a set of rules for a computer to follow in generating spellings on the basis of pronunciations. The intention was to demonstrate the extent to which English spelling was predictable on the basis of (in this case American) pronunciation. The rules said that if such-and-such a phoneme came in such-and-such a place (eg, at the beginning of a word or at the end of a syllable), it was most likely to be represented by such-and-such a spelling. The computer could then be given a string of symbols representing the pronunciation of a word and it would generate its best guess at the spelling.
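A rule set of this general kind might be sketched as follows. The rules, the phoneme symbols and the three-way notion of position are my own toy assumptions, not the Stanford team's actual rules, which were far more numerous.

```python
# Hypothetical position-sensitive rules: (phoneme, position) -> likeliest spelling.
RULES = {
    ("k", "initial"): "c",
    ("k", "final"): "ck",
    ("eI", "final"): "ay",
}

# Hypothetical fallback spelling for each phoneme when no positional rule applies.
DEFAULTS = {"k": "k", "eI": "a", "b": "b", "a": "a", "t": "t", "d": "d"}

def position(index, length):
    """Classify a phoneme's place in the word as initial, final or medial."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"

def best_guess_spelling(phonemes):
    """Generate a spelling by taking the likeliest unit for each phoneme."""
    units = []
    for i, phoneme in enumerate(phonemes):
        key = (phoneme, position(i, len(phonemes)))
        units.append(RULES.get(key, DEFAULTS[phoneme]))
    return "".join(units)

print(best_guess_spelling(["k", "a", "t"]))
print(best_guess_spelling(["b", "a", "k"]))
print(best_guess_spelling(["d", "eI"]))
```

Given the phoneme strings for cat, back and day, the sketch happens to produce the conventional spellings; a best guess of this sort will of course go wrong on any word that departs from the most likely pattern.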

Carney is well acquainted with this work; he presents an excellent critique of it. But although he has several telling criticisms to make of the Hanna project, it seems to have served in general terms as a model for his own, or at least for the sound-to-spelling part. He feels that Hanna et al. did not make a good job of it, the rigid scheme that they adopted preventing them from making use of many obvious regularities in English spelling. Yet, like Hanna, he goes through his word list phoneme by phoneme, showing how this phoneme corresponds sometimes to this unit, sometimes to that one. He differs in detail in his analysis of how the phoneme's immediate surroundings can help you to prefer one correspondence over another, and he allows himself to make use of concepts outside the range of the Hanna algorithm - he has special rules for words with Latinate prefixes, for example - but the parallels with Hanna are sufficiently close for him to be able to include Hanna's rules for comparison with his own (generally, of course, to the benefit of the latter). He doesn't incorporate his own rules into a computer program for generating spellings from pronunciations, but you feel he could if he wanted to.

The point I am getting round to is that this great collection of patterns that he has so painstakingly extracted from his material seems to me to be more appropriate for computers than for human beings. As the basis for a program along the Hanna lines, they would be very useful; as material for teaching, I am not so sure. It may be that these patterns make explicit the orthographic knowledge - the 'feel' for English spelling - that accomplished users of written English have acquired from years of practice; if asked to write the non-existent word grandiff, I would spell it with FF in accordance with the rule mentioned above. But if schoolchildren are to be taught that English spelling obeys rules, then presumably these hundreds of patterns, with their sub-patterns and their exceptions, filling 120 pages of his book, are the rules in question. If so, then heaven help the schoolchildren, and their teachers.

The same point can be made, and with greater force, about the rules that go the other way - from text to speech. He begins this with a description of two early text-to-speech computer programs, the kind which tried to segment a spelling into orthographic units and then to produce a pronunciation for each unit on the basis of some rules. He is not presenting these as examples of good text-to-speech technology, which has moved on a good deal since these early efforts, but rather finds in these systems a model for the presentation of his own rules. This time he goes through the alphabet and, for each letter, presents the various ways in which it might be pronounced.

If we look at G, for example, we find thirteen rules (the vowel letters have far more). The first two tell us that GG and GH respectively almost always correspond to /g/ (haggle, gherkin). Rule 3 tells us that GU corresponds to /gw/ when the preceding letter is N and the next two letters are a vowel followed by a consonant (language, penguin); rule 4 that GU corresponds to /g/ when the next letter is a vowel (disguise, intrigue). And so on to rule 13 which you apply if none of the others apply, namely that G corresponds to /g/ (bogus, gurgle). For each rule, figures are given for how often the rule applies and exceptions are listed. Names are listed separately from dictionary words.

Two features of this presentation show how it is more suited to computers than to humans. The order of the rules is important, and it is counter-intuitive. In the above example, you'd get penguin wrong if you applied rule 4 before rule 3. You also have to remember that the rules are applied left to right as you work your way through the word. To quote Carney's own example of this, if you are going to get build right, you have to apply the rule that says that BU corresponds to /b/ when followed by a vowel, before you apply the rule that says that UI corresponds to /u:/ (as in bruise). So it is no good looking at a rule in isolation. You really need to carry the whole system in your head at once; computers are good at this, people aren't. As to the order being counter-intuitive, if you were describing the correspondences of G to someone, you'd surely begin with rule 13 - G corresponds to /g/ - and then describe the exceptions.
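The dependence on rule order can be made concrete with a small sketch covering only the five G rules quoted above (rules 1 to 4 and rule 13). Rules 5 to 12 are left out because they are not described here, and the function names and the encoding of the conditions are my own assumptions rather than anything taken from the book.

```python
VOWELS = set("aeiou")

def rule_gg(word, i):
    """Rule 1: GG corresponds to /g/."""
    return "g" if word.startswith("gg", i) else None

def rule_gh(word, i):
    """Rule 2: GH corresponds to /g/."""
    return "g" if word.startswith("gh", i) else None

def rule_gu_after_n(word, i):
    """Rule 3: GU is /gw/ after N when a vowel and then a consonant follow."""
    if (word.startswith("gu", i) and i > 0 and word[i - 1] == "n"
            and len(word) >= i + 4
            and word[i + 2] in VOWELS and word[i + 3] not in VOWELS):
        return "gw"
    return None

def rule_gu_before_vowel(word, i):
    """Rule 4: GU is /g/ when the next letter is a vowel."""
    if word.startswith("gu", i) and len(word) > i + 2 and word[i + 2] in VOWELS:
        return "g"
    return None

def rule_default(word, i):
    """Rule 13: otherwise G corresponds to /g/."""
    return "g"

BOOK_ORDER = [rule_gg, rule_gh, rule_gu_after_n, rule_gu_before_vowel, rule_default]
SWAPPED_ORDER = [rule_gg, rule_gh, rule_gu_before_vowel, rule_gu_after_n, rule_default]

def pronounce_first_g(word, rules):
    """Apply the rules in order at the first G; the first rule that fires wins."""
    i = word.index("g")
    for rule in rules:
        result = rule(word, i)
        if result is not None:
            return result

print(pronounce_first_g("penguin", BOOK_ORDER))
print(pronounce_first_g("penguin", SWAPPED_ORDER))
```

With the rules in the order given, penguin comes out with /gw/; with rules 3 and 4 swapped, rule 4 fires first and penguin comes out with a plain /g/, which is the mistake described above.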

5. Orderly or messy?

There is much else of interest in this book apart from the two central sections that I have concentrated on in my review. He has something on spelling and accent, on the methods that playwrights have employed for indicating dialect, on the kind of spelling rules taught by schoolteachers and on the sorts of spelling errors that people make. He has chapters on homophones and homographs, on the spelling of names and, finally, on spelling reform.

If you have a particular interest in English spelling - if you are a spelling reformer, say, or a psychologist probing the cognitive processes of spelling - this book will be an invaluable resource. Whether it has much to offer schoolchildren and their teachers, I am not so sure.

Where does it all leave us regarding the rules of the English spelling system? I suspect it leaves us where we were to start with. If you thought that English spelling was not too bad and that we really ought to be wary of tinkering with it, you will find comfort in the large numbers of words grouped together in orderly fashion like soldiers drawn up for inspection. If you thought that English spelling was a mess, you will sigh at the convoluted systems of rules and sub-rules set up to describe this or that aspect of it (eleven rules just for the doubling of consonants), you will shake your head at the number of exceptions to this or that rule and you will conclude that it is indeed a mess.

