[Journal of the Simplified Spelling Society, J31, 2002/2, pp4-8]

English and Its Literemes.

Ralph Emerson.

An Update of the Author's 28-page Article in American Speech.


This paper summarizes the major points of the author's 1997 article "English Spelling and Its Relation to Sound" in American Speech 72, pp. 260-88. Surface orthography is shown to represent underlying units called 'literemes', which have a flexible relationship to dialect.

1. Introduction.

My interest in English spelling came about because of my interest in English dialects. It surprised me that we have many dialects in English that all use very different sounds, and yet we have a single system of spelling, which, whatever its failings, is clever enough so that every English speaker who uses it believes that it reflects his own dialect. When any of us reads a word like rain, our mind's ear hears it spoken in our own voice and accent. And this applies to every written word: each of us fondly imagines that its spelling represents our own pronunciation of it, our own private English. That may be one reason why most people hate the idea of making any changes in spelling: it would seem like an assault on a personal possession.

2. Your accent or mine?

In the same way, it can seem like a personal assault if we hear someone speaking in an accent we don't like. That experience is mythologized in My Fair Lady, in which Eliza Doolittle wins hearts and achieves success by losing her supposedly offensive Cockney accent. After years of saying "the rine in spine" (as the script has it), she finally learns to say "the rain in Spain," and the play ends happily. That episode seems very simple, but if we look at it more closely, it reveals many things about letters, speech, and the nature of the connection between them.

If we assume, as I said, that conventional spelling represents our own pronunciation of the language, then for most Anglophones, the spelling of rain represents the phones [rein]. Eliza's Cockney pronunciation [rain] is so far from that that it sounds like a whole different word, which most of us would spell the same way the script does - rine, like brine. Note that Eliza herself wouldn't write it that way: to her, the pronunciation [rain] just means rain; only non-Cockneys would represent it as rine. A spelling like rine thus uses the letters as impromptu phonetic symbols. It's 'Symbolesque'. And like many Symbolesque spellings, it makes sense only to readers with the same accent as the writer, so I'd mark it with an * for 'accent-specific': "The *rine in *spine."

Conventional spelling is the opposite of accent-specific. Snobs may think conventional spelling really represents only "good" accents, and that rain is thus "really" [rein], but they're wrong. Conventional spelling (†) belongs equally to everybody. Whatever your pronunciation of the word rain is, †rain represents it. Accent-specific spellings are keyed to only one set of sounds in one particular situation, but conventional spelling is phonetically amorphous:
*rine    Cockney [rain]-for-rain, as heard by a standard speaker.
†rain    standard [rein], Cockney [rain], Scotch [re:n], etc.
Of course, the sound-value of conventional spelling is not entirely amorphous. We can pin it down in several ways - with rhymes, for example. As we've seen, rhymes may mislead across dialects: if you're Cockney, your rain will rhyme with my brine. But that would never happen within our respective dialects. For each of us separately, rain would only rhyme with words like these:
gain, main, vain...
rein, vein...
cane, mane, sane...
Because everybody agrees that all these words rhyme, something obviously unifies them all. Yet the unifier can't be a sound, because the specific pronunciation of the rhyme isn't stable; and it can't be a spelling either, because the same rhyme is spelled in three different ways. So what I proposed in my article was an imaginary unifier, one that could be viewed in two ways, either as an abstract letter ('litereme') or as an abstract sound ('graphophoneme'). Written in literemes, rain is <<rAn>>; in graphophonemes it's //ren//. The litereme symbols are based on spelling, and the graphophonemes on phonetic symbols. They are two ways of portraying the same abstract entity, different but perfectly equivalent.

Literemes "embody the systemic intent of the letters as they are used in the spelling of a particular language" (284). In rain and its rhymes, the "intent" of the vowel spelling is traditionally summed up as 'long a', so I write the litereme as <<A>>, or more conveniently, 'A' (using caps instead of the earlier article's macrons). Each litereme is paired with a single graphophoneme, an equivalent phonic symbol "distilled from the [litereme's] sturdiest phonemic realizations in a sampling of dialects" (265). Long a's characteristic sound can be "distilled" into //e// (as I wrote it before, thinking of cardinal [e]). Poised between messy clusters of real spellings and real sounds, the two mediating abstractions of litereme and graphophoneme unify everything one-to-one:
graphemes: <-ain, -ein, -ane>
litereme <<A>> = graphophoneme //e//
phones: [ei, ai, e:]
Now, if we follow up the implications of that scheme, the first thing we see is that "conventional spelling ... represents ... literemic structure" (281). The literemes are a house; spellings are the coats of paint on it. Rain's rhyme '-An' can be painted -ain, or -ane, or prettied up as in deign or Wayne, but it's still the same house. (Just as the rhyme can be pronounced in any dialect, and it's still the same house.) Literemes are the real deal in simplified spelling - the true orthographic essence of our current conventional spelling.
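The house-and-paint picture can be sketched in a few lines of Python. This is only an illustration, not part of the original analysis: the forms 'rAn' and 'brIn' are the article's, while the other literemic forms, the dictionary name LITEREMIC, and the helper functions are my assumptions, filled in by analogy with the rhyme '-An'.

```python
from collections import defaultdict

# Conventional spelling -> literemic form.
# 'rAn' and 'brIn' come from the article; the remaining forms are assumed
# by analogy with the rhyme '-An' and are illustrative only.
LITEREMIC = {
    "rain": "rAn", "gain": "gAn", "main": "mAn", "vain": "vAn",
    "rein": "rAn", "vein": "vAn",
    "cane": "kAn", "mane": "mAn", "sane": "sAn",
    "brine": "brIn",
}

VOWELS = set("AEIOUaeiou")


def literemic_rhyme(form: str) -> str:
    """Return the part of a one-syllable literemic form from its first vowel on."""
    for i, ch in enumerate(form):
        if ch in VOWELS:
            return "-" + form[i:]
    return "-" + form


def rhyme_groups(lexicon: dict) -> dict:
    """Group conventional spellings by the rhyme of their literemic form."""
    groups = defaultdict(list)
    for word, form in lexicon.items():
        groups[literemic_rhyme(form)].append(word)
    return dict(groups)


groups = rhyme_groups(LITEREMIC)
print(groups["-An"])  # the -ain, -ein and -ane spellings all land together
print(groups["-In"])  # brine stays apart
```

Three surface spellings fall into one group because the grouping key is the litereme, not the letters on the page.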

3. Symbolesque versus literemic.

In light of that, let's re-examine rain, rine, and brine. The conventional spellings †rain and †brine are costumes for the literemic forms 'rAn' and 'brIn', but the Symbolesque *rine is nothing but one man's attempt to represent the specific phones [rain]. Do you see how qualitatively different they are? *Rine is an ad hoc, spur-of-the-moment attempt to stop sound in its tracks: I hear a Cockney say the sounds [rain] and I write them down as rine according to my personal understanding of how the letters work. Somebody else with a standard accent like mine would see my rine and be able to reproduce the sounds I was trying to show - "Oh, that's [rain]." And if they knew the context, they'd be able to add, "Yeah, that's how Cockneys say †rain. Ha, ha." That's what's happening in My Fair Lady.

But spelling doesn't come with labels like *rine. If I write out rine on a slip of paper and show it to other standard-accent speakers, they'll pronounce it with exactly the sounds I intended, because their understanding of spelling happens to be the same as mine; but if I show the slip of paper to an American Southerner, she'll pronounce rine as [ra:n], and if I show it to a Cockney like the unreformed Eliza, she'll say [roin]. This is "the great paradox of alphabetic writing: users set it down believing it to be concrete, but as soon as their backs are turned it melts into abstractness. Alphabetic writing always begins by representing specific sounds and always ends by representing pools of sounds" (282).

*Rine for [rain] is an example of orthography in all its intended purity, letters directly representing sounds. But the purity comes at a price: it's only applicable to me the writer and those who talk just like me. Of course, those are the conditions under which alphabetic writing began. Over twenty-five centuries ago in ancient Greece, tiny communities of people who all spoke alike began transcribing the sounds of their own speech into letters they had learned from Phoenician traders. The Greeks called the country of their benefactors [phoi'nike], for example, and to write it they spelled out each sound as Φ-O-I-N-I-K-H. When the Romans borrowed the name a few centuries afterwards, they kept the Greek pronunciation but wrote it PHOENICE in their own alphabet.

Two thousand years later, most of Europe still spells that name virtually the same way - Phoenicia, Phénicie, Phönizien - but the modern pronunciations all sound more like the Spanish version Fenicia than the ancient [phoi'nike]. For sounds always change: the Greek phi for aspirated p has blended into Latin litereme 'f'; the letter c has taken on some of the work of Latin 's', and so on. In those little Greek towns where everyone spoke alike, the first spellings all had the purity and immediacy of *rine for [rain]; but the further afield the spellings went, the vaguer their relation to the original sounds became. "All orthographies begin as Symbolesque, but once a representation becomes conventional, it becomes literemic and loses touch with the actual sounds it was intended to record" (282).

Yet whatever conventional spellings lose in purity, they gain in reach and power. If you pronounce the Symbolesque *rine in a Mississippi accent as [ra:n], it loses its whole point. But if you pronounce a conventional spelling like †brine as [bra:n], you don't hurt it a bit, because its real identity is no longer in its sounds but in the literemes 'brIn' - and those can be pronounced in any accent without losing their integrity. Symbolesque spelling is phonetic; conventional spelling is literemic.
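The difference can be made concrete with a small sketch. It assumes, purely for illustration, that each accent can be reduced to a table giving one phonetic value per long-vowel litereme; the values are the ones quoted in this article, but the table layout, the accent labels, and the function name are hypothetical.

```python
# Phonetic value of each long-vowel litereme per accent. The values are
# those mentioned in the article; the accent labels and this table layout
# are illustrative assumptions, and unlisted literemes are simply omitted.
REALIZATION = {
    "standard": {"A": "ei", "I": "ai"},
    "Cockney": {"A": "ai", "I": "oi"},
    "US Southern": {"I": "a:"},
    "Scottish": {"A": "e:"},
}


def pronounce(literemic: str, accent: str) -> str:
    """Render a literemic form in one accent.

    Capital letters are long-vowel literemes and are looked up in the
    accent's table (KeyError if the article gives no value); lower-case
    letters are passed through unchanged.
    """
    table = REALIZATION[accent]
    phones = [table[ch] if ch.isupper() else ch for ch in literemic]
    return "[" + "".join(phones) + "]"


# Conventional rain is 'rAn' for everybody, whatever the accent:
for accent in ("standard", "Cockney", "Scottish"):
    print(accent, pronounce("rAn", accent))

# A reader takes the spelling rine as 'rIn' and renders it in their own accent:
for accent in ("standard", "US Southern", "Cockney"):
    print(accent, pronounce("rIn", accent))

# The cross-dialect collision behind *rine:
print(pronounce("rAn", "Cockney") == pronounce("rIn", "standard"))
```

The literemic form survives every row of the table intact; only a Symbolesque spelling depends on which row the writer had in mind.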

What's the cut-off between the two? I guess a spelling becomes conventional when everybody agrees that it's conventional - maybe when it makes it into the dictionary. Greek and Latin spellings have been conventional for millennia now, and other English spellings have mostly been conventionalized since the mid-1600s. Yet new conventional spellings arise from Symbolesque origins all the time. Many of them begin just like *rine, as ad hoc spellings for dialect versions of existing words. For instance, so many early Americans used the vowel [ai] in the words †roil and †hoist that they gained by-forms spelled *rile, *heist. When mainstream English adopted those as synonyms for 'provoke' and 'hold-up', they gradually achieved conventional status: *rile slowly became †rile, a new word beside the older †roil. While that process took years, other Symbolesque spellings are instantly deputized as conventional when new words are invented out of thin air, like dweeb, or when foreign loanwords are spelled out English-style, like savvy for Spanish sabe.

4. Literemes and their spellings.

How does a literemic critique shed light on present-day English? I think it illuminates three things: first, that our spelling represents its own structure inefficiently; second, why that structure coordinates imperfectly with the individual structures of modern dialects; and finally, why the English use of the alphabet is out of kilter with the rest of the world's.

The crux of English spelling is the contrast between the long and short vowels, between rain and ran, 'rAn' and 'ran'. The ten literemes involved - five long and five short - account for the intent of at least 90% of the vowel spellings we see in any sample of text. Only four other vowel literemes are native to English: 'oi, oo, ou' as in boy, toot, count, and 'au', which is spelled au, aw, or a(l), as in taut, taught, law, all, talk. There are no literemic schwas, since spellings per se make no distinction between stressed and unstressed vowels.

The main weakness of English spelling - an astounding weakness - is that it is literally not equipped to mark that crucial difference between long and short. Conventional spelling does not mark its long vowels with Unifon caps (rAn), or New Spelling's e-digraphs (raen), or Valerie Yule's elegant grave accents (ràn). It has no consistent method at all, just a jumble of silent e's (cane), dubious digraphs (rain, rein), and positional uses of the single vowels (contempor<a>neous).

As my earlier article explained, the English long/short contrast is ultimately rooted in orthographic syllables. English long vowels naturally occur in 'open' syllables - those in which the vowel itself is the last letter (go, be, hi). By contrast, short vowels are always glued to a following consonant, which creates a 'closed' syllable (got, bet, hit). Simple closed-vowel spellings like ran can only represent 'ran', not 'rAn'. That's why long-voweled closed-syllable words like rain and cane need their digraphs and silent e's.

In words of more than one syllable, the simple open/closed contrast is at work as surely as it is in go and got. Thus go makes going, with the syllables divided go.ing. One vowel coming right after another is a sure sign that the first vowel is long: go.ing, sto.ic, ide.a. By contrast, the sure sign of a short vowel is a double consonant like -nd- in ten.der, thun.der. One consonant repeated serves the same purpose, like -tt- in gotten. Either way, the preceding vowel is marked as short: gott.en, syll.able, bann.er. If a double consonant in the middle of a word marks a short vowel, then presumably a single consonant marks a long vowel? Sometimes: o.ver, vi.tal, na.tive. But just as often it does not, for the double-consonant rule is not consistently applied, and probably half of all short vowels precede a single consonant. Thus we have short sev.en beside long e.ven, viv.id beside vi.tal, the verb pol.ish beside the adjective Po.lish. Although the intended vowel length becomes clear when we put in the little dots ("after the fact," as people have commented), the bare spellings themselves cannot tell us whether a vowel is long or short. Long vowels in polysyllables literally have no marker of their own. It's as astounding as the Titanic going to sea without enough lifeboats: as crucial as the difference is between its long and short vowels, English spelling is simply not equipped to mark it.
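The open/closed rule is mechanical once the dots are in place, as a minimal sketch shows. It assumes the dotted forms used above as input, looks only at the letter before the first dot, and ignores y as a vowel letter; the function names are mine, not the earlier article's.

```python
VOWEL_LETTERS = set("aeiou")  # assumption: y is ignored in this sketch


def vowel_before_dot(dotted: str) -> str:
    """Classify the vowel of the syllable that ends at the first dot.

    Open syllable (its last letter is a vowel letter) -> "long";
    closed syllable (its last letter is a consonant)  -> "short".
    """
    first_syllable = dotted.split(".")[0]
    is_open = first_syllable[-1].lower() in VOWEL_LETTERS
    return "long" if is_open else "short"


def bare_spelling(dotted: str) -> str:
    """Strip the dots (and capitals) to get what the reader actually sees."""
    return dotted.replace(".", "").lower()


for dotted in ["go.ing", "gott.en", "sev.en", "e.ven", "pol.ish", "Po.lish"]:
    print(dotted, vowel_before_dot(dotted))

# The dots carry the information; the bare spellings can collide.
print(bare_spelling("pol.ish") == bare_spelling("Po.lish"))
```

The classifier needs the dot as input, and the dot is exactly what conventional spelling leaves out.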

Obviously, the situation is not hopeless. Good guidelines exist for guessing whether a vowel is likely to be long or short in a particular position within a word of a particular etymology or morphemic structure. My earlier article explicitly laid many of those guidelines out. Most of them are simple enough for literate people to internalize unconsciously; and the degree to which they are internalized successfully is shown by the otherwise inexplicable fact that the whole English vocabulary, allowing for dialectal differences, has an almost entirely stable and agreed-upon pronunciation. If the spelling-sound relationship were truly chaotic or truly impenetrable, pronunciation would be a free-for-all. Still, simply because something is getting done does not mean that it is being done well. If English cannot clearly mark the vital contrast between long and short vowel literemes, then it's doing a bad job of representing its own internal structure.

As I said, spellings can be thought of as paint on the literemes' timbers. An efficient paint job clarifies the structure underneath, and English has a mere whimsical patchwork instead. The literemes 'A + n' can claim half a dozen spellings: cane, rain, Wayne, deign, Maine, campaign. Or a single spelling like g can claim several literemes: sometimes it represents 'g' (go, big), sometimes it's 'j' (gem, huge), and sometimes it stands for nothing at all (campaign, gnat). A handful of English spellings like who are outright lies in that regard ('wh + O' for 'hoo'). The literemes are fine: 'rAn, gO, big, jem, hUj, nat, hoo'. What gives English orthography such a bad name is that the literemes are so poorly expressed by the surface spellings. A more efficient orthography would at least make the literemes' identity clear in each word; a really ruthless one would spell them alike in every situation.
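The point about g can be checked mechanically against the literemic forms just listed. The sketch below is only an illustration: the word-to-form pairs are the ones given above, while the dictionary name and the helper function are hypothetical.

```python
# Conventional spelling -> literemic form, all taken from the paragraph above.
FORMS = {"go": "gO", "big": "big", "gem": "jem", "huge": "hUj", "gnat": "nat"}


def value_of_g(word: str) -> str:
    """Report which litereme the letter g stands for in a listed word."""
    form = FORMS[word]
    if "g" in form:
        return "g"
    if "j" in form:
        return "j"
    return "(nothing)"


by_value = {}
for word in FORMS:
    by_value.setdefault(value_of_g(word), []).append(word)

print(by_value)
```

One letter sorts into three bins, and nothing in the letter itself says which bin applies.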

I keep claiming "Spelling represents literemes, spelling represents literemes," and you might justly ask, "If spelling is so vague, then how do we know what the literemes are?" By listening. In English we have to listen as well as look to know what the spellings mean. I know that heaven is 'heven' and reason is 'rEzon' because I know how they're pronounced; then I filter the pronunciations back through the spellings to find the "intent" of the spelling in each case. Specifically, the pronunciations tell me the value of the mutable ea in each word's stressed syllable, and the spellings tell me the literemes behind the schwas in the unstressed syllables: short 'o' in reason and short 'e' in heaven.

A word can have several literemic interpretations over time and space, and sometimes we need to hear a word said aloud just to see which interpretation a particular speaker intends. We interpret †reason today as 'rEzon', but to Shakespeare it was 'rAzon', an irresistible pun for raisin 'rAzin' (Henry IV, Part I). Speakers today may choose between 'E' and 'I' in †either. The literemes of †close depend on whether it is interpreted as a verb or an adjective, 'klOz' or 'klOs'. The varying literemes in each word depend on the innate instability of one segment in the spelling: the letter s may be hard or soft; and vowel digraphs like ea and ei rotate among literemes like cats among armchairs.

Dialectal context is vital too. I need to know who's speaking before I can say what literemes they're invoking. If I'm sure I'm listening to a Cockney, I'll know that [rain] is 'rAn' rain; if I'm listening to a standard accent, then it's the river 'rIn' Rhine. English has a lot of cross-dialectal homophones, and they're fun to find: the way the British say paired sounds like the American pronunciation of pad; last sounds like lost; cart like cot. Of course, real confusion seldom occurs because words like those are never uttered in isolation, only in contexts that make their identities clear. But the vagueness is there just the same. It is the price we pay for subsuming all our dialects into a single mass called English and daring to give it all a single collective spelling.

5. A pan-dialectal solution.

That is not the only way to handle the problem. Generally, when languages grow large and split up into dialects, each dialect gets its own spelling. The overwhelming example in the West is how Latin split up into the different Romance languages. As the local street Latins of Italy, Spain, and France gradually transformed into separate regional dialects, the pronunciations changed too, and with them the spellings. When the Roman colonists of Spain and Italy began to soften the sound of †lacus 'lake' into ['lago], they started writing it that way too, at first merely as a Symbolesque spelling, "our pronunciation" *lago, and then later, when the dialects had matured into national tongues, as the legitimate conventional form: "our word" †lago. In France, the same process produced †lac.

Comparably huge changes have happened in English dialects, but the English way of handling them has been to let people say lago or lac as long as they kept writing †lacus. The way we write English today represents a somewhat older form of our language - with respect to the spellings themselves, the English of the Middle Ages; with respect to the literemes, perhaps the English of the early 1700s. That was the last time that the phonemes of most dialects still had a consistent one-to-one relationship with the literemic/graphophonemic segments.

Modern dialects mostly represent branchings from 1700-style English, separate developments from it. Pronunciation differences in 1700 tended to follow literemic differences. Modern pronunciations tend to confound such differences, with different dialects confounding them in different ways. The words cot, caught, and court, for example, are all literemically distinct, 'kot, kaut, kOrt', and accordingly, they had three separate pronunciations in 1700. They still do along America's Atlantic coast, but in the rest of America today, two of those words have merged phonemically: caught = cot /ka:t/. A different two have merged in Britain and Australia: caught = court /ko:t/.

It's only the phonemes, however, that have changed in each case. The spellings and literemes remain as distinct as ever. If the spellings were changed to accommodate one of the newer dialects, it would confuse the issue for speakers of the others. The existing conventional spelling, while it does not precisely match anyone's specific phonemes today, does serve as a reference form that everyone can use in his or her own way. Each dialect simply has its own (relatively predictable) ways of interpreting the literemes, and different dialects co-exist on those terms.
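The cot / caught / court case can be sketched as a single literemic lexicon plus a list of mergers per dialect. This is an illustration under stated assumptions: the literemic forms and the two mergers are the article's, but the dialect labels, the data layout, and the function are hypothetical, and each dialect's merger sets are assumed not to overlap.

```python
# One literemic lexicon shared by every dialect (forms from the article).
WORDS = {"cot": "kot", "caught": "kaut", "court": "kOrt"}

# Literemic forms that each dialect pronounces alike. The mergers are the
# article's; the dialect labels are illustrative shorthand.
MERGERS = {
    "US Atlantic coast": [],                       # all three kept distinct
    "rest of US": [{"kot", "kaut"}],               # caught = cot /ka:t/
    "Britain and Australia": [{"kaut", "kOrt"}],   # caught = court /ko:t/
}


def homophone_sets(dialect: str) -> list:
    """Return the words grouped by how the dialect pronounces them."""
    result, seen = [], set()
    for word, form in WORDS.items():
        if word in seen:
            continue
        group = {form}
        for merged in MERGERS[dialect]:
            if form in merged:
                group = merged
        same = sorted(w for w, f in WORDS.items() if f in group)
        seen.update(same)
        result.append(same)
    return result


for dialect in MERGERS:
    print(dialect, homophone_sets(dialect))
```

The lexicon never changes; only the merger lists differ, which is why respelling to suit one dialect would erase a distinction another dialect still hears.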

My earlier article described the most important litereme-to-phoneme interpretations in modern dialects - especially those involving r's and the vowels before them, which is where the most changes have taken place in the last few centuries (271-73). Most dialects in 1700 still had a post-vocalic r, and the presence of an r didn't affect the sound of a vowel much. That's exactly what we'd expect from looking at spellings like pain and pair. Again, let me stress that our spellings, literemes, and graphophonemes in 2002 are still exactly what they were in 1700:
p + ai + n    'pAn'    //pen//
p + ai + r    'pAr'    //per//
But three centuries ago, the actual pronunciations typically matched the graphophonemes segment for segment, /pe:n, pe:r/. In a handful of modern accents, like those of Scotland and parts of the Caribbean, pronunciations still do match that closely; but in all other places they have shifted a great deal. Most importantly, vowels before historical r have usually laxed to the point where they no longer match the values of comparably spelled vowels in other positions, and most British accents have furthermore dispensed with the r's themselves, so our two words now sound much less alike than they once did:
//per//    /pe:r/    /pɛr/ in US
                     /pɛə/ in UK
The spellings pain and pair are the only obvious point of resemblance left. "The simple universal phonology of written English gives birth to the infinite particularities of spoken English" (267).

6. Latin again.

When Spanish and French rewrote their local versions of Latin lacus as lago and lac, they changed the spelling to accommodate new pronunciations, but they weren't changing the values of the letters themselves. An ancient Roman girl seeing lago or lac would say them just like a modern Spaniard or Frenchwoman: ['lago, lak]. She would also say the Japanese loanword sake 'wine' correctly, ['sake]; but she would miss on English lake [leik]. English speakers somehow manage both:
sake 'wine'    /'sa:kei/
lake           /leik/
Doing so involves real doublethink. Besides the long and short values of the five vowel letters, modern English speakers who pronounce a loanword or foreign name are contending with a third set of literemes that reflect the vowels' original Roman values //a, e, i, o, u//. Those remain the usual values in every language except English. Printed in outline below, they relate to the native English literemes like this:
[Table: Euroesque and Britannic values for the vowels]
My earlier article called these respective values 'Euroesque' and 'Britannic' (282-83: Euroesque //a// lacks a consistent equivalent). As much as the a and e in the Euroesque sake 'wine' look like the a and e in lake, they are really another order of being altogether. Comparing the top and bottom rows of the table above, we see that sake's a will be Continental, while its e will sound like English 'A':
[Table: words shown in Euroesque and Britannic]
As I have been suggesting all along, spellings are never concrete - they are always subject to competing analogies of interpretation. In dialect writing like *rine for †rain, the competing analogies involve different accents; in loanwords, the competition is between homegrown analogies and international ones. Euroesque words in English are a large class that's daily getting larger, and we encounter them so often now that our doublethink about them has become almost automatic. The names Tina, Rita, and China all seem perfectly at home in English, for example, yet of the three, the only Britannic one - the only one pronounced according to a truly "English" analogy - is China. That's because it has been in English much longer than the others. New words and names coming into English from elsewhere, like sake wine and Rwanda, automatically join the Euroesque club. It's not the spellings per se that make them Euroesque, only the interpretations. Sake interpreted as Britannic is the /seik/ in for heaven's sake.

The Britannic words in English are the old ones, the Anglo-Saxon, Latin, and medieval French words that were already in the boat when the Great Vowel Shift came along, twisting our vowel graphophonemes so hard that they snapped and cut us loose from the rest of the world's spelling. Will we go back? Are the legions of Euroesque spellings we see around us today in our readings and travels the emissaries sent to encourage us to return? If we go, there's much we'd have to leave behind. A Euroesquely spelled English would remain literemic, because all orthographies are, but its relation to the English we have now would be more like the relation of French or Italian to Latin - not the same, only a descendant, the genes diluted and updated for a new age.

