Heterographs in English Part 1.

Christopher Upward.

It is often assumed that in a rational orthography all homophones would be spelt alike. However this view is also challenged, both because homographs might be confused, and because the public is expected to be hostile to the resulting mergers. This article attempts to establish just how far-reaching the phenomenon of heterographs is in English.


1.1 Homo- & heterographs, heterophones.
The vocabulary of English contains many sets of more or less distinct words with more or less the same form. One cannot say exactly how many exist, but by the most conservative count there are well over a thousand, while the broadest definition might embrace many thousands. The phenomenon is complex because words overlap in several ways, as described by the terms: homophone (same sound), homonym, homograph, heterograph (different spelling), heterophone; their usage however is often inconsistent and therefore ambiguous. There are three kinds of overlap between words: firstly, sets such as the verb, noun and adjective tender (to tender one's resignation, a locomotive tender, tender feelings) in which spelling and pronunciation both coincide; secondly, sets like pair: pare: pear, which are pronounced the same despite different spellings; thirdly, pairs like a dog's lead and the metal lead, which are pronounced differently despite their identical spelling. In this article the terms are used as follows:

pair, pare, pear   = Heterographs   = Homophones
tender x 3         = Homonyms       = Homophones, Homographs
lead x 2           = Heterophones   = Homographs
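
The three-way distinction can be restated as a small decision procedure. The following is only a minimal sketch: the function name classify and the rough respellings used to stand for pronunciation are illustrative assumptions, not part of the article's method.

```python
def classify(words):
    """Label a set of (spelling, pronunciation) pairs with the article's terms."""
    spellings = {spelling for spelling, _ in words}
    sounds = {sound for _, sound in words}
    same_spelling = len(spellings) == 1
    same_sound = len(sounds) == 1
    if same_spelling and same_sound:
        return "homonyms"      # also homophones and homographs
    if same_sound:
        return "heterographs"  # homophones spelt differently
    if same_spelling:
        return "heterophones"  # homographs pronounced differently
    return "no overlap"


# Pronunciations are hypothetical respellings, not a phonemic standard.
print(classify([("pair", "pair"), ("pare", "pair"), ("pear", "pair")]))  # heterographs
print(classify([("tender", "tender")] * 3))                              # homonyms
print(classify([("lead", "leed"), ("lead", "led")]))                     # heterophones
```

The sketch treats "same sound" as an exact match; §2.6 explains why that is an oversimplification for real accents.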

1.2 One grapheme for one sound?
Although homonyms like tender are inherently ambiguous, they are not generally considered a spelling problem as the grammatical and semantic context usually makes the meaning plain. On the other hand heterophones and heterographs are felt to epitomize the defects of t.o., conflicting as they do with the basic alphabetic principle of one-to-one correspondence between sound and spelling. But while English has rather few heterophones, and most could readily be given distinctive spellings (e.g. lead: led), there are so many heterographs that to give all words in a set the same spelling would add massively to the ambiguity of written English.

1.3 Conflicting needs of readers, writers.
Here the needs of writers and skilled readers may conflict. For writers it is convenient to derive spelling from pronunciation (which native speakers at least can usually recall at will), but it is inconvenient for them to have to recall and distinguish different spellings for homophones according to meaning or grammatical function. For the skilled reader, on the other hand, who does not derive the pronunciation from the letters but recognizes the global appearance or gestalt of words, it may be useful if meaning, rather than the pronunciation, is immediately apparent.

1.4 How much danger of ambiguity?
It is often said that since we rarely confuse homonyms in speech, the danger of confusing them in writing must be equally small. However there are differences in how speech and writing are perceived which cast doubt on this assumption. For while the hearer registers tone of voice, gesture, facial expression and other audible and visible signals which clarify the message, the reader lacks these aids. Furthermore, speech is more likely to provide a context, which in writing may be largely absent, as on signboards and in headlines. Misunderstanding may be unlikely if only one word in a text is ambiguous, but when two or more are, the risk is greater. In a recent headline, CHECK ON RACE FOR TEACHERS, the meaning is obscured by two ambiguous words, check and race; but if the headline had read delay on race for teachers or check on racism for teachers, the meaning would have been clear. Significantly multiplying the ambiguous written forms in English by merging the spelling of all heterographs (including such crucial sets as one: won, to: too: two, for: fore: four) would increase the chance of misunderstanding substantially, indeed exponentially. An extreme instance of the confusion that might occur is Addison's objection to the proliferation of that: That that I say is this: "that that that that gentleman has advanced, is not that, that he should have proved..." [1] (not that writers would find it easy to make a choice of spellings for different uses of that). We should perhaps cautiously conclude that the similar difficulty writers now experience in distinguishing their: there may be an argument for merging those heterographs that are particularly prone to confusion, but that an across-the-board merger of every set of heterographs might nevertheless seriously reduce the transparency of written language as a whole.
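
Why the risk grows so quickly with each additional ambiguous word can be shown by counting the possible readings of the headline. In this rough sketch the sense glosses are illustrative assumptions, chosen to match the two paraphrases given above; they come from no dictionary.

```python
from itertools import product

# Hypothetical sense glosses for the two ambiguous words in the headline.
senses = {
    "CHECK": ["inspection", "delay"],
    "RACE": ["ethnicity", "contest"],
}

headline = ["CHECK", "ON", "RACE", "FOR", "TEACHERS"]


def readings(words, senses):
    """List every combination of senses; unambiguous words have one option."""
    options = [senses.get(word, [word.lower()]) for word in words]
    return [" ".join(choice) for choice in product(*options)]


all_readings = readings(headline, senses)
print(len(all_readings))  # 4 readings from two two-way ambiguous words
```

With k ambiguous words of two senses each, the number of candidate readings is 2 to the power k, although in practice context will eliminate many of them.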

1.5 Other languages.
Other European languages have more consistent orthographies than English, and it is natural to ask if they therefore confuse homonyms. Typically however they distinguish parts of speech and grammatical functions by means of inflections; thus while in English a march: to march are perhaps homonyms (or even the 'same' word), French la marche: marcher, German der Marsch: marschieren, Spanish marcha: marchar are quite distinct both in speech and writing. Ambiguity is further reduced in writing, in French by a fairly regular system of often silent inflections and in German by capitalizing nouns. However the few spelling distinctions based on grammar in t.o. are found decidedly unhelpful: such pairs as practice: practise, dependant: dependent are notorious spelling traps. This does not however mean that a system of such distinctions, regularly applied across the language, might not usefully minimize the ambiguity of homophones; Chris Jolly is now exploring this idea. Also relevant perhaps are Chinese and Japanese [2], in which homophones abound; indeed it has been stated that these prevent the conversion of those languages to the roman alphabet, since readers would then be unable to distinguish the many homonyms. Homophones may be far fewer in English, but we cannot pretend the potential for such confusion does not exist.


2.1 Homonyms.
Drawing partly on F R Palmer's [3] analysis, we note several kinds of homonym. The noun flight, for instance, has different, though related, meanings; fire may function as noun, verb or adjective, though derivation and meaning are, crudely speaking, the same; while tender represents three unrelated words. In none of these cases is it thought desirable to introduce spelling distinctions.

2.2 Heterographs.
The above categories of homonym are paralleled among heterographs. A few, like practice: practise, as we saw, make unhelpful grammatical distinctions and by analogy with fire could well become homonyms. A few, like flour: flower, have a common derivation and meanings which are perhaps as closely related as the meanings of flight; yet their written forms have become firmly differentiated. Merging their spellings again might be initially confusing if either word were to take the form of the other, a consideration which applies with greater force to the many quite unrelated heterographs like pair: pare: pear. On the one hand, the analogy of tender shows that separate words with the same spelling need not, by themselves, confuse; but on the other hand, if all three words adopted one of the existing forms, readers might at least initially be disoriented (e.g. pair this pair of pairs); a new form like per for all three might lessen the risk of such confusion.

2.3 Word-boundaries.
Another source of uncertainty is the variety of ways in which words may join together in English, sometimes giving homophones which do not exist if the base words are taken in isolation. Thus, are car-key and khaki homophones, and if so, should they be spelt the same? However, whatever curiosity-value such specimens have, their implications for spelling reform are probably slight, since existing word-boundaries clearly distinguish most such pairs, and (with the exception of Harry Lindgren's Phonetic B [4]) reformers are not proposing a systematic revision in this area. (But see §2.5 for the problem of apostrophes as word-boundary markers.) Nevertheless, in compiling the list of heterographs, some doubtful cases had to be considered - whether for example mistle (which occurs only in the fixed expressions mistle thrush, mistletoe) should be listed as a heterograph of missal:US missile, or wych in wych-elm as a heterograph of which: witch.

2.4 Morpheme boundaries.
(For some observations in §2.4, 2.5 I am indebted to Dr Adam Brown at Aston University.) Morpheme boundaries may also make us hesitate before deciding two words are heterographs. Thus there are sets ending in <r> in which one word is a base-word and the other a base plus suffix <r> (e.g. lair: layer, sear: seer, hire: higher, coir: coyer). In fact so productive is this suffix that more doubtful cases may readily be invented: cure: queuer, pair: payer. Similar pairs occur with other inflections: with past-tense <-d> mowed: mode, based: baste; with the <s> inflection laps: lapse, ads: adze; and <-ing> produces uncertainty after /l/, as with pedalling: peddling (does pedalling contain an extra syllable represented by the <a>, or are the two words heterographs, like pedal: peddle?). Even the <-ble> suffix permits an invented heterograph in cannable: cannibal. Whether or not such pairs would merge in a spelling reform depends partly on what morphophonemes are used for suffixes, and partly on the phonemic analysis of the forms concerned; thus if past tenses were formed with <d>, but the t.o. <s> inflections were respelt <z>, there would be no merger of based: baste or laps: lapse.

2.5 Heterographs with apostrophe in t.o.
If we are concerned with heterographs because of the problems they cause users, we cannot ignore the apostrophe, whose correct use in t.o. requires quite subtle analysis. Apostrophes often confuse patterns of word- or morpheme-boundary, and heterographs can make this confusion worse confounded.

2.5.1 Omission + word-boundary.
A common use of apostrophes is exemplified by the pair its: it's, which (perhaps particularly because of the high frequency of both words [5] and because users may feel its requires an apostrophe as a possessive) are often confused. The apostrophe in it's marks omission both of the <i> in is and of the word-boundary. So is it's one word or two? The widespread confusion of it's: its perhaps shows that this apostrophe is as unnecessary a grammatical distinction as that between practice: practise. Among heterographs of this type, they're: their: there are open to similar confusion (they too occur very frequently); but we also note the following: he'd: heed, he'll: heel, I'd: eyed, I'll: aisle: isle, we'd: weed, we'll: weal: wheel, we're: weir, we've: weave, who'd: hood (Scots), who's: whose, you'll: yule. All these contractions arise from the junction of a pronoun and a verb such as are, is, will, would. We may here also mention the archaic contraction of it as 't, which gave rise to the heterographs 'twill: twill.

2.5.2 Negative contractions.
At first sight similar, but additionally confusing for subtly different reasons, are forms like isn't. Here the apostrophe indicates omission of <o> in not, but the word-boundary lies elsewhere - if anywhere. In isn't, couldn't, hadn't, aren't the word-boundary precedes the <n>, but in can't, shan't, don't it has been disguised by other omissions (not indicated by apostrophe) and/or by changed pronunciation, while won't changes both spelling and pronunciation of the full form. The false analogy of the it's type, in which omission and word-boundary coincide, no doubt partly explains common misspellings like would'nt. Possible heterographs arising here are aren't: aunt (non-rhotic speech only), can't: cant (US etc), won't: wont.

2.5.3 Apostrophe with <s>.
Apostrophes are frequently associated with <s>, sometimes functioning as morpheme-boundary markers and sometimes not. Many writers fail to distinguish plural <s>, singular possessive <'s> and plural possessive <s'>, though words whose base ends in <s> (Moses') or which have an irregular plural (children's) may show other variations. In addition, we have already noted in §2.5.1 that the verb is can contract to 's, marking not possession but a word-boundary, and producing the heterographs theirs: there's, who's: whose, wise: why's... If every such <s> form counts as a heterograph, the number is almost unlimited, because every noun taking such endings then spawns a set. Such sets are excluded from the list, but <s> forms that are homophonous with other words (e.g. raise: rays) are included. The past-tense morpheme <d> may also take an apostrophe, as in ski'd, fee'd; of these, ski'd: skid are not heterographs because not homophonous, but fee'd:feed are.

2.6 Variation in pronunciation.
Accent differences mean that many sets are heterographs for some speakers but not for others. The rhotic:non-rhotic divide (between those who always pronounce <r> and those who do so only before a vowel) is a major source of such discrepancies, and is discussed in §2.7. Other cases abound, but are mostly confined to smaller speech-communities (see John Wells [6]). However, within an individual's speech too pronunciation often varies according to context (see David Brazil [7]), and a number of heterographs listed in §4 (e.g. precede:proceed) are only so in some contexts. We have included sets like shore: sure and those with initial <w, wh> (wail: whale), although some speakers distinguish them. In general we have tried only to include sets that are homophonous in General American or RP. Variation in pronunciation means that while merging a set of heterographs may often be sensible, sometimes it may actually distort potentially phonographic t.o. spellings. If therefore criteria are sought for limiting mergers, one might rule out spelling changes which produce forms that are less phonographic than t.o. for any group of speakers.

2.7 Heterographs with and without <r>.
For non-rhotic speakers some 54 sets constitute heterographs which for rhotic speakers are not so. These sets (listed here) are excluded from the main list because there is wide agreement that <r> should be kept in any reformed orthography where it occurs in t.o. (though it may be simplified). However if sets contain more than two heterographs, those constituting a set for both rhotic and non-rhotic speakers are also given as such in the main list.
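
The effect of the rhotic:non-rhotic divide on such sets can be imitated mechanically. The sketch below is a toy under stated assumptions: the ASCII transcriptions (with @ for shwa and O for the vowel of laud) are invented for illustration, real accents also differ in vowel quality, and linking /r/ across word-boundaries is ignored.

```python
# Symbols treated as vowels in the toy ASCII transcription (an assumption).
VOWELS = set("aeiouAEIOU@")


def non_rhotic(transcription):
    """Drop every 'r' that is not immediately followed by a vowel symbol."""
    kept = []
    for i, symbol in enumerate(transcription):
        following = transcription[i + 1] if i + 1 < len(transcription) else ""
        if symbol == "r" and following not in VOWELS:
            continue  # /r/ before a consonant or at the end is not pronounced
        kept.append(symbol)
    return "".join(kept)


def homophonous(a, b, accent=lambda t: t):
    """Compare two transcriptions after applying an accent rule."""
    return accent(a) == accent(b)


# laud: lord, with hypothetical transcriptions "lOd" and "lOrd"
print(homophonous("lOd", "lOrd"))              # False for rhotic speakers
print(homophonous("lOd", "lOrd", non_rhotic))  # True for non-rhotic speakers
```

The same rule merges panda: pander and rota: rotor in the toy transcription, while leaving the initial /r/ of rota untouched because a vowel follows it.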

2.7.1 Closed syllables.
auk:orc -s
balmy:barmy -ier
bawd -s:board -s:bored
calve:carve -s d ing
calk/caulk:cork -s d ing
caw -s d ing:core -s d ing:corps
colonel:kernel -s
ion:iron -s
laud:lord -s d ing
rabbit:rarebit -s
stalk:stork -s

2.7.2 Full vowel in open syllable.
baa:bar -s d
flaw:floor -s d
haw -s:hoar:whore -s
ma:mar -s
paw -s d ing:(poor:?)pore -s:pour -s d ing
saw -s d ing:soar -s d ing: sore -s
yaw (-s):yore:(your -s)

2.7.3 Final shwa with and without /r/.
manna:manner -s:manor -s
mina/myna/mynah:miner/minor -s
panda:pander -s
rota:rotor -s
skewer:skua -s
tuba:tuber -s
tuna:tuner -s

Special case: formally: formerly


3.1 Sources
The list was compiled from sources given here in approximately descending order of importance. The most comprehensive was Hagan [8], substantially augmented by Terrell & Meadows [9]. The Oxford Dictionary for Writers and Editors [10] supplied a dozen or so more sets, a similar number was collected by the author, and a handful came from Chevenix Trench [11], with isolated sets from New Spelling [12], or suggested by Dr Adam Brown.

3.2 The list inherently incomplete.
Any such list must be incomplete, because the phenomenon itself firmly resists unambiguous definition. Pronunciation, inflection, spelling, all allow words to appear in a multitude of guises, sometimes aping each other, sometimes proclaiming their uniqueness, appearing sometimes as blood-relations, even as identical twins, sometimes as impostors, sometimes as so familiar that we rarely look twice at them in daily use, sometimes as out-and-out freaks whose very existence we scarcely credit. Our list aims to present words that the average well-educated native speaker will probably know of, and which, though usually spelt differently, are felt to have roughly the same pronunciation in their citation form or continuous speech.

3.3 Categories excluded from the list.
In addition to the categories already discussed, the following were normally excluded: proper nouns (e.g. Brest: breast, Philip: fillip) - though not nationalities (Finn: fin) -, colloquialisms and expletives (pa: pah), archaisms (wight: white), dialect or local words (hoo: who), highly specialized or technical terms (lac: lack), and foreign words (firn: fern). Inevitably the author constantly had to exercise his discretion, and readers may feel some words listed should be excluded, or vice versa.

3.4 Alternative spellings in t.o.
A little-remarked feature of t.o. is the large number of words with more than one spelling. Generally alternative forms have not been regarded as heterographs, but the distinction between different spellings and different words is not always clear. The following variations were usually ignored: divergent American/British spellings; diachronic changes like phantasy: fantasy; widely accepted alternatives, like the <-ise, -ize> endings or gaol: jail; uncertain spelling of shwa (briar: brier, imposter: impostor); loan-words taken from languages not using the roman alphabet (lychee: litchi); and alternatives reflecting different degrees of anglicization (crape: crepe). However sometimes different meanings have come to be associated (often erratically) with different forms. Sets whose meanings are generally distinct today (curb: kerb, flour: flower, lightening: lightning, metal: mettle) were included, but elsewhere the distinction may be subtle (enquiry: inquiry), or very recent or unfamiliar to many users; these cases have usually not been treated as heterographs (computer program: theatre programme; the faculty of judgement: legal judgment; a person who adapts as an adapter but a device for adapting as an adaptor). More complex because asymmetrically overlapping are the spellings/meanings of the following: swat: swot (we swat or swot a fly, but a student swots); calk/caulk (to seal a boat) and calk (a spiked undershoe, or to trace, the latter meaning cognate with the differently pronounced calque); and gibe/jibe as a sneer, but the gybe/jibe of a sailing ship. Even harder to classify are to stanch and the adjective staunch; as such they seem distinct, but the alternative form to staunch is a homonym of the adjective staunch. While we have excluded judgment: judgement from the list as essentially variants of the 'same' word, and stanch: staunch as either quite distinct or else as homonyms, we have included swat: swot, calk: caulk and gibe: jibe.

3.5 Multiple entries.
Some sets bear the number 2 or 3, indicating they are listed 2 or 3 times because their spelling differs in 2 or 3 particulars. Thus cede: seed 2 differ in the spelling of both the vowel and the first consonant, while cedar: seeder 3 also differ in the second vowel, and are listed 3 times.

3.6 Inflections.
The entry beach: beech -s shows that both words can take the inflection <-s>, so the inflected forms make a further set. With verbs three extra sets can arise by inflection, as with gamble: gambol -s d ing; or only two inflected forms arise, as with brake: break -s ing. Alternatively, a set with more than two members may produce heterographs with the inflected forms of only some words in the set, as with air -s, ere, e'er, heir -s. Discretion had to be exercised as to which words are thought capable of inflection, and since English uses words innovatively as various parts of speech, the fact that a set may be shown without inflections does not mean none could ever arise.

3.7 Arrangement of entries.
The list has two parts, §4.1 showing variations in vowels and §4.2 in consonants. The heterographs in each set are arranged alphabetically, the first word determining the position of the set in the list; thus the set cede: seed is listed under <c> and not under <s>. The phonemic analysis reflects the author's RP bias, and is designed to give a rough idea of relative frequencies of the phonemes occurring in heterographs, and not a precise phonemic breakdown. For typographical reasons it was necessary to compress more than one phoneme into some columns of §4.1; thus boy: buoy are listed with bawd, and the vowels in lass: grass, bet: bear, bite: byre, bit: beer, cow: cower, brew: brewer are likewise merged. §4.2 on the other hand groups consonant phonemes by the least ambiguous alphabetic spelling, franc: frank being listed under <k>. Sets differing in both vowel and consonant appear in both §4.1 and §4.2.


