The Transparency of Spanish Orthography.

Ian Mackenzie.

Ian Mackenzie studied Spanish and French at the University of Oxford, then moved to Cambridge to embark on research in the field of semantics. In October 1992 he took up a lectureship in Spanish at the University of Newcastle-upon-Tyne, and now contributes this survey of Spanish spelling to the Society's Journal, where the lack of any account of Spanish has long been felt as a grievous omission. He hopes eventually to follow up this introductory article with a study of some of the practicalities of Spanish spelling, such as its pedagogical effects and the mechanisms for reform today.


Unlike its English or French counterparts, the Spanish orthographical system is designed to express sound-symbol relations as transparently as possible. Indeed, most institutional authorities on the Spanish language treat orthography as a sub-branch of descriptive phonology. For example, in its Esbozo de una Nueva Gramática de la Lengua Española, the Real Academia Española includes a chapter on orthography in the section dealing with phonology. The paragraph introducing that chapter contains the revealing statement:
In this [chapter]... we look ... at the system established by distinctive [phonemic] oppositions, that is to say, we specify a single sign for each phoneme. (op cit: 120)
The highly phonemic character of the Spanish spelling system will be illustrated below, followed by a sketch of how such a model phonographic system has developed.

1. The orthographic system.

The Spanish alphabet is conventionally viewed as containing the following thirty letters:
a, b, c, ch, d, e, f, g, h, i, j, k, l, ll, m, n, ñ, o, p, q, r, rr, s, t, u, v, w, x, y, z.
This means that in the dictionary words beginning for instance with ch follow those beginning with c, thus chacal comes after cuyo. However, the inventory of units central to the system is as follows (brackets indicate units peripheral to the system and having low frequency):
a, b, c, ch, d, e, f, g, gu, h, i, j, (k), l, ll, m, n, ñ, o, p, (q), qu, r, rr, s, t, u, ü, v, (w), x, y, z.
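The dictionary ordering described above can be illustrated with a minimal Python sketch. It is not part of the original survey: the names ALPHABET, units and sort_key are hypothetical, and treating every digraph in the thirty-letter list (including rr) as a single collating unit is an assumption of the sketch.

```python
# The thirty-letter alphabet as listed in the text; list position = collating rank.
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
            "ll", "m", "n", "ñ", "o", "p", "q", "r", "rr", "s", "t", "u", "v",
            "w", "x", "y", "z"]
RANK = {letter: position for position, letter in enumerate(ALPHABET)}

# Assumption: accented vowels and ü collate like their plain counterparts.
PLAIN = str.maketrans("áéíóúü", "aeiouu")


def units(word):
    """Split a word into alphabet units, preferring the digraphs ch, ll, rr."""
    result = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            result.append(pair)
            i += 2
        else:
            result.append(word[i])
            i += 1
    return result


def sort_key(word):
    """Collating key: the rank of each alphabet unit, in order."""
    return [RANK[u] for u in units(word.lower().translate(PLAIN))]


words = ["chacal", "cuyo", "dama", "casa"]
print(sorted(words, key=sort_key))
```

Under these assumptions chacal sorts after cuyo and before dama, as in the example given in the text. Characters outside the alphabet (hyphens, spaces) are not handled and would raise a KeyError.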
The basic system of sound-symbol correspondence for Spanish is shown in the following table. [1] [2] Irregularities are dealt with afterwards, in section 2.

Sound-symbol relations in modern Spanish.

Phoneme [3]    Symbol(s)    Distribution of symbols and/or exemplification

/i/ [4]        i            (i) syllable-internal position: fino [fíno] ('fine')
                            (ii) post-consonantal position: sierra [sjéra] ('mountains')
                            (iii) word-internal syllable-final position: raigón [raiɣón] [6] ('thick root')
                            (iv) stressed post-vocalic word-final position: fui [fwí] ('(I) went')
               y [5]        (i) unstressed post-vocalic word-final position: rey [réi] ('king')
/e/            e            fue [fwé] ('(he) went')
/a/            a            patata [patáta] ('potato')
/o/            o            hola [óla] ('hello')
/u/            u            all contexts except between g and e or i: fuego [fwéɣo] ('fire'), lucha [lútʃa] ('struggle')
               ü            between g and e or i: lingüista [liŋgwísta] ('linguist')
/p/            p            peseta [peséta] ('peseta')
/b/            b, v         distribution not rule-governed: haber [aßér] [7] ('to have'), ave [áße] ('bird')
/t/            t            tilde [tílde] ('accent')
/d/            d            doy [dói] ('(I) give')
/k/            c            all contexts except before e or i: capa [kápa] ('cape')
               qu           before e and i: queda [kéða] ('(he) remains')
/g/            g            all contexts except before e or i: gato [gáto] ('cat')
               gu           before e and i: guerrilla [geríʎa] ('guerrilla warfare')
/f/            f            fe [fé] ('faith')
/θ/ [8]        z            all contexts except before e or i: zapato [θapáto] ('shoe')
               c            before e and i: célula [θélula] ('cell')
/s/            s            sierra [sjéra] ('mountains')
/ʝ/            y            (palatal fricative) payaso [paʝáso] ('clown')
/x/            j            all contexts except before e or i: paja [páxa] ('straw')
               g            before e and i: gitano [xitáno] ('gipsy')
/tʃ/           ch           macho [mátʃo] ('male')
/ɾ/            r            pero [péɾo] ('but')
/r/ [9]        rr           perro [péro] ('dog')
/l/            l            lío [lío] ('mess')
/ʎ/            ll           llama [ʎáma] ('flame')
/m/            m            mamá [mamá] ('mummy')
/n/            n            pan [pán] ('bread')
/ɲ/            ñ            mañana [maɲána] ('tomorrow')

Double consonant
/ks/           x            léxico [léksiko] ('lexicon')

Unless they carry a written accent, i and u always stand for a glide (ie one of the semi-consonantal/semi-vocalic allophones of the relevant phonemes) when they appear before or after any of the three other vowels. Compare, for example, monosyllabic ley [léi] ('law') with disyllabic leí [leí] ('(I) read'), and monosyllabic pues [pwés] ('then') with disyllabic púa [púa] ('sharp point'). Only the first member of each pair may be said to contain a diphthong.
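The reader's side of the table above, from spelling to phonemes, can be sketched in Python. This is an illustration only, not a method proposed in the survey: the function name to_phonemes and the table SIMPLE are hypothetical, stress and allophones are ignored, single r is treated as the trill word-initially and after l or n (following note 9), and the loan-word and nasal irregularities of section 2 are not handled.

```python
FRONT = set("eiéí")  # the vowels before which c, g, qu and gu change value

# One-letter correspondences that need no context (h is silent).
SIMPLE = {"h": "", "v": "b", "z": "θ", "j": "x", "x": "ks", "ñ": "ɲ",
          "ü": "u", "q": "k",
          "á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u"}


def to_phonemes(word):
    """Map a spelling to a broad phonemic string (no stress marks)."""
    w = word.lower()
    out = []
    i = 0
    while i < len(w):
        ch = w[i]
        nxt = w[i + 1] if i + 1 < len(w) else ""
        nxt2 = w[i + 2] if i + 2 < len(w) else ""
        if ch == "c" and nxt == "h":
            out.append("tʃ")
            i += 2
        elif ch == "l" and nxt == "l":
            out.append("ʎ")
            i += 2
        elif ch == "r" and nxt == "r":
            out.append("r")
            i += 2
        elif ch in "qg" and nxt == "u" and nxt2 in FRONT:
            # qu and gu before e or i: the u is purely orthographic
            out.append("k" if ch == "q" else "g")
            i += 2
        elif ch == "c":
            out.append("θ" if nxt in FRONT else "k")
            i += 1
        elif ch == "g":
            out.append("x" if nxt in FRONT else "g")
            i += 1
        elif ch == "y":
            # word-final y stands for /i/; elsewhere for the palatal fricative
            out.append("i" if i == len(w) - 1 else "ʝ")
            i += 1
        elif ch == "r":
            # assumption drawn from note 9: trill word-initially and after l or n
            out.append("r" if i == 0 or w[i - 1] in "ln" else "ɾ")
            i += 1
        else:
            out.append(SIMPLE.get(ch, ch))
            i += 1
    return "".join(out)


for example in ["queda", "guerrilla", "gitano", "lingüista", "léxico"]:
    print(example, to_phonemes(example))
```

Because the b ~ v and h irregularities only affect the writer, a one-pass reader of this kind needs no word list, which is one way of picturing what the text means by transparency.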

Stress rules.
1. If a word ends in a consonant other than n or s, it is stressed on the final syllable unless it carries a written accent indicating stress elsewhere. Words which end in a vowel, n or s and which do not carry a written accent are stressed on the penultimate syllable. Examples are: hablar [aßlár] ('to speak'), cama [káma] ('bed'), hablan [áßlan] ('(they) speak') and hablas [áßlas] ('(you) speak').

2. A written accent must be used to indicate the stress on those words which are not covered by (1): fútbol [fútbol] ('football').

3. Written accents are used to distinguish a small number of otherwise homographic homophones, such as sé ('I know') and se (third-person reflexive pronoun).
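Stress rules 1 and 2 can be sketched as a short Python function that returns the position of the stressed vowel in a written word. The names nuclei and stressed_index are hypothetical, and the treatment of vowel sequences is a simplification based on the glide rule stated earlier: unaccented i and u next to another vowel are counted as glides, and in a sequence of i and u only the second is counted as the syllable nucleus.

```python
STRONG = set("aeoáéíóú")   # vowels that always form a syllable nucleus
WEAK = set("iuü")          # unaccented i and u: glides next to another vowel
ACCENTED = set("áéíóú")


def nuclei(word):
    """Return the indices of the vowels that act as syllable nuclei."""
    result = []
    i = 0
    while i < len(word):
        if word[i] in STRONG or word[i] in WEAK:
            j = i
            while j < len(word) and (word[j] in STRONG or word[j] in WEAK):
                j += 1
            strong = [k for k in range(i, j) if word[k] in STRONG]
            # a run of weak vowels only (ui, iu): the last one is the nucleus
            result.extend(strong if strong else [j - 1])
            i = j
        else:
            i += 1
    return result


def stressed_index(word):
    """Index of the stressed vowel according to rules 1 and 2."""
    word = word.lower()
    for k, ch in enumerate(word):
        if ch in ACCENTED:          # rule 2: a written accent marks the stress
            return k
    nuc = nuclei(word)
    if not nuc:
        raise ValueError("no vowel found in " + repr(word))
    if len(nuc) == 1 or word[-1] not in "aeiouns":
        return nuc[-1]              # final syllable
    return nuc[-2]                  # penultimate syllable


for example in ["hablar", "cama", "hablan", "hablas", "fútbol"]:
    k = stressed_index(example)
    print(example, k, example[k])
```

Rule 3 (accents that distinguish homophones such as sé and se) concerns meaning rather than stress placement and is not modelled here.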

2. Irregularities.

A small number of Spanish spellings are not predictable from the above table and stress rules. These are accounted for by the following observations:

1. Before e and i, /x/ and /θ/ are sometimes represented by j and z respectively. Examples are pasajero ('passenger') and zeta ('zed').

2. h is always silent.

3. In word-internal syllable-final position the /m/ ~ /n/ ~ /ɲ/ opposition is neutralized, yielding the archiphoneme /N/. Syllable-final nasals are assimilated to the following consonant, as in enfermo [emférmo] ('ill'), conllevar [koɲɟʝeßár] [10] ('to involve'), and diente [djénte] ('tooth'). The representation of such syllable-final nasals is consistent with neither the phonetics nor the phonemics of the situation. Thus, while the archiphoneme /N/ may be represented by either n or m, neither graph consistently stands for the same phonetic segment, as revealed by a comparison of diente [djénte] and enfermo [emférmo].

4. w is used in a few recent loan-words, to represent either /b/, as in wáter [báter] ('lavatory'), or /u/ as in windsurf.

5. k, representing /k/, is employed in a small number of loan-words deriving from a variety of sources. Examples of words containing k are parking, káiser, kermes, kilómetro, all of which reflect borrowing of one kind or another.

6. Use of q to represent /k/ (ie with qu pronounced as in English quick) is also limited to loan-words, such as quantum [kwántum] and Latin American quáker ('porridge').

7. A small number of learned words, such as (m)nemotécnia [nemotéknja], may be written with a redundant initial letter. A few rather more common words, such as (p)sicología [sikoloxía], may also be written with a silent initial letter. In the Nuevas Normas of 1952, the Academy proposed the dropping of initial silent g, m and p, and users of Spanish may now choose between etymologically-oriented and phonemic spellings for these words. However, full assimilation into the orthographical system has not proceeded at a uniform pace, and some of these words still retain their redundant initial letter in general written usage. A comparable situation exists for se(p)tiembre and sé(p)timo, which are generally written with a p, despite the fact that these words are generally pronounced [setjémbɾe] and [sétimo] respectively. [11]

3. Discussion.

3.1. Use of digraphs.
Schoolchildren and foreign learners of Spanish are generally taught that ch and ll are 'letters of the alphabet' representing /tʃ/ and /ʎ/ respectively. Consequently these digraphs are easily assimilated as indivisible units with no internal structure. Since qu almost always stands for /k/ rather than /ku/ (ie when /ku/ is realized as [kw]), this digraph is also easily assimilated as an indivisible unit. The grapheme rr, too, is treated as a single unit, standing for the alveolar trill - as opposed to the alveolar flap, represented by r. The grapheme gu is less consistent in its phonemic value, since it has two possible pronunciations. These are /gu/ (realized as [gw] or [ɣw] before a or o - eg guardia [gwáɾðja] and agua [áɣwa] - and as [gu] or [ɣu] before a consonant - eg Guzmán [guθmán] [12] and lúgubre [lúɣußɾe]) and /g/ (realized as [g] or [ɣ] before e and i - eg guerrilla [geríʎa] and erguir [eɾɣíɾ]). The unit gu cannot, therefore, be learnt as an indivisible unit. Other double consonants, such as cc and nn represent two phonetic segments. Consider, for example, acción [akθjón] and innegable [inneɣáßle].

The foregoing remarks indicate that apart from gu, those digraphs which do not correspond to two distinct phonetic segments are virtually always susceptible to one and only one interpretation, and thus do not pose any serious problems for either the reader or the writer of Spanish. Mapping gu onto the relevant phonemes - and the converse process - requires knowledge of a single orthographical rule. This is not an ideal state of affairs, but at least the mapping process is entirely rule-governed.

3.2. Breakdowns in one-to-one correspondence between sounds and symbols
The occurrence of h and the b ~ v alternation are, from the synchronic point of view, completely arbitrary. Users of Spanish simply have to memorize which words are spelt with h and which are not, and which words represent /b/ (which, as is pointed out above, has the allophones [b] and [ß]) as b and which as v.

A further weakness [13] in the system originates in the principle that before e and i the symbols g and c have the phonemic values of j and z respectively. This situation not only represents in itself a reduction in the transparency of the system, but the structural readjustments it triggers further obscure the relationship between the spoken and written codes. For in order to represent /g/ or /k/ before e or i, the digraphs gu and qu have to be brought into play. Use of gu as a contextual variant of g in turn requires use of a dieresis in order to provide a spelling for /gue/ or /gui/, as in pingüe ('greasy') and pingüino ('penguin'). The situation is further complicated by the fact that (i) j, rather than g, occurs before e and i in quite a large number of words (for example, dije [díxe] ('(I) said')) and (ii) z, rather than c, occurs in the same environments in a smaller number of words - for example, enzima [enθíma] ('enzyme').
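The writer's side of these readjustments, from phonemes to spelling, can be sketched as follows. The function name spell is hypothetical, each phoneme is assumed to be written as a single character, and only the default pattern is encoded: the words that take j or z before e and i, mentioned in (i) and (ii), have to be memorized and are deliberately left out.

```python
FRONT = set("ei")


def spell(phonemes):
    """Spell /k/, /g/, /θ/, /x/ and /u/ by the regular rules; copy the rest."""
    out = []
    for i, p in enumerate(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else ""
        prev = phonemes[i - 1] if i > 0 else ""
        before_front = nxt in FRONT
        if p == "k":
            out.append("qu" if before_front else "c")
        elif p == "g":
            out.append("gu" if before_front else "g")
        elif p == "u" and prev == "g" and before_front:
            out.append("ü")     # dieresis keeps the u audible in güe, güi
        elif p == "θ":
            out.append("c" if before_front else "z")
        elif p == "x":
            out.append("g" if before_front else "j")
        else:
            out.append(p)
    return "".join(out)


for example in ["keda", "kapa", "gato", "pingue", "θapato", "xitano", "paxa"]:
    print(example, spell(example))
```

The chain of dependencies described in the paragraph is visible in the branches: the front-vowel values of c and g force qu and gu, and gu in turn forces ü.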

The only other major weakness lies in the use of both y and i to represent /i/. Here again, the distribution of each symbol is rule-governed, although the use of two symbols to represent one phoneme further removes the spelling system from an ideal one-to-one correspondence with the sound system.

Less serious weaknesses are the mapping of the two phonemes /k/ and /s/ when they occur as a sequence onto the single symbol x, and the irregularities arising from neutralization of the nasals in certain positions.

Points 4 to 7 from Section (2) cover peripheral irregularities, which are of no consequence for the vast majority of Spanish words.

3.3. Heterographic homophones.
It follows from the above remarks that there are very few heterographic homophones in Spanish. The causes of heterographic homophony are (i) the silence of h (eg echo ('(I) pour') and hecho ('fact')), (ii) the existence of two symbols for /b/ (eg valón ('Walloon') and balón ('ball')) and (iii) the existence of two symbols for /θ/ (eg encima ('on') and enzima ('enzyme')).
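The three sources of heterographic homophony can be made concrete with a small Python sketch that reduces a spelling to a key in which the competing symbols are merged. The names sound_key and homophone_sets are hypothetical, and the key is only a rough stand-in for a phonemic transcription: it drops silent h, writes v as b, and writes c before e or i as z.

```python
from collections import defaultdict

FRONT = set("eiéí")


def sound_key(word):
    """Merge the spellings that the text identifies as sounding alike."""
    w = word.lower()
    out = []
    for i, ch in enumerate(w):
        nxt = w[i + 1] if i + 1 < len(w) else ""
        prev = w[i - 1] if i > 0 else ""
        if ch == "h" and prev != "c":
            continue                      # silent h (the h of ch is kept)
        if ch == "v":
            ch = "b"                      # b and v both stand for /b/
        elif ch == "c" and nxt in FRONT:
            ch = "z"                      # c before e, i and z both stand for /θ/
        out.append(ch)
    return "".join(out)


def homophone_sets(words):
    """Group words that share a sound key; keep groups of two or more."""
    groups = defaultdict(list)
    for word in words:
        groups[sound_key(word)].append(word)
    return [group for group in groups.values() if len(group) > 1]


sample = ["echo", "hecho", "valón", "balón", "encima", "enzima", "casa"]
print(homophone_sets(sample))
```

Applied to the examples in the text, the sketch pairs echo with hecho, valón with balón and encima with enzima, and leaves casa on its own.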

3.4. Basic regularity.
In contrast to English and French, then, the Spanish orthographic system is highly phonemic. When one-to-one sound-symbol correspondence breaks down, the resultant alternations between graphs are normally rule-governed and thus predictable. A few spellings can be described as irregular, but such cases are rare and the overall picture is one of generalized regularity. It is worth noting at this point that many of those words of Latin origin which in English include double consonants (despite the fact that the corresponding phonetic segment is not geminate) have a single consonant in Spanish. Examples are acomodación, diferente and atención.

3.5. An example of written Spanish, with a phonetic transcription [14] of the text.
Y salió de Granada, donde dejó una escasa guarnición, el día 19 de abril. Cuando avistó Vélez, el sitio cristiano se había afirmado por tierra y mar. Acampó en el castillo de Ben Tomiz. Le urgía despachar el combate y regresar; en consecuencia, atacó sin dilación al enemigo. Por entre los viñedos, verdes todavía, clamorearon sus gritos de guerra.

[i saljó ðe ɣɾanáða │ donde ðexó una eskása ɣwaɾniθjón │ el día ðjeθinwéße ðe aßɾíl ││ kwando aßistó ßéleθ │ el sítjo kɾistjáno se aßía afiɾmáðo poɾ tjéra i máɾ ││ akampó en el kastíʎo ðe ßen tomíθ ││ le uɾxía ðespatʃáɾ el kombáte i reɣɾesáɾ ││ en konsekwénθja │ atakó sin dilaθjón al enemíɣo ││ poɾ entɾe loʂ ßiɲéðos │ béɾðes toðaßía │ klamoɾeáɾon suʂ ɣɾítoʂ ðe ɣéra]

(Antonio Gala, El manuscrito carmesí.)

4. Historical development of the system.

It was under the direction of Alfonso X (1252-84) that a written standard Spanish - based on the speech of the upper classes of Toledo - emerged. As there was as yet no opposition to the principle that letters should represent speech sounds, the medieval orthography generally reflected the way Spanish was spoken in official and cultured circles in Toledo in the latter half of the thirteenth century. However, as Entwistle (1962: 158) points out, adjustments to the system failed to keep pace with changes in spoken Spanish, particularly in the pronunciation of medieval x, j, g, ç, z, s and -ss-, and by the beginning of the eighteenth century the orthography was ready for reform by the Academy. A brief look at the history of x, j, g, ç, z, s and -ss- and the sounds they represented at different times should give an excellent idea of the main problems which the Academy has had to deal with and the extent to which this institution has sought to make Spanish orthography a phonemic system.

As Penny (1991: 86 ff) explains, in Old Spanish there were seven sibilant phonemes:

Old Spanish sibilants

                        voiceless    voiced
dental affricate        /ts/         /dz/
alveolar fricative      /s/          /z/
prepalatal fricative    /ʃ/          /ʒ/
prepalatal affricate    /tʃ/

The following spellings were used in intervocalic position:

/ts/ - c (before e and i) and ç : decir ('to descend', now obsolete), caça ('hunt', mod equ: caza)

/dz/- z : hazer ('to do', mod equ: hacer)

/s/ - ss : passo ('step', mod equ: paso)

/z/ - s : casa ('house', mod equ: casa)

/ʃ/ - x : dixo ('(he) said', mod equ: dijo)

/ʒ/ - g (before e or i) and j : mugier ('woman', mod equ: mujer), fijo ('son', mod equ: hijo)

/tʃ/ - ch : fecho (ppt of hazer, mod equ: hecho).

The affricate /tʃ/, together with its graphical representation ch, passed unchanged into the modern language. The sub-system consisting of the remaining six phonemes underwent a series of changes which eventually yielded the three phonemes /θ/, /s/ and /x/ of modern Spanish. The first change, consisting in a weakening of the dental affricates to the dental fricatives /s/* and /z/*, had probably been accomplished by the end of the fifteenth century. In the sixteenth century the voiced series was eliminated (leaving /s/*, /s/ and /ʃ/), and by the middle of the seventeenth century, probably as a consequence of the fact that the places of articulation of /s/*, /s/ and /ʃ/ are very close to each other, the dental and prepalatal phonemes had moved forward (to an interdental articulation, /θ/) and backwards (to a velar articulation, /x/) respectively.
[* these letters have an unidentified diacritic beneath them.]

This phonological restructuring created a disparity between pronunciation and spelling, and prior to the reforms of the eighteenth and nineteenth centuries at least eight graphs - ie all those employed in Old Spanish - were employed to represent the three phonemes descended from /ts/, /dz/, /s/, /z/, /ʃ/ and /ʒ/. However, as the Academy began to take official control of the language, orthography was brought once again into line with pronunciation. The 1741 and 1763 editions of the Academy's Ortografía abolished ç and ss in writing, and in the 1815 edition x was officially prohibited as a representation of /x/ (its value being fixed as /ks/). [15] With the disappearance of ss, s had come to be the only symbol for /s/, while /x/ and /θ/ were each left with two graphical representations, namely j and g and z and c. As a concession to tradition, these graphical alternations were not abolished, the Old Spanish distribution of the graphs (g and c occurring only before e and i) being maintained.

The other principal irregularities in the system can also be traced back to medieval orthographical practices. The b ~ v alternation, for example, is clearly a vestige of former times. By the end of the sixteenth century the /b/ ~ /ß/ opposition (represented graphically by the b ~ v contrast) which had existed in Old Spanish had disappeared and the letters b and v had come to have identical phonemic values. In the eighteenth century the use of these two letters was officially fixed, primarily on the model of the Old Spanish practice, with some concessions to Latin spelling, as in the case of the reflex of intervocalic -B-: eg Latin DEBET > Old Spanish deve > post-1800 Spanish debe.

The use of silent h records the previous existence of a laryngeal fricative /h/, which had disappeared from standard Spanish by the end of the sixteenth century.

The modern distribution of i and y was officially established in the 1815 edition of the Academy's Ortografía and to a certain extent represents a compromise between phonemic transparency and respect for tradition. For example, the phonemically unjustified retention of vocalic y in unstressed post-vocalic word-final positions (eg rey, ley) reflects the medieval practice of employing this graph in word-final position after any vowel except /i/ (see Penny (1988: 343) for statistics concerning the Alfonsine corpus).

It is also worth noting that the use of qu for /k/ before e and i results from the generalized loss (according to Penny (1991: 83-4), at a period prior to the development of written Spanish) of the glide [w] from reflexes of Latin QU-, as in the case of QUINDECIM, which became Spanish quince ('fifteen'). Before stressed /a/, however, the glide was not eliminated, and it was only in 1815 that the Academy replaced qua- by cua-. Thus modern cuando [kwándo] was written as quando until the last century.

An equally important date is 1803, when the fourth edition of the Academy's Diccionario was published. In this work use of the digraphs ch, ph and th to represent /k/, /f/ and /θ/ respectively, a practice hitherto limited to a relatively small number of learned words, was formally prohibited.

Concluding remarks.

It is clear that the phonemic transparency of modern Spanish orthography is the result of sustained observance of the principle that spelling should reflect pronunciation and not linguistic history. In the modern period at least, it has been the Academy which has played the most significant role in maintaining sound-symbol correspondence, although in making a number of concessions to tradition and etymology it has stopped short of implementing a one-to-one set of sound-symbol correspondences. [16] Its most significant reforming work was performed during the eighteenth and nineteenth centuries.


[1] The phonemic system described here is based on that given in Canellada & Madsen (1987). This system represents a rather formal standard pronunciation. Many Spanish speakers operate with rather different systems, one of the commonest departures from the standard being the neutralization of the /ʎ/ ~ /ʝ/ opposition. For such people, mayo ('May') and mallo ('mallet') would be pronounced identically as [máʝo].

[2] The phonetic symbols are those of the IPA.

[3] Vowels are listed in their order of appearance on the vowel quadrilateral starting from high front and moving in an anti-clockwise direction towards high back. Consonants are listed in the first instance by manner of articulation (plosives, fricatives, affricate, flap, trill, laterals, nasals) and then by place of articulation (from the lips backwards). The voiceless member of each voiced ~ voiceless pair is always given first.

[4] /i/ and /u/ are realized as glides when they are unstressed before or after /a/, /e/ or /o/. When either of /i/ and /u/ follows the other, the first vowel is realized as a glide. I have represented the on-glide allophones (ie the realization of /i/ or /u/ when they precede each other) by /j/ and /w/ respectively. The off-glide realizations are represented by the same symbols as those which stand for the full vocalic realizations of the two phonemes.

[5] The words y ('and'), pronounced [i], and muy ('very'), pronounced [mwí], represent rare exceptions to the distributional patterns given here.

[6] In Spanish voiced plosives (ie /b/, /d/ and /g/) are realized as fricatives (ie [ß], [ð] and [ɣ] respectively) in word-internal positions (unless they follow a nasal; /d/ is also plosive after a lateral). For example, the d of dar ('to give') and andar ('to walk') is pronounced [d], while the d of ayudar ('to help') is pronounced as [ð].

[7] [ß] is an allophone of /b/. Note that the distribution of b and v does NOT correlate with the distribution of the allophones of the phoneme /b/.

[8] In most Andalusian and Latin-American varieties of Spanish the /θ/ ~ /s/ opposition is neutralized. In some areas (eg the province of Cádiz) the neutralization is in favour of /θ/ (this phenomenon is known as ceceo), and in most others it is in favour of /s/ (this phenomenon is known as seseo).

[9] At the beginning of a word, in syllable-final position and after /l/ or /n/, there is neutralization of /r/ ~ /ɾ/. The resultant archiphoneme (ie the abstract unit representing both the phonemes whose opposition has been suspended) /R/ is always represented by r. Although this arrangement does not capture the phonetics of the situation - r will stand for [r] in word-initial position and after a nasal or lateral, but for [ɾ] in syllable-final position - it constitutes a consistent representation of the phonemics involved. Thus sonreír ('to smile') may be transcribed phonemically as /sonReíR/ and phonetically as [sonreír].

[10] [ɟʝ], a voiced palatal affricate, is an allophone of /ʝ/.

[11] Schoolchildren may be taught either the etymologically-oriented spellings or the more phonemic ones. In formal written Spanish, people generally opt for the etymologically-oriented spellings. For example, in a corpus drawn randomly from mid-1980s editions of the Spanish daily newspaper El País there were 35 occurrences of septiembre and none of setiembre.

[12] See note 14.

[13] By 'weakness' I mean a departure from the theoretical ideal of one-to-one sound-symbol correspondence.

[14] Note that in discourse, word-initial /b/, /d/ and /g/ are only realized as plosives if the word containing them is preceded by a pause (indicated by │ or ││), or if they are immediately preceded by a nasal or - in the case of /d/ only - a lateral. Word boundaries are not generally observed in extended speech - I could just as well have written the transcription without leaving a space between the representations of individual words. Note also that /s/ and /θ/ are voiced ([s]* and [θ]* respectively) when they occur syllable-finally before a voiced consonant.
[* these letters have an unidentified diacritic beneath them.]

[15] Modern Spanish México, Texas and Oaxaca, in which x stands for /x/, represent rare exceptions to the rule that x stands for /ks/.

[16] It is interesting to note that initially the Academy employed, in Lapesa's words (1942: 208-9), 'un criterio conservador y latinista'.

