[Simplified Spelling Society Newsletter Spring 1986/1 pp5-13 Later designated Journal 2]
[See J8 article and J21 review by Frank Knowles.]
[Part 2 of this long article is on another page.]

Information Theory and its Implications for Spelling Reform. Part 1.

Francis Knowles.

[Professor Knowles is Professor of Language in the Department of Modern Languages at Aston University, Birmingham and his interests lie particularly in the application of computers to translation and lexicography. This article arises from a talk he gave to the Society on 12 October 1985, and was transcribed and edited with his permission.]


1.1 Shannon & Communication Theory.
The idea of Information Theory was first outlined in print in 1948-49, when the eminent American mathematician Claude Shannon published the results of work he had been engaged on for several years; however he first called it Communication Theory. Shannon was in the service of the United States government in the Second World War, employed on traffic analysis and cryptological duties. Anyone who has had the remotest dealings with that type of work - and I have to plead slightly guilty myself - would see immediately why Information Theory and language are so intertwined, and how many useful things can be found out about how language works as a result of the methods used. The question Shannon was particularly concerned with was whether it is possible to measure the information that gets transferred from one person to another in daily intercourse. He decided that it was possible.

1.2 Levels of language.
Shannon was concerned with the lowest level of linguistic analysis, which we will call syntactic analysis. Following other American linguists he saw language as existing on three levels. Firstly, syntactics, which is just how symbols combine with each other, or, to use anthropomorphic terms, how symbols can be friendly or unfriendly with each other, how they muster, how they group, how they keep their distance from each other. The next level up would be the level of semantics which is the process of allocating meanings to the symbols. The highest level would be pragmatics, how the users of symbols actually handle them in practice. Other philosophers and linguists have taken things further by suggesting that there is in fact an aesthetic dimension, but we need not concern ourselves with that here.

1.3 Efficient Messages.
During his military service Shannon was dealing with communications, especially intelligence communications. He would be faced with a message, and had to ask how to make it efficient so as not to waste battery power or electric energy in transmitting it, and how to protect it from unauthorized eavesdropping. He therefore tried to devise rules to help people code linguistic messages efficiently. Obviously military and diplomatic communications are a special form of communication, but what Shannon did has a wider validity, which is the basis of the communication theory we shall be discussing.


2.1 Understanding Ambiguities.
Most linguistic signals use a set of conventions, and effective communication requires the receiver of the signals to know the conventions and be able to interpret them when there is possible ambiguity. One may, for instance, encounter the symbol ! by the roadside, which we all recognize as an exclamation mark. But if one asks in general terms what an exclamation mark is, one would probably say that it is an orthographical sign used in text which can't be pronounced but indicates some rhetorical features of a statement. It would be typically used in, say, a letter from Mr A to Miss B, when he might be tempted to use one of these signs after the words 'I love you'. However the sign on the roadside was a symbol from the Highway Code to warn of potential danger. It is also a sign of considerable importance to mathematicians, not least to those who work in Information Theory: 3 factorial, i.e. 3 x 2 x 1 = 6, is written 3! It is also a sign used in chess literature when the author wishes to indicate a particularly good move; and in like manner a question mark would suggest that the move was dubious. So conventions may entail problems, such as ambiguity. I need scarcely add that the English orthographical system is replete with recondite conventions whose ambiguity makes them less helpful than they might be.

2.2 Information Content.
Shannon aimed to go beyond these conventions and concentrate on making the written language as efficient a vehicle for communication as possible. He started by asking how information could be measured - a very difficult question. We can get a glimmering of the sort of approach he adopted if we say 'I want to kill two birds with one...', but leave the sentence incomplete. Every native speaker of English knows that the next word should be stone - and would be surprised if it was pebble or boulder or rock. Shannon concluded that if you knew the word was going to be stone, then it conveyed no information whatsoever, being entirely predictable; its information-content was zero. But if we say, 'Jack Smith has gone and ...', and stop there, no one would be able to say what the next words were going to be. Here Shannon would say, at that moment the atmosphere was pregnant, and whatever followed would have been rich in information-content.
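Shannon's measure can be sketched in a few lines: an event of probability p carries -log2(p) bits of information. The following Python fragment uses probabilities that are invented purely for illustration, to show why a wholly predictable word like stone carries almost nothing:

```python
import math

def surprisal_bits(p):
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)

# Invented, purely illustrative probabilities for the word that
# completes 'I want to kill two birds with one ...':
next_word_probs = {"stone": 0.999, "pebble": 0.0005, "boulder": 0.0005}

for word, p in next_word_probs.items():
    print(f"{word}: {surprisal_bits(p):.2f} bits")
```

On these made-up figures stone carries about 0.001 bits - effectively Shannon's zero - while a wholly unexpected word carries about 11 bits.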

2.3 Statistical Knowledge of Language.
So how do people know that the next word is going to be stone? It is because we are attuned to that fixed idiom of English which always occurs in that context. But a foreigner might not have that knowledge and would then be unable to predict the next word, because he wouldn't have been exposed to English in daily discourse from early childhood, and would therefore not have built up in his mind, or brain, the necessary statistical database, so to speak, about how the English language is used. That, I think, is proof, if proof is needed, of the importance of the statistical undercurrent to language. There is a quantitative side to language which most people are very adept at keying into and handling, though of course they couldn't describe it in extenso. Grammar-books may attempt to do so, but I doubt very much whether even the most comprehensive grammar-book could tell us the complete, exhaustive rules for the use of the definite article in English. It will lay down a whole series of rules, occupying many pages, but at best they might cover only 85 or 90% of cases. The rest is covered by that feeling for language we only have from the way we have been brought up and linguistically sensitized.

2.4 Applications of Statistics of Language.
Shannon tried, and others have tried after him, to use this statistical aspect of language as a sort of crowbar to get inside the way messages are communicated, and so make telecommunications efficient. There are all sorts of ways in which this can be done. Shannon's work was in the field of cryptology and codes - the attempt to regularize and invent words to symmetrize information, so that they can be compacted into neat packets. This task was greatly facilitated of course by the advent of computers with what the computer scientists call fixed architectures. It sometimes makes the computer the boss, rather than human beings, which is always a pity, but no doubt that particular danger will recede.


3.1 Spoken and Written Codes
Let us look at some of these features that Shannon thought of as being measurable, such as codes, for instance. English exemplifies how codes are used rather well.

Writing and speech constitute different codes. If you tell someone how to use a public telephone box, you use the spoken code and probably start by saying, 'put a coin in the slot'; but printed instructions would be in the written code and probably say 'insert the coin in the slot'. If such a code of conventions is ignored, it creates a curious impression. But that is not a code of direct interest to us from the point of view of Information Theory, though it is of very great interest to students of language in a more general sense.

3.2 Reconstituting Garbled Messages.
To take a rather dated example, suppose a young man sent a telegram to his girlfriend saying 'COMING AT THREE LOVE BOB', and imagine the message actually received was something like 'COMIHG AQ THZEE LIVE ROB'. There might be confusion if there is both a Rob and a Bob in the young lady's life, but otherwise she should have no difficulty interpreting the message correctly. The original information was garbled - but communication was unaffected because the receiver of the message was able to reconstitute it using her knowledge of the statistical probabilities of language. Positive Information Theory has to do with information as sent by the transmitter and as received by the recipient. If anything goes wrong with the communication, one needs to know how the information got lost and whether it is in fact re-constitutable. Now suppose the message sent had the same spoken form, but was written 'COMING AT 3 LOVE BOB', and suppose that in the process of transmission it became garbled to 'COMIHG AQ 2 LIVE ROB'. With the symbol 3 changed to 2, there is no context in the telegram to help reconstitute the correct hour. So Bob arrives at 3, and his girl-friend is less than pleased. The point is that numbers have no contextuality, and if we are sending telegrams, we are well advised to spell out any numbers alphabetically to avoid such dangers of garbling.
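The recipient's repair process can be mimicked crudely in code: snap each received word to the most similar word in a vocabulary, which stands in for her statistical knowledge of the language. The vocabulary and the 0.5 similarity cutoff below are invented for illustration; note that the garbled digit resists repair, exactly as described above:

```python
import difflib

# A tiny stand-in for the recipient's statistical knowledge of the
# language: the words she expects to see. The vocabulary and the 0.5
# similarity cutoff are invented for illustration.
VOCAB = ["COMING", "AT", "THREE", "LOVE", "BOB"]

def reconstitute(garbled):
    """Snap each received word to its closest expected word, if any."""
    repaired = []
    for word in garbled.split():
        match = difflib.get_close_matches(word, VOCAB, n=1, cutoff=0.5)
        repaired.append(match[0] if match else word)
    return " ".join(repaired)

print(reconstitute("COMIHG AQ THZEE LIVE ROB"))  # COMING AT THREE LOVE BOB
# The digit has no context to snap to, so the error survives:
print(reconstitute("COMIHG AQ 2 LIVE ROB"))      # COMING AT 2 LIVE ROB is repaired only in part
```

With ROB present in the vocabulary as well as BOB, the sketch would face exactly the Rob/Bob ambiguity mentioned in the text.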

3.3 Chinese/Japanese Telecommunication.
Let us now take a different context. Imagine a businessman sent to Japan to negotiate a business deal, with permission to spend say £100,000 for his company. When he gets to Tokyo, he finds out that the price has risen to £150,000, so he has to get permission to spend extra money. He could go to the Japanese Post Office and telegraph, 'Do I have permission to spend £150,000, Jack'. What he would actually do, if he were well-trained, is indeed to go to the Post Office and send a telegram, but it might read XPQRSZ 5683451, which is cheaper to send, and doesn't clutter up the telecommunications systems. It looks meaningless, but it isn't in fact, because the recipient will have a dictionary in which he can find that it means 'Do I have permission to spend' the sum in question. But this can only be used where this is the accepted code - it's no good just looking round the door and saying to someone XJQ £150,000. In other words there is a conventionality here too. In some countries the convention has to go much further than that. We all know about the vagaries of the Chinese language, where there is no alphabetic system, but a large number of stylized ideograms that prompt the memory about how the word should be pronounced but don't directly correlate with the pronunciation. Chinese school-leavers are supposed to know 5,000 characters, but a typewriter with 5,000 keys on it is inconceivable. However if you don't have a typewriter, you certainly can't have a teleprinter in China. To send telegrams in China, you have to use a code-convention which effectively means going to the Post Office and saying 'I want to send a telegram to someone with surname type 73, living in city 7, street something or other, and the message type is 425', which might well be 'Arriving Peking, time-type 67' - for 10 a.m. perhaps.
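The commercial-code dictionary described here is essentially a lookup table in both directions. In the sketch below only the group XPQRSZ comes from the example in the text; the second entry and the message layout are invented:

```python
# A hypothetical commercial-code dictionary of the kind described.
# Only the group XPQRSZ comes from the example in the text; the rest
# is invented for illustration.
CODEBOOK = {
    "DO I HAVE PERMISSION TO SPEND": "XPQRSZ",
    "PERMISSION GRANTED UP TO": "YKWALM",
}
DECODEBOOK = {v: k for k, v in CODEBOOK.items()}

def encode(phrase, figure):
    """Replace a stock phrase by its short code group."""
    return f"{CODEBOOK[phrase]} {figure}"

def decode(message):
    code, figure = message.split()
    return f"{DECODEBOOK[code]} {figure}"

msg = encode("DO I HAVE PERMISSION TO SPEND", "150000")
print(msg)          # XPQRSZ 150000
print(decode(msg))  # DO I HAVE PERMISSION TO SPEND 150000
```

As the text stresses, the scheme only works because both ends hold the same dictionary: the convention is the whole of the code.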


4.1 Redundancy.
All these forms are conventions. We have seen that when information is spoilt en route, we can sometimes reconstitute it. This leads to the notion, which Shannon initially formulated, that if this is possible, there must be some redundancy involved in the way the information was recorded in the first place. As a mathematician, he wanted to measure this redundancy. It is possible to say that written English is 50-60% redundant in this very basic aspect - quite how that is worked out we shall see a little later. Redundancy is one of the key concepts of Information Theory, and if Information Theory aims to do anything, it is to iron out that redundancy, and get rid of it where that is a sensible thing to do. It is not always a sensible thing to do, as we observed with the garbled telegrams. It has been shown that languages display a sort of constructive tension between redundancy and informativity, and if you remove redundancy, then presumably you are increasing the informativity. However there can be occasions when the information flow in writing and in speech, but particularly in speech, is just too rich, and then redundancy appears to be a necessary feature of language. All we are talking about is dispensing with it in certain circumstances, where that seems to be a sensible thing to do.
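A first-order version of the redundancy calculation can be carried out from published English letter frequencies: redundancy = 1 - H/H_max, where H_max = log2(26). Single-letter statistics alone give only about 11%; the 50-60% figure quoted above also counts the strong dependencies between neighbouring letters, which require longer-range statistics. A sketch:

```python
import math

# Approximate published relative frequencies of English letters.
# First-order statistics only, so the redundancy they yield is far
# below the 50-60% quoted in the text, which also counts dependencies
# between neighbouring letters.
FREQ = {
    'e': .127, 't': .091, 'a': .082, 'o': .075, 'i': .070, 'n': .067,
    's': .063, 'h': .061, 'r': .060, 'd': .043, 'l': .040, 'c': .028,
    'u': .028, 'm': .024, 'w': .024, 'f': .022, 'g': .020, 'y': .020,
    'p': .019, 'b': .015, 'v': .010, 'k': .008, 'j': .002, 'x': .002,
    'q': .001, 'z': .001,
}

h_max = math.log2(26)   # entropy if all 26 letters were equally likely
h1 = -sum(p * math.log2(p) for p in FREQ.values())
redundancy = 1 - h1 / h_max

print(f"H_max = {h_max:.2f} bits, H1 = {h1:.2f} bits, "
      f"first-order redundancy = {redundancy:.0%}")
```

The gap between this 11% and the 50-60% of the text is itself instructive: most of English redundancy lives in the way letters constrain their neighbours, not in the unevenness of single-letter frequencies.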

4.2 Excessive Information Content.
Here are some highly contrived examples of what is meant by information being potentially too rich. Let me read out the following sentences:

The audience who just heard the person who cited the king who said my kingdom for a horse is dead is an example is a psychologist are very patient indeed.

That sentence is virtually incomprehensible. Or:

The picture that the script that the novel that the producer whom she thanked discovered became made was applauded by the critics.

That too is almost impossible to follow. If you bracket suitable word-groups, however, you could work out that it meant:

She thanked the producer who discovered the novel that became the script that made the picture that was applauded by the critics.

That example reminds one of 'the house that Jack built', which goes to show that certain structures in our language which have evolved historically have not done so by accident, but because they proved capable of conveying information. Now Shannon would presumably say that such English sentences were wayward, but their information content is exactly the same however it is arranged. That is a bit of a conundrum for linguists.


5.1 Distribution.
If Shannon's aim was to measure total information-content in a message and leave it at that, further work has concentrated on measuring information as the text proceeds, either from left to right in written text or in time elapsed in spoken messages. To give a silly example: supposing I happened to know that in a particular book 50,000 words long the word orthography occurred 1530 times, you would be surprised if the last 1530 words of the book were all orthography. So there is a distributional side that is exceedingly important: it is important at the level of syntactics, and at the level of word-fragments, such as graphemes, that people like ourselves are interested in.
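The distributional point can be made concrete by recording where in a text a word's occurrences fall, rather than merely how many there are. A toy sketch, with an invented miniature 'book':

```python
def occurrence_positions(words, target):
    """Where in the text (as fractions of its length) target occurs."""
    n = len(words)
    return [round(i / n, 2) for i, w in enumerate(words) if w == target]

# An invented miniature 'book':
text = ("orthography matters because orthography shapes reading and "
        "reading shapes orthography").split()
print(occurrence_positions(text, "orthography"))  # [0.0, 0.3, 0.9]
```

A word whose occurrences were bunched entirely at the end would show up immediately in such a profile, even though its total count was unchanged.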

5.2 Statistics and Cryptology.
Shannon realized that if so much depends on statistics, then the statistics must be established, so that we know whether or not we are on safe ground. This led to an amazing upsurge, both during the war and since, in the counting of linguistic elements, ranging from letters of the alphabet and phonemes upwards, and some very extensive results are available, relating to most of the world's languages. A lot of them are used of course for purposes such as cryptography, the making and breaking of secrecy systems. In fact the cryptographer turns the Information Theory coin upside down, because he wants to obliterate the tell-tale statistical characteristics of text, so that the unauthorized eavesdropper hasn't got a handle with which to interpret the coded message. Shannon operated in the cryptology mode for quite a long time, and I always draw students' attention to the fact that though they always seem to be aware of one of his papers in the Bell System Technical Journal, they don't seem to be aware of his other paper about cryptology.

5.3 Use of Computers.
Computers, which were first coming into prominence in the late 1940s, were ideal machines to calculate some of these statistical characteristics of language, and they have remained so ever since. There is still a counting-game going on and a very intricate analysis of many different languages - not only English. Other languages have been less thoroughly researched but there is still a vast quantity of data on file.
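The counting-game itself is nowadays a few lines of code. A minimal sketch of letter-frequency counting (the sample string is arbitrary):

```python
from collections import Counter

def letter_counts(text):
    """Frequency table of alphabetic characters, case-folded."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

sample = "Information Theory and its implications for spelling reform"
for letter, n in letter_counts(sample).most_common(5):
    print(letter, n)
```

The same pattern extends directly to phonemes, graphemes or words, given a suitable way of segmenting the input.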


6.1 Shorthand.
Anyone interested in spelling reform must inevitably take an interest in related fields, such as shorthand. Shorthand aims to record speech in writing with a high degree of accuracy as fast as it is uttered. A typist will later reconstitute the shorthand notes, or outlines, into running text, normally within 24 hours. This brings me back to emphasize an obvious point I made earlier, which is the importance of context: unless the secretary reconstitutes the text fairly soon after it was taken down, it may no longer be possible to reconstitute it completely accurately, since the outlines merely serve to prompt the memory of what was said.

6.2 Word Divisions.
The sample of shorthand and its transcription below shows words bound together with hyphens that orthographically should be separate.
sample of shorthand
When using PitmanScript the writer must always try to-write
continued sample of shorthand
neatly and-at-the same-time quickly. This may not seem very easy

That brings us to one of the major problems of any sort of linguistic analysis, including spelling reform, which is how to define the orthographic word. It is another area where the statistical behaviour of language is important. Consider the words firewood, fire-engine, fire insurance, which most people would probably spell respectively as one word, hyphenated and as two words. But there are very many expressions in English where there is no such consensus, and where different people will divide words in different ways. Then there are words like nevertheless, consisting visibly of elements that have the status of words in their own right. That is a problem both for spelling systems and for Information Theory. The next sample shows some shorthand outlines from the field of politics and government which cover quite long chunks of material in which words are glued together.

shorthand samples for political terms

Words are of course normally separated from each other by a single space, but there is a kind of syntactic glue which crosses that space - between fire and insurance it is really quite strong, and has to be taken account of in Information Theory.

6.3 Syntactic Structures.

This question is bound up with the nature of any given language and which of three methods it uses for structuring itself on the syntactic level. One method is inflection, tampering with word-endings usually, but sometimes with word-beginnings, and occasionally with word-middles. Another method is function words, like prepositions and so on in English. The third method is word order. All languages have to use a combination of these three methods to create syntactic meaning. Inflection of course doesn't figure very largely in English, which is particularly awkward in this respect because English words, so to speak, wear civilian clothes, rather than military uniforms with badges on them which say 'I am a noun', 'I am a verb', 'I am an adverb'. They are very gregarious and you don't know what they are until you have seen the company they keep. This has considerable implications for Information Theory, because what people often do when they are putting language into a restricted communication system as Shannon did is to use what we commonly refer to as telegraphic style, dropping words like articles, and carrying out other manipulations. Thus newspaper headlines are totally different from the ordinary grammar of English. The orthographic word problem is with us everywhere, and we cannot avoid it.

6.4 Non-Phonographic Shorthands.
There are systems, some of them 200-300 years old, which try to contract running text to very short segments indeed that bring in a high level of conventionality. In Dutton Speedwords, for instance, the sentence 'there are hundreds of things to be done between now and the winter season' appears as 'e cen d om e fad in nu & l pea peg', which has no pronunciation equivalent; indeed that is not its purpose. But it achieves a 62% reduction. That is however quite unlike the purpose of spelling reform, and indeed such systems do not seem to have been very successful over the last couple of centuries or so.


7.1 10% Letters Omitted.
Let us now look at some texts illustrating the effect of varying percentages of omitted letters. The following text has a random 10% of its letters cut out:

sample text with a random 10% of letters omitted
It is possible to read this text with little difficulty, but it will be noticed that reading would be easier if the gaps within words had been closed. Cryptologists, after all, look especially for word endings when deciphering text, but when composing coded text they try to baffle the reader by concealing the word boundaries and bunching their messages up into groups of five characters. If you know where the word beginnings are, the character combination TH is so common in English that it often gives the game away.
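The TH giveaway is easy to demonstrate: count the first two letters of each word once the boundaries are known. The sample sentence below is adapted from the text above, so the result is of course contrived:

```python
from collections import Counter

def initial_digrams(text):
    """Count the first two letters of each word - the giveaway a
    cryptologist exploits once word boundaries are known."""
    words = [w for w in text.lower().split() if len(w) >= 2 and w.isalpha()]
    return Counter(w[:2] for w in words)

# Sample adapted from the sentence above, so the result is contrived:
sample = ("the character combination th is so common in english "
          "that it often gives the game away")
print(initial_digrams(sample).most_common(3))
```

Run over a sizeable natural corpus, the same count would still place th at or near the top of the word-initial table, which is exactly what bunching coded text into five-character groups is designed to conceal.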

7.2 25% Letters Omitted.
Now for a text in which a random 25% of the letters have been cut:

sample text with a random 25% of letters omitted
The full text reads:
Since the age of fifteen poetry has been my ruling passion and I have never intentionally undertaken any task or formed any relationship that has been inconsistent with poetic principles, which has sometimes won me the reputation of an eccentric. At the age of sixty-five I am still amused at the paradox of poetry's obstinate continuance into the present phase of civilization.

It is characteristic of the reading-process that on now looking back at the reduced text, patterns become clear that were not so before. One letter in four was missing, but with some persistence it was possible to read the text. If two letters out of five go, then the task becomes as arduous as a crossword puzzle - it might take an hour to solve. In fact crosswords are interesting from the point of view of Information Theory because they contain a lot of contextuality, and they wouldn't be possible if language didn't have this inbuilt redundancy.
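Reduced texts of the kind shown in 7.1 and 7.2 can be produced mechanically by deleting a random fraction of the letters. A sketch, with a fixed seed so the deletions are repeatable:

```python
import random

def delete_letters(text, fraction, seed=0):
    """Replace a random fraction of the letters by blanks, as in the
    reduced texts above; a fixed seed makes the result repeatable."""
    rng = random.Random(seed)
    return ''.join(' ' if ch.isalpha() and rng.random() < fraction else ch
                   for ch in text)

sentence = "Since the age of fifteen poetry has been my ruling passion"
print(delete_letters(sentence, 0.10))
print(delete_letters(sentence, 0.25))
```

Raising the fraction towards two letters in five reproduces the crossword-puzzle difficulty described above.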

7.3 Systematic Omission.
A question that now arises is whether it makes any difference if letters are systematically rather than randomly omitted. We find, for instance, that if the word beginnings get obliterated, reading becomes very much harder, but it makes much less difference if the letters omitted are not pronounced anyway, or if only vowel-letters are omitted. Consider how little difficulty the large number of omissions in the following sentences makes to our comprehension:

1. Tk the bk to the grl in the clsrm.
2. I shl go hm at fv.
3. Pt the bx on the tp shlf in the rm.
4. I wnt into the rm to se the grl.
5. I tk the bs to the vlg to mt my fthr.
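A crude mechanical version of this omission drops every vowel that does not begin a word. It only approximates the numbered examples above, which also keep short grammatical words like the intact:

```python
def strip_vowels(text):
    """Drop every vowel that does not begin a word. Cruder than the
    numbered examples above, which also keep short grammatical words
    such as 'the' intact."""
    return ' '.join(w[0] + ''.join(c for c in w[1:] if c.lower() not in 'aeiou')
                    for w in text.split())

print(strip_vowels("Put the box on the top shelf in the room"))
# Pt th bx on th tp shlf in th rm
```

Even this blunt rule leaves the sentences readable, which is the redundancy argument in miniature.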

But there is a paradox here regarding spoken language, because just as you can delete vowels (but not consonants) from a written text without rendering it incomprehensible, so you can remove consonants (but not vowels) from recorded spoken language, and it doesn't have too serious an effect. Technically the reason is that vowels have three phases, an onset, a hold and an offset phase, and there are enough acoustic clues in the onsets and offsets to give an almost cast-iron guarantee as to what the adjacent consonants would have been.

7.4 Hebrew.
This brings us to Hebrew. Written Hebrew has some similarity to that form of English used in the five sentences above. The consonants remain, indeed a consonantal skeleton has to remain, but they are insufficient to reconstitute the pronunciation. To do that, certain other symbols have to be added. A speaker of Hebrew would be able to read off the sentence
sample of Hebrew
which transliterated gives (reading from left to right):
shbyn hshlTvn hmrkzy
lshlTvnvt hmqvmyym
ysh bh nygvd nvsp
while phonemically it produces (reading from left to right with literal translation beneath each line):

ʃebeyn haʃilton hamerkazi
which-between the-authority the-central
laʃiltonot hamekomiin
yeʃ ba nigud nosaf
there-is in-it contrast additional.

This sentence, 'the relationship between the central and local authorities constitutes another contradiction', is actually a quotation from Mao Tse-tung. The operation of reading in Hebrew is rather different from that in English. An English text leads the reader to make a phonetic projection in the mind which yields a semantic interpretation; but in Hebrew this is not possible. The text provides only the consonantal skeleton, and the reader has to supply the vowels; but there are many ambiguities, and reading straight off is not an easy exercise. A semantic interpretation first has to be struggled for, which then contextualizes to give a unique phonetic representation. There is an orthographical reform lobby for modern Hebrew in Israel, which would like to make things easier for the reader by inserting characters which will give a more secure prompt as to which word is really intended. It should be remembered though that while the above sentence is the normal way of writing Hebrew, sacred texts and sometimes poetry have special status and can be written in what is called vocalized form, which includes extra dots above and below the letters, called the Nikud system. There is a Hebrew word which means a teacher, and is pronounced /moreh/; in Hebrew orthography it appears as Hebrew word for 'teacher', but the written form can also mean a female teacher, and then it is pronounced /mora/. There is no difference in the spelling. The syntax or context may give a clue, but if you wanted to write down 'I like my teacher' in Hebrew, a reader would not know how to pronounce it without knowing whether a male or female teacher was meant. One way round this is to add extra information as to the sex at the end of the sentence. But the Hebrew spelling system is an excellent example of how redundancy can be dispensed with, even though it antedates Shannon's formulation of redundancy by a couple of thousand years.

7.5 Arabic & Abbreviations.
Arabic, being also a semitic language, is rather similar. If you ask an Arab to read this word (right to left) Arab word with several meanings out of context, he will be unable to. It could be /ta'lamu/, meaning you know, or /ta'allama/ (he studied), or /ta'allam/ (study!, imperative), or /ta'allumun/ (study, instruction), or /tu'allimu/ (you are teaching), or /tu'limu/ (you are announcing, informing). It is only when you have the context that you can resolve the ambiguity of these abbreviated forms. There is nothing unusual about abbreviations, of course; we use them all the time. Why bother saying Union of Soviet Socialist Republics when you could say just USSR? In small ads, in an attempt to reduce the cost of the ad, VGC can stand for 'very good condition'. But all abbreviations are conventional, and to understand them you have to know what the conventions are.
