[Spelling Progress Bulletin, Fall 1982 pp15-20]

SSS Conference 3: Spelling for Electronic Communication.

"Computer transliteration of shorthand,"

by Colin P. Brooks.

*Dept. of Electronics, Univ. of Southampton, U.K.

Summary.

This paper reviews the development of two specialised text processing techniques for computer transliteration of shorthand. It is concerned with the problem of trying to reconstruct automatically an ideal orthographic transcript from imperfect, phonetically-based, shorthand notes. This work forms an integral part of a project to allow a simultaneous transcript of almost verbatim speech to be presented to a post-lingually deaf audience.


1. Introduction.


The Family Welfare Association estimates that between 0.8 and 1.1 million Britons suffer a hearing loss sufficiently acute to be regarded as a social handicap. [1] Whilst many of these people manage remarkably well with a conventional hearing aid, there are still a considerable number for whom attending a public meeting, or watching a television programme, is either difficult or impossible. However, since a high percentage of these people are post-lingually deaf, having become deaf in later life, one way in which it is possible to help is to provide them with a simultaneous written transcript of speech, such as subtitles on television. At the Dept. of Electronics, Southampton Univ., we have been investigating the problems of providing the deaf with a simultaneous written transcript of speech for a number of years.

Unfortunately, there is a fundamental problem with speech transcription concerning the maximum speed at which it is possible to enter text into a machine. Speech varies greatly in speed, but normal conversation usually lies somewhere between 120 and 220 words per minute (wpm). A good typist, on the other hand, can normally only manage between 60 and 80 wpm, and may even have difficulty sustaining this speed over a prolonged period. Modern word processors considerably reduce typing effort, but they do not significantly improve on these figures for text input. The fact remains that it's usually not possible to type fast enough on a conventional QWERTY keyboard to keep up with verbatim speech.

Neither does automatic speech recognition provide a solution. Although theoretically attractive, simultaneous recognition of unconstrained speech is, as yet, impossible and is likely to remain so for some time. [2] [3]

Fortunately, there is an alternative. Numerous shorthand notations have been devised and used over the centuries to allow the verbatim recording of speeches, debates, and court proceedings. However, it would be of very little use simply to present a deaf person with a simultaneous shorthand transcript of a TV programme, for example. Ignoring any technical problems which might arise, most deaf people would be either unwilling or unable to learn what is, after all, a complex code based on a mixture of phonetic and graphemic principles. Instead, the deaf person requires a readable transcript presented in a reasonably familiar manner. In order to be simultaneous, such a transcript needs to be produced automatically.

Figure 1. A Palantype shorthand machine.

2. Automatic transcription of shorthand.


During the course of our research, we have built a number of prototype shorthand transcription systems, each comprising an electric Palantype keyboard, a microprocessor-based transcription unit, and a television monitor screen. [4] These systems allow a Palantypist to provide a simultaneous written transcript of speech on a television screen for deaf people to read. Two of these systems are in daily use by deaf businessmen, who find them of great benefit. [5]

Palantype machine shorthand is particularly suitable for this purpose because, being keyboard-based, it interfaces very conveniently to a computer. However, we have maintained our interest in other shorthand systems and, as part of a wider investigation into man-machine systems, we are currently exploring the feasibility of computer transcription of Pitman's handwritten shorthand. The translation techniques described in this paper were developed as part of this research, although the findings may be applied (at least in principle) to either shorthand notation. However, before going on to discuss the development of the translation processes in detail, let us review the essential principles of both of these shorthand systems.

Palantype machine shorthand [6] is a phonetically-based system in which groups of keys, representing a complete syllable, are pressed simultaneously to form a "chord." Words are represented by a number of chords, usually dependent on the number of stressed syllables within the word. The Palantype machine itself has a keyboard of 29 keys in a rather unusual layout, symmetrical about the centre (See figure 1). The keyboard divides naturally into three groups: a left hand group of 12 keys representing the initial syllabic consonants, a central group of 5 keys representing medial vowels, and a right hand group of 12 keys representing final syllabic consonants. As there are only 29 keys in total, a certain amount of coding is required in order to represent a sufficient number of phonemes. The output from the machine is in the form of a paper band on which each chord is printed on a separate line. Unlike an ordinary typewriter, the paper only moves vertically, and each key always causes an imprint in the same position horizontally across the line.

Figure 2 illustrates an example of Palantype output with its English equivalent. (The layout of the printed characters across the Palantype roll is shown at the top of the figure. Normally, the band would be about 6cm wide.)

Figure 2. An example of Palantype output with its English equivalent.
(Layout of the printed characters across the roll: SCPTH+MFRNLYOEAUI.NLCMFRPT+SH)
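The fixed print positions lend themselves to a very simple computational model. The sketch below (Python, purely illustrative; the 29-character layout string is the one shown at the top of Figure 2, and the function name and example chord are my own assumptions) treats a chord as a set of pressed key positions and renders it as one line of the paper band.

```python
# A minimal sketch of the Palantype band: 29 fixed print positions,
# divided into initial consonants, medial vowels and final consonants.
# The layout string follows Figure 2; everything else is illustrative.

LAYOUT = "SCPTH+MFRNLYOEAUI.NLCMFRPT+SH"
# positions 0-11:  initial syllabic consonants (left hand)
# positions 12-16: medial vowels (centre)
# positions 17-28: final syllabic consonants (right hand)

def render_chord(pressed: set[int]) -> str:
    """Render one chord (a set of key positions) as one line of the band:
    each key always prints in its own fixed column."""
    return "".join(c if i in pressed else " " for i, c in enumerate(LAYOUT))

# Example: a hypothetical chord pressing initial P, medial A and final T.
print(render_chord({2, 14, 25}))
```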
Palantype machine shorthand is a very fast shorthand system allowing accurate outlines even at verbatim speeds. In common with all machine shorthand systems, though, word boundaries are not explicitly marked, and this complicates transcription by computer. Despite this, Palantype shorthand is highly suited to this application and suffers from only one major disadvantage. Unfortunately, unlike handwritten shorthand, it is a comparatively rare skill and there may be as few as 100 practising Palantypists left in the United Kingdom.

In contrast, Pitman's shorthand [7] [8] is undoubtedly one of the most commonly used shorthand notations in the world. It is another phonetically-based system in which the consonant 'kernel' of a word is represented by a sequence of simple, single-stroke, geometric shapes, such as straight lines or shallow curves. The vowel sounds, added only if time permits, are represented by dots or dashes written alongside the consonant symbols. In addition to the basic range of 40 phonemes, there are special symbols for orthographic features such as common prefixes, suffixes, and consonant digraphs. In common with Palantype, very common words and phrases are represented by highly abbreviated symbols called short forms, and these tend to reduce the phonetic quality of the script, particularly at speed. Figure 3 illustrates a short sample of good quality Pitman's 2000 shorthand.

In comparison with machine shorthand, handwritten outlines are highly abbreviated and lack many of the finer details of the original words. Pitman's shorthand in particular emphasises the importance of the consonant kernel, often to the extent of disregarding vowels and unstressed syllables altogether. However, despite the comparatively poor quality of written shorthand notes, they do have a subjective advantage over their machine counterpart. On the whole, word boundaries are preserved, and this has a number of advantages in subsequent automatic processing and in the readability of the final transcript. There is one other difference between these two systems. Although Pitman's shorthand allows verbatim recording, provisional studies indicate that a speed of about 120 wpm will probably represent the upper limit for transcription by machine. Beyond that speed, outlines become too highly mutilated (both physically and linguistically) to be transcribed automatically.

Figure 3. A short sample of Pitman's 2000 shorthand.

3. Text processing objectives.


One possibility originally considered was that of displaying the (recognised) shorthand outlines directly in some form of phonetic alphabet. This, of course, would obviate the need for any sophisticated linguistic processing. A number of possible alphabets were considered, including the International Phonetic Alphabet (IPA) and the Initial Teaching Alphabet (ITA). As might be expected, ITA would suit Pitman's shorthand quite well, but, despite this, use of a conventional alphabet was selected because of its overwhelming advantages. It is not impossible to display an alphabet such as ITA electronically, but it would require non-standard equipment which is considerably more expensive. Furthermore, since most existing public data transmission services (such as Teletext and Prestel) only allow display of a standard alphabet and rudimentary graphics, use of any different alphabet would prevent compatibility with these. More importantly though, there appears to be very little, if anything, to be gained from the deaf reader's point of view in deliberately departing from a standard alphabet and traditional orthography.

The objectives of the linguistic processing system were thus established as follows:

a. To produce an ideally orthographic target script from the pseudo-phonetic source script,

b. To suppress, or at the very least, tolerate mutilations in the source script,

c. To produce a target script which can, within reason, be traced back to the original source phonemes in the event of error,

d. To be computationally "cheap."


Two different methods of achieving these objectives were originally considered: these were translation by dictionary lookup and transliteration by rule. (Transliteration is the name given to the process of conversion from a source script written in one alphabet to a target script written in another.) Each technique was found to have a number of advantages and disadvantages. Generally speaking, a dictionary based system has the advantage of a very high performance, but at the cost of being intolerant of error and computationally quite "expensive" to implement. Transliteration by rule, on the other hand, offers a lower performance, but one that is considerably "cheaper" to implement and more tolerant of error. (There is no danger of a mutilated outline being transformed into something entirely different by an erroneous dictionary match.)

In the current generation of transcription computer, we have chosen a compromise solution. A small dictionary is incorporated to deal with the most common words (which are usually short forms), but all other words are processed by spelling 'rule.' It is interesting that this approach has also been adopted by a number of spelling reformers in their proposals.
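The compromise might be sketched as follows. This is a minimal illustration only, assuming a tiny short-form dictionary and a rule-based fallback; the sample outlines and both function names are hypothetical, not those of the actual transcription unit.

```python
# A sketch of the hybrid strategy: dictionary lookup for common short
# forms, spelling rules for everything else. All names and the sample
# outlines are illustrative assumptions.

SHORT_FORMS = {
    "DH": "the",   # hypothetical short-form outlines
    "V":  "of",
    "B":  "be",
}

def transliterate_by_rule(outline: str) -> str:
    """Stand-in for the phoneme-by-phoneme rule process of section 4;
    here it simply lowercases the outline so the sketch runs end to end."""
    return outline.lower()

def transcribe_word(outline: str) -> str:
    # High-accuracy path: exact match against the small dictionary.
    if outline in SHORT_FORMS:
        return SHORT_FORMS[outline]
    # Error-tolerant path: a mutilated outline degrades gracefully
    # instead of matching some unrelated dictionary entry.
    return transliterate_by_rule(outline)
```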

4. Transliteration by rule.


To reiterate then, the transliteration procedure must govern the conversion of the pseudo-phonetic source script (the shorthand notation) into an ideally orthographic target script. The spelling 'rules' mentioned above must therefore reflect how best to represent a phoneme graphemically in any given situation, taking into account numerous factors such as the position of the phoneme and the conventions of the particular shorthand notation. The rules currently being developed also take into account the relative frequency of every possible graphemic representation of each phoneme. For simplicity, only the most common of the possible range of graphemes for each phoneme are considered. Each phoneme is assigned a specific set of transliteration rules; each member of that set relates to the transliteration of that phoneme in a particular context. Phonetic context was chosen as the best means of distinguishing possible graphemic outcomes because of its inherent simplicity in comparison with other possible techniques, and because studies indicate [9] that this may be one of the most important factors influencing phoneme-grapheme relationships.

By way of example, consider development of the set of rules relating to transliteration of the long /A/ vowel. Table 1 lists the range of possible graphemes for this phoneme, which number 16 in all. (This table also serves to illustrate another complication caused by working from an imperfect phonetic code such as shorthand. In practice, the Pitman /A/ vowel is actually used to represent two distinct phonemes corresponding to the vowel sounds of "hay" and "hair".) Many of these graphemic options occur comparatively infrequently (i.e. less than 2% of the time) and so may be disregarded without significant loss. This leaves four possible graphemes, namely <a>, <a..e>, <ai> and <ay>. A survey was performed of the most common words in English to determine in what circumstances /A/ would be spelt <a>, and when it would be spelt <a..e>, etc. Words belonging to each category were grouped and any suitable spelling pattern isolated manually. An automatic technique for detecting spelling patterns would have been preferable, but this was not possible in the time available. However, the use of rhyming dictionaries and reference to the work of several spelling reformers aided the collection of an adequate number of examples of each grapheme. [10] [11] [12] [13] Reference to a rank list of the most common words in English was also found particularly useful in this respect. [14] The resulting spelling patterns for the /A/ phoneme are listed in Table 2. A similar set of rules has been isolated for every other phoneme in the Pitman shorthand alphabet.

The transliteration rules for the /A/ phoneme would be read as follows. Consider the first rule governing the grapheme <ay>. This rule would be read:

"If the /A/ phoneme is preceded by any phonetic consonant AND followed by a word boundary, THEN the /A/ phoneme is probably best represented by the grapheme <ay>"

This rule would thus be satisfied by the words "pay", "may" and "say." Similarly, the second rule would read: "If the /A/ phoneme is followed by the phonetic consonant /n/, which in turn is followed by any inflection or word boundary, THEN the /A/ phoneme is probably best represented by the grapheme <ai>."

This rule would thus be satisfied by the words "pain," "rain" and "training," for example. The other rules in this table would be read in an exactly analogous fashion. In the event that no specific context rule was satisfied, then the grapheme shown on the bottom line (label led context "else") would be output. Naturally, this should normally be the most common graphemic representation of the phoneme, and in this case is just <a>.

In addition to the groups of rules relating to the transliteration of specific phonemes, the overall process must also be sensitive to a number of the more 'general' rules of English spelling. [11] For example, the following spelling conventions (listed after Tables 1 and 2 below) have also been incorporated:

TABLE 1.

This table illustrates the phoneme-grapheme correspondences expected for the Pitman long /A/ vowel (as used in "HAY" and "HAIR"). After Hanna et al., reference [9].

Grapheme   Estimated %   Cumulative %   Examples
a            43.19          43.19       mAbel / mAry
a..e         34.03          77.22       lAtE / cArE
ai           10.29          87.51       rAIn / fAIr
ay            5.30          92.81       pAY
e..e          1.62          94.43       fEtE / thErE
ea            1.09          95.52       EAch / tEAr
ai..e         0.85          96.37       rAIsE / questionnAIrE
e             0.81          97.18       cafE / sombrEro
ei            0.76          97.94       vEIn / thEIr
eigh          0.72          98.66       slEIGH
ey            0.60          99.26       thEY / EYrie
et            0.36          99.62       bouquET
aigh          0.16          99.78       strAIGHt
ei..e         0.08          99.86       sEInE
au..e         0.04          99.90       gAUgE
ay..e         0.04          99.94       AYE

TABLE 2.

This table lists the transliteration rules for the Pitman long /A/ vowel.
Rule   Phonetic context                              Grapheme
1      (consonant), A, (word boundary)               ay
2      A, N, (inflection or boundary)                ai
3      A, D>2, (word boundary)                       a..e
4      A, D<3, (word boundary)                       ai
5      A, (consonant; but NOT N or D), (NO vowel)    a..e
       else                                          a

(a) The addition of a silent 'e' following a final consonant preceded by a long vowel, such as occurs in "cake," "like" and "mute."

(b) The removal of a silent 'e' before the 'ing' and 'ed' inflections, such as occurs in "taped" and "taping."

(c) The doubling of a final consonant following a short vowel and preceding the 'ing' and 'ed' inflections, as in "map", "mapping" and "mapped."
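Conventions (b) and (c) are mechanical enough to sketch directly. The following fragment (Python; VOWELS and the function are illustrative assumptions, and convention (a) is omitted because it requires phonetic knowledge of vowel length) shows how an inflection might be attached.

```python
# A sketch of spelling conventions (b) and (c): silent 'e' removal and
# final-consonant doubling when "ing"/"ed" is attached. Convention (a)
# is omitted; it needs to know whether the preceding vowel is long.

VOWELS = set("aeiou")

def add_inflection(stem: str, suffix: str) -> str:
    assert suffix in ("ing", "ed")
    # (b) drop a final silent 'e':  tape + ed -> taped
    if stem.endswith("e"):
        return stem[:-1] + suffix
    # (c) double a final consonant after a single written vowel:
    #     map + ing -> mapping (but rain + ing -> raining)
    if (len(stem) >= 2 and stem[-1] not in VOWELS and stem[-2] in VOWELS
            and (len(stem) < 3 or stem[-3] not in VOWELS)):
        return stem + stem[-1] + suffix
    return stem + suffix

print(add_inflection("tape", "ed"))   # taped
print(add_inflection("map", "ing"))   # mapping
```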

The overall translation process thus operates in this manner. Working on one word at a time, the transcription computer sequentially examines each of the phonemes in the shorthand outline. For each of these phonemes, it attempts to match the context in the shorthand outline with one of the context rules listed in transliteration tables like the one in Table 2. If the context in the outline matches one of the listed context rules, then the grapheme recommended by that rule is used. If no specific rule is matched, then the phoneme is simply represented by the most common grapheme for that phoneme. Meanwhile, the computer also checks to determine whether any of the more general spelling rules are applicable, and if so, takes the necessary action. This process continues sequentially for each of the phonemes within the source text until the whole outline has been transliterated.
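The loop itself might be organised as in the sketch below (Python; the rule-table layout, with a list of context predicates and a default grapheme per phoneme, is an illustrative assumption rather than the system's actual data structure, and the rules shown are simplified).

```python
# A sketch of the word-level driver: for each phoneme, try its context
# rules in order; if none matches, fall back to the default grapheme.

RULES = {
    # phoneme: ([(context_test, grapheme), ...], default_grapheme)
    "A": ([(lambda prev, nxt: nxt == "#", "ay"),     # rule 1 (simplified)
           (lambda prev, nxt: nxt == "N", "ai")],    # rule 2 (simplified)
          "a"),
    "P": ([], "p"),
    "N": ([], "n"),
}

def transliterate(outline: list[str]) -> str:
    out = []
    for i, phoneme in enumerate(outline):
        prev = outline[i - 1] if i > 0 else "#"
        nxt = outline[i + 1] if i + 1 < len(outline) else "#"
        rules, default = RULES.get(phoneme, ([], phoneme.lower()))
        out.append(next((g for test, g in rules if test(prev, nxt)), default))
    # The general spelling rules (a)-(c) above would then be applied to
    # the assembled word before display.
    return "".join(out)

print(transliterate(["P", "A"]))        # pay
print(transliterate(["P", "A", "N"]))   # pain
```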

At present, over 100 context rules have been isolated for the Pitman notation. Work is currently in progress to develop a similar system for Palantype shorthand, but in this case, it is necessary to precede the transliteration process with some means of accurately locating word boundaries. Many of the transliteration rules rely on word boundary information to determine the most appropriate grapheme for a given phoneme. The rules are still provisional, and much more work needs to be done to achieve the best compromise between overall complexity, accuracy and tolerance to error when working from real shorthand outlines.

5. Vowel marker insertion.


As already mentioned, an additional problem when working from a Pitman transcript is that the outlines are often vowel deficient; non-essential vowels are omitted in order to increase recording speed. Hence, it was also necessary to develop some means of automatic 'vowel insertion' in order to improve the readability of the final transcript.

The principle of operation of the vowel insertion scheme is quite simple. Every pair of adjacent consonants that has a low probability of occurrence in everyday English is split, and a generalised vowel marker sign (currently a "+") inserted. The insertion of such a marker merely denotes that it is likely that the stenographer left out a vowel character in that position in the original outline. It is not possible to reliably insert a specific vowel, except perhaps in the case of final silent 'e'. Although experiments have not been done by the author, a number of related experiments by psychologists interested in reading [15] imply that this technique should improve readability by helping to restore the correct word 'shape.'

In order to achieve the best possible performance from this technique, the insertion process has been devised to reflect the different initial, medial and final vowel structures common in English words. Special attention is given to consonant digrams which occur near word boundaries, as psycho-linguists believe [16] that word boundaries play a particularly important role in reading. To this end, contextual sensitivity is achieved by having not one, but five vowel insertion lookup tables. Each table summarises exactly which consonant digrams are permitted and which must be split in a given situation. The tables are the result of analysis of the most common vowel structures in written English, and in the prototype system, are arranged to split all digrams with a probability of occurrence of less than about 40% in everyday English. Figure 4 illustrates the effect of the vowel insertion process on a small passage of vowel-deficient text.

Figure 4. An example of the effect of the vowel insertion process on a highly mutilated passage of English, in which all vowels in words of two or more consonants were first deleted and then automatically re-inserted.

Original Text:
This is a short demonstration to illustrate the effect of the vowel insertion process on a short passage of highly mutilated English. As can be seen, although it is normally only possible to insert a generalised vowel marker symbol, this does appear to improve readability.

Mutilated version:
Ths is a shrt dmnstrtn to llstrt th ffct of th vwl nsrtn prcss on a shrt pssg of hghly mtltd nglsh. As cn be sn, lthgh it is nrmlly nly pssbl to nsrt a gnrlsd vwl mrkr symbl, ths ds ppr to mprv rdblty.

Vowel Inserted Version:
Th+s is a sh+rt d+m+nstrt+n to +ll+strt th +ff+ct of th v+wl +ns+rt+n pr+c+ss on a sh+rt p+ss+ge of h+gh+ly m+tl+t+d +ngl+sh. As c+n be s+n, l+th+gh it is n+rm+lly +nly p+ss+ble to +ns+rt a g+n+rl+s+d v+wl m+rk+r symble, th+s d+s +pp+r to +mp+rve r+d+bl+ty.
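A single-table version of the process can be sketched as follows (Python; the PERMITTED digram set here is a tiny illustrative stand-in for the five frequency-derived tables, so its output only approximates Figure 4).

```python
# A sketch of vowel-marker insertion: any adjacent consonant pair that is
# rare in everyday English is split with a "+". The real system uses five
# position-sensitive tables; this single PERMITTED set is illustrative.

PERMITTED = {"sh", "th", "ch", "st", "tr", "pr", "nd", "ng", "rt", "wl", "ll"}
CONSONANTS = set("bcdfghjklmnpqrstvwxz")

def insert_markers(word: str) -> str:
    if not word:
        return word
    out = [word[0]]
    for a, b in zip(word, word[1:]):
        if a in CONSONANTS and b in CONSONANTS and (a + b) not in PERMITTED:
            out.append("+")   # a vowel was probably omitted here
        out.append(b)
    return "".join(out)

print(insert_markers("vwl"))    # v+wl   (cf. Figure 4)
print(insert_markers("shrt"))   # sh+rt
```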

Figure 5.

A. The result obtained by transliteration of a 'good' phonetic transcript of the text shown in Figure 3. The words shown in brackets are short forms and would normally be processed by dictionary.

(This iz an) eksmple (ov) pitmns hnd ritn shrthnd. Work (iz) currently (in) progress (tue) determin whether computer transcription (ov this) skript (iz possble). Sow far (thu) main problm seams (tue be) relliable rcomnsion (of the) shorthand outlns.

B. The result obtained by transliteration and subsequent vowel insertion of the shorthand notes shown in Figure 3, after simulated recognition. The words shown in brackets are short forms and would normally be processed by dictionary, and therefore appear correctly spelt in this transcript.

(This is on) egs+mple (of) p+tm+ns h+nd r+t+n sh+rth+nd. Wrk (is) c+r+ntly (in) progrs (to) d+trm+n wh+ther computer tr+nskr+ption (of this) skr+pt (is possible). So far (the) main pr+bl+m s+ms (to be) reli+bee r+nd+sion (of the) sh+rth+nd outl+nes.

6. Performance of the transliteration procedures.


The performance of the two text processing techniques described in this paper is illustrated in Figure 5. The first paragraph, Figure 5a, was obtained by transliteration of a 'good' quality phonetic transcript of the passage shown in Figure 3. A stenographer was asked to write out this text as accurately as possible in the Pitman alphabet, as if writing full shorthand outlines. As can be seen, the resulting transcript is highly readable and compares quite well with the original, despite the fact that some vowels are still missing. A transcript approaching this quality should be possible from Palantype machine shorthand provided that word boundaries can be accurately determined by some other means.

The second paragraph, Figure 5b, was obtained by transliteration and subsequent vowel insertion of the shorthand notes written by the same stenographer, also shown in Figure 3. In order to remove any possible effects of machine error, the shorthand outlines were recognised manually. This transcript is distinctly more difficult to read, but by no means impossible. However, a number of causes of error are clearly evident. Possibly the most serious of these errors is that caused by excessive abbreviation or syncopation of the outline as in the case of "rension" for "recognition." Here, the effect of the error is emphasised because "rension" is seemingly a reasonable word. There is little that can be done about this category of error except to encourage the stenographer to be as accurate as possible; accurate transcription of the beginning of a word is particularly important. Another problem evident in Figure 5b occurs when a transliteration error also induces a vowel insertion error. In particular, vowel insertion errors about a word boundary (as in "+sk+ript" for "script") can cause serious difficulty. In the future, a single text processing technique implementing both transliteration and vowel insertion on a phonetic level may help to reduce this type of error.

7. Conclusion.

This paper has discussed the development of two specialised text processing techniques for computer transliteration of shorthand notes. The problems encountered during this research were found to be similar to those experienced by spelling reformers searching for a logical spelling strategy for written English. In this case, however, the task was complicated by the use of an imperfect phonetic script such as shorthand. Although it was not possible to satisfy all of the original objectives, it was possible to devise a transliteration scheme which produces a readable, if not orthographic, transcript of the original shorthand. Further research is expected to improve the performance of these techniques but will never enable traditional orthography to be produced completely automatically. However, practical experience shows that, at least in some applications, an 'imperfectly' spelt transcript can be quite acceptable.

Acknowledgements.

I am grateful to Mrs. Tina Hearn for her interest in this project and for her willingness to test the transcription system.

References:

1. "Guide to Social Services, 1976", Family Welfare Association, Macdonald and Evans, London, 1976.

2. Newell, A.F., "Can speech recognition machines help the deaf?", The Teacher of the Deaf, v. 72, no. 428, Nov. 1974, pp. 367-374.

3. Underwood, M.J., "Machines that understand speech," Radio and Electronic Engineer, v. 47, nos. 8-9, Aug.-Sept. 1977, pp. 368-376.

4. Downton, A.C., "Speech transcription as a communication aid for the deaf," Child: care, health and development, v. 5, 1979, pp. 41-48.

5. Hayward, G., "Palantype in the office," Hearing, May-June, 1978, pp. 104-108.

6. The Palantype Manual, obtainable from the Palantype Organization, London.

7. Pitman Shorthand, new course, New Era Edition, Pitman Publishing Co.

8. Pitman 2000 Shorthand, First Course, Pitman Pub. 1975.

9. "Phoneme-grapheme correspondences as cues to spelling improvement," Hanna, P.R., Hanna, J.S., Hodges, R.E., Rudorf Jnr, E.H., Washington, D.C., U.S. Gov't. Printing Office, 1966.

10. Horn, R., "Phonetics and Spelling," Elementary School Journal, May, 1957, pp. 424-432.

11. Vallins, G.H., Spelling, Andre Deutsch, 1954.

12. Wijk, A., Regularized English, Almqvist and Wiksell, Stockholm, 1959.

13. Walker, J., Rhyming Dictionary of the English Language - in which the whole language is arranged according to its terminations, Geo. Routledge & Sons, London, 1890.

14. Kucera, H., Francis, W.N., Computational Analysis of Present Day American English, Brown Univ. Press, 1967.

15. Pillsbury, W.B., "A study in apperception," American Journal of Psychology, v. 8, no. 3, Apr. 1897.

16. Bruner, J.S., O'Dowd, D., "A note on the informativeness of parts of words," Language and Speech, v. 1, 1958, pp. 98-101.

17. Thorndike, E. L., Teacher's Word Book of 20,000 Words. New York: Columbia Teachers' College, 1931-2 187 pp.

