[Spelling Progress Bulletin, Winter 1980, pp17,18]

Patterns of Spelling Errors: Some Problems of Test Design

by David Moseley*

*School of Educ. Univ. of Newcastle upon Tyne, Eng.
*Presented at the 1979 Conf. on Reading & Spelling, Nene College.

The majority of spelling tests in current use cover a wide age-range and yield an age-related or standardised score. With the exception of the visual memory and phonic spelling tests in Durrell's (1955) battery, they are not designed for diagnostic purposes. It is, however, open to teachers who wish to compare different aspects of spelling performance to use two or more norm-referenced tests for the purpose. For example, one can compare a pupil's ability to recognise correct spellings with his ability to produce them, using measures such as the Richmond Spelling Test (France and Fraser, 1975) and the SPAR (Young, 1976). An alternative approach, but one of unknown reliability, is to use an informal scheme of classification of spelling errors produced in writing from dictation or in free writing. One such scheme was proposed by Peters (1974).

Spelling tests can be derived from three main sources: graded vocabulary lists, lists of words misspelt in free writing by pupils of different ages, and lists of words judged by teachers to be appropriate for different age-groups. The majority of tests in common use appear to be based on graded vocabulary lists, and are not deliberately weighted with 'spelling demons'. This reduces their content validity to a certain extent, since failure to spell common but graphically idiosyncratic words like 'through', 'friend', 'because', and 'people' is certainly what one expects from an incompetent speller. The study reported here concerned tests derived from lists drawn up by teachers for a particular age-group, and one of the issues discussed is the length of test required if one is looking for reliable diagnostic information for use in planning individual programmes of corrective or remedial work.

Little research has been carried out to compare different formats of spelling test in terms of reliability and validity. The most common format is single word dictation, but multiple-choice formats and dictated passages are also used. Clarke (1975) obtained a correlation of 0.9 between his own dictation spelling test and Schonell's Spelling Test (1932), which suggests that there is little advantage in the use of dictated passages. Such passages, although meaningful, are time-consuming to administer and mark.

Practical constraints such as the ease of mastering a marking scheme, rapid group administration, and low cost have a major influence on whether or not an assessment device is accepted by teachers. In this paper, guidelines are offered for both formal and informal assessment of spelling errors. The analysis of different types of spelling error is not intended to be exhaustive, but even a simple scoring scheme can sensitize teachers to the major areas of difficulty and inconsistency in English spelling.

A pilot study.

An opportunity arose to evaluate a spelling test designed by teachers of 8-year-old pupils in a primary school. The test consisted of 60 core words, judged by the teachers to sample common sight words, commonly misspelt words and basic phonic patterns. The test had already been administered in single word dictation form. It was decided to incorporate the words in a passage for dictation, and to give the new version within a fortnight of the first testing. This was done, 85 pupils taking both versions of the test.

Using the two sets of results, a Pearson product-moment correlation of 0.94 was obtained. One could hardly have expected a higher result than this, even if the same test had been used. This finding indicates that the formats of the test (single word or dictated story) are to all intents and purposes equivalent. This being so, the single word dictation version is probably to be preferred as it can be completed more quickly and is easier to mark.

The high correlation obtained also indicates that the reliability of the test is adequate for individual measurement, and may indeed justify an examination of its possible diagnostic use through the derivation of scores for different types of error.

A four-category scoring scheme was chosen, which the writer had previously developed for use with the Carver Word Recognition Test (Carver, 1970). In the analysis of word recognition errors, this method had yielded better test-retest reliability coefficients than other methods of classification. An earlier attempt to classify errors as either visual or auditory had been abandoned mainly because of lack of test-retest stability of 'auditory' errors. The following scoring rules were applied, which avoided problems of overlapping categories:

1) If the pupil's spelling contains fewer letters than the target word, score as 'S' (simplification error), and do not consider any other errors which may be present.

2) If all letters are present, but in the wrong order, score as 'O' (order error). Do not score 'O' if letters are omitted or added.

3) If the 'S' and 'O' errors have been avoided, look for the first error (from left to right) made in the representation of graphemes in the target word. These errors may involve either omission or addition, and are scored as 'C' (consonant) or 'V' (vowel) according to the appropriate grapheme in the target word.

It is recognised that this scoring scheme inevitably distorts the relative frequency of occurrence of different types of error, for example by increasing the ratio of consonant errors to vowel errors.
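
The three scoring rules above can be sketched in code. The following Python function is a minimal illustration and not the author's marking procedure: it compares single letters rather than graphemes, treats 'y' as a consonant, and attributes trailing extra letters to the last letter of the target word. All of these are simplifying assumptions, and the name classify_error is hypothetical.

```python
from typing import Optional

# Assumption: only a, e, i, o, u count as vowel letters ('y' is a consonant).
VOWELS = set("aeiou")


def classify_error(target: str, attempt: str) -> Optional[str]:
    """Return 'S', 'O', 'C' or 'V' for a misspelling, or None if correct."""
    target, attempt = target.lower(), attempt.lower()
    if attempt == target:
        return None
    # Rule 1: fewer letters than the target word is a simplification error,
    # and no other error in the word is considered.
    if len(attempt) < len(target):
        return "S"
    # Rule 2: exactly the same letters, but in a different order.
    if sorted(attempt) == sorted(target):
        return "O"
    # Rule 3: find the first mismatch from left to right and classify it by
    # the target letter at that position (letters stand in for graphemes).
    for i, letter in enumerate(target):
        if attempt[i] != letter:
            return "V" if letter in VOWELS else "C"
    # Only extra letters after a correct stem: use the final target letter.
    return "V" if target[-1] in VOWELS else "C"


print(classify_error("friend", "frend"))     # S
print(classify_error("friend", "freind"))    # O
print(classify_error("because", "becouse"))  # V
print(classify_error("people", "beople"))    # C
```

Because rule 1 takes priority, any misspelling shorter than the target is counted as 'S' even when it also contains a wrong vowel or consonant, which is one way in which the scheme alters the apparent balance of error types.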

The test papers were marked and mean error rates examined graphically, in order to see whether certain types of error varied more than others with overall level of spelling competence. The results are shown in Fig. I where mean results for the four quartiles of total test score are plotted (n=96).

Fig. I.

Frequency of simplification, vowel, consonant, and letter order errors, for the four quartiles of total spelling score.

It can be seen that letter order errors were the least common, and occurred with essentially the same frequency at all levels of competence. Other types of error showed a marked decline over the range of competence, maintaining the same rank order in frequency of occurrence.

In order to evaluate the above results more objectively, the reliability of the error category scores was examined.

Test-retest reliability coefficients were computed for each of the four categories and were found to bear some relation to the overall frequency of each type of error.

Table 1.

Test-retest reliability of error scores (n=85).
Error type
While the three categories of simplification, vowel and consonant errors show a moderate degree of stability, the order category is clearly not stable. To some extent, this result reflects the inadequacy of the test. Certainly the range of order error scores was restricted (no pupil making more than four errors), and the form of the distribution skewed (42% making no errors at all). At the same time it is possible that letter-order errors are associated with random lapses of attention which may be affected by uncontrolled situational variables.

In order to see whether the four categories of error do in fact represent different aspects of skill, correlation coefficients between the error categories were computed, using the single-word version of the test. The results are given in Table 2.

Table 2.

Correlations between error categories
It is evident that the vowel and consonant categories are relatively closely linked (r=0.71), and that simplification errors are more closely associated with vowel errors than with consonant errors. The difference between the two correlation coefficients (0.63 and 0.47) is significant at the 1% level. The vast majority of simplification errors involve ignorance of digraphs and trigraphs, most of which are vowel rather than consonant spellings.
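
The text does not say which test was used to compare the two correlations. Since both involve the simplification score and come from the same pupils, one standard choice is Hotelling's t-test for dependent correlations. The sketch below assumes that test and assumes n = 85; neither assumption is confirmed by the text, and the function name hotelling_t is hypothetical.

```python
import math


def hotelling_t(r12: float, r13: float, r23: float, n: int) -> float:
    """Hotelling's t for the difference between two dependent correlations.

    r12 and r13 share variable 1; r23 is the correlation between the other
    two variables. The statistic has n - 3 degrees of freedom.
    """
    # Determinant of the 3 x 3 correlation matrix.
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    return (r12 - r13) * math.sqrt((n - 3) * (1 + r23) / (2 * det))


# Variable 1 = simplification, 2 = vowel, 3 = consonant errors.
# n = 85 is an assumption; the sample size for this comparison is not stated.
t = hotelling_t(r12=0.63, r13=0.47, r23=0.71, n=85)
print(f"t = {t:.2f} with {85 - 3} degrees of freedom")
```

The resulting statistic can be compared with tabulated critical values of t for n - 3 degrees of freedom. The outcome depends on the sample size and on whether a one-tailed or two-tailed criterion is applied.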

It can be seen that the relationship between vowel and consonant errors is of the same order of magnitude as the reliability of each of these measures. This finding weighs against the assumption that different kinds of skill are involved in learning to represent vowel and consonant sounds correctly. It does, however, appear that when consonant errors are made, omission of a letter or of a sound occurs less frequently than in the case of vowel errors.

The low reliability of letter order errors and their failure to correlate with other types of error make interpretation difficult.

Implications of the study.

It is clearly possible for teachers to produce a valid and reliable spelling test for a particular age group by drawing up a list of 60 words.

It is doubtful, however, whether any useful diagnostic information can be gleaned even from a test of this length. If we apply the Spearman-Brown formula, we find that the test would need to consist of as many as 240 words if the consonant category were to reach the satisfactory reliability level of rtt = 0.90. A further implication would be that if we are sampling a child's writing in order to build up an error profile, we should continue until a minimum of 10-12 errors have been recorded under all categories used. Further work is needed on the various types of error category, but it is unlikely that errors of letter order will warrant separate attention. The most common source of difficulty is undoubtedly the longer words, and next to this comes the spelling of vowels where complexity and lack of regularity present considerable problems to children.
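
The Spearman-Brown calculation mentioned above can be sketched as follows. The reliability values from the table are not available here, so the starting value of about 0.69 for the consonant category is back-calculated from the 60-word and 240-word figures in the text and is an assumption, not a quoted result. The function names are hypothetical.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when a test of reliability r is lengthened k times."""
    return k * r / (1 + (k - 1) * r)


def lengthening_factor(r: float, r_target: float) -> float:
    """Factor by which a test must be lengthened to reach r_target."""
    return r_target * (1 - r) / (r * (1 - r_target))


def implied_reliability(r_target: float, k: float) -> float:
    """Reliability of the original test implied by reaching r_target at k times the length."""
    return r_target / (k - (k - 1) * r_target)


# From the text: a 60-word test would need 240 words (k = 4) to reach 0.90.
k = 240 / 60
r_60 = implied_reliability(0.90, k)  # assumption: inferred, not reported
print(f"implied 60-word reliability: {r_60:.2f}")
print(f"words needed for 0.90: {60 * lengthening_factor(r_60, 0.90):.0f}")
```

The same functions show how quickly the required length grows as the starting reliability falls, which is why a category with very low test-retest stability cannot realistically be rescued by lengthening the test.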


Carver, C. (1970) Word Recognition Test. Sevenoaks: Hodder & Stoughton.

Clarke, A. (1975) A dictation spelling test (mimeo). London: Child Guidance Training Centre.

Durrell, D.D. (1955) Durrell Analysis of Reading Difficulty. New York: Harcourt, Brace and World, Inc.

France, N. and Fraser, I. (1975) Richmond Tests of Basic Skills. Sunbury-on-Thames: Nelson.

Peters, M.L. (1974) The significance of spelling miscues. In Wade, B. and Wedell, K. (Eds.) Spelling: task and learner. Univ. of Birmingham.

Schonell, F.J. (1932) Essentials in Teaching and Testing Spelling. London: Macmillan.

Young, D. (1976) SPAR Spelling and Reading Tests. Sevenoaks: Hodder & Stoughton.
