[Journal of the Simplified Spelling Society, J24, 1998-2, pp23-27]
[Ken Spencer: see Newsletter, Media.]

Predictive Models of Spelling Behaviour in 7- & 11-year-olds.

Ken Spencer.

Ken Spencer is Lecturer in Media Studies and Educational Technology in the School of Education, Hull University. The statistical Tables are given at the end of the article.

0. Abstract.

A predictive model for spelling is suggested, based on the results of spelling tests taken by 2,684 seven- and eleven-year-olds in 1996. The tests were part of a national scheme of testing for the School Curriculum & Assessment Authority (SCAA). The factors identified as influencing the number of children correctly spelling a word are: word length (number of letters), frequency of a word, and a measure of the word's phoneticity. A measure of the most infrequent form of representation of the phonemes in a word (the 'trickiest' phoneme) is a strong factor with 7-year-olds, whereas a measure of the average phoneticity of a word is a better indicator of word difficulty for 11-year-olds, who are susceptible to the mitigating effects of high word frequency on irregular spellings.

1. Learning for mastery.

Carroll's model of school learning (1963) suggests that most pupils are capable of reaching levels of performance more usually associated with the top 10-20% of the school population. He proposed that a number of variables could be manipulated to increase levels of performance, such as the quality of teaching materials and the amount of time available for learning. Some variables associated with the pupil, such as perseverance, may be difficult to manipulate.

Carroll's model has been applied to teaching by Bloom and his associates, in the methodology known as Learning for Mastery, which places great emphasis on formative testing in order to determine deficiencies in either learning or teaching. An essential within this system is the requirement that high levels of performance are demanded at the early stages of learning, which ultimately result in higher overall levels of performance. Required criterion levels are as high as 100%, although more usually they range between 80-90%.

2. Case-study in remedial literacy.

This approach has recently been applied to the teaching of reading (Spencer, 1996), using computer-based learning techniques. In the case of one pupil, who had reached the age of 10.5 years without being able to read the most common word in the English language, decisions had to be made concerning the teaching of the most common 100 words: should a phonics approach be adopted, or a method based on gradually increasing the demands in a simple spelling exercise? Many of the most common words fail to obey even the most rudimentary rules, and so the simple spelling approach, with increasing mastery demands, was adopted. Practising for 10-15 minutes per day, the pupil mastered 80% of the words over a period of 12 weeks.

It was clear from the performance of this pupil that much of his problem was associated with the vagaries of English spelling - he simply gave up when he applied rules to common English words and was told that he was wrong: the rules didn't work, and what should have been a simple task proved impossible. With the continuing concern of parents, teachers and politicians about the levels of literacy in the UK, the question arises: are we disadvantaging our children by making a simple task incomprehensible?

3. Searching for models.

Of particular interest to researchers investigating the application of computer-based literacy systems is the search for a predictive model of spelling performance. Knowledge of such a model would indicate the factors that make words difficult to spell; determine if they are the same for all ages; and may indicate how strategies change with age, to make spelling more accurate. This, in turn, has implications for reading. Frith's (1985) six-step model of literacy development suggests that there is an initial period when children use a logographic strategy to read and a phonological strategy to spell, ie, they read and spell in different ways. According to this model, the emergence of phonemic representations in spelling leads to advances in later reading. Rego (1991) demonstrated that the ability to spell non-words is strongly related to progress in reading, and this has been confirmed by Lazo et al. (1997), who show that early attempts to read words are strongly related to the progress made in spelling, as early attempts to spell words influence later reading.

The following analysis, which identifies several models for learning to spell, is based on data from national tests carried out by SCAA in 1996. SCAA's activities have recently been incorporated into the Qualifications and Curriculum Authority (QCA) whose statutory functions were set out in the Education Act, 1997. Principally, by its forthcoming review of the school curriculum, QCA hopes to define the structure and content of teaching and learning that will enable all pupils to develop and demonstrate their knowledge, skills and understanding. QCA's functions and responsibilities include: developing learning goals for the under-fives; accrediting assessment schemes for children entering primary school; monitoring and reviewing the National Curriculum and its assessment; continuing with the development of national assessment at the ages of 7, 11 and 14.

4. Factors that may be relevant.

When attempting to build a predictive model of behaviour the researcher usually has in mind factors that may be relevant. In the case of spelling-accuracy, the present approach looked at the following factors:
number of letters in the word
phoneticity of the word
frequency of the word

4.1 Length of the word.


This is a simple measure and was included because in the initial stages of spelling (and reading) 7-year-olds are still developing short term memory strategies, and any lapses in memory are likely to manifest themselves with longer words. Longer words also give more opportunities for errors.

4.2 Phoneticity of the word.


This is seen as a major factor, but one that can be defined in a variety of ways. There is no standardized way of measuring this factor and a number of approaches were adopted and refined.

1. Phonic Ratio.
The first approach is to look at the individual letters of the word and measure the degree to which they correspond to a simple alphabetic representation (as in the word hat). This is expressed as a ratio of the number of letters pronounced as in the simple alphabet, divided by the total number of letters. The word hat has a simple phonic ratio value of 1; boat has a value of 0.5; and shout a value of 0.2 (see Table 1). This is a crude method, only accounting for sounds represented by single letters, so it will be less powerful at predicting than other measures. The astonishing thing is that, for 7-year-olds, it is a predictor at all.
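
The ratio described above can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the study: the function name and the letter counts (3 for hat, 2 for boat, 1 for shout) are assumptions, chosen so that the results agree with the values quoted in the text.

```python
def phonic_ratio(word: str, simple_letters: int) -> float:
    """Share of a word's letters that carry their simple alphabetic sound.

    `simple_letters` is supplied by the analyst; deciding which letters
    count as 'simple' is a judgement this sketch does not automate.
    """
    if not word or not 0 <= simple_letters <= len(word):
        raise ValueError("simple_letters must lie between 0 and len(word)")
    return simple_letters / len(word)


# Assumed counts: every letter of 'hat'; b and t in 'boat'; only t in 'shout'.
examples = {"hat": 3, "boat": 2, "shout": 1}
ratios = {word: round(phonic_ratio(word, n), 2) for word, n in examples.items()}
print(ratios)
```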

2. Phoneme frequency measures.
It was recognized that simple phoneticity might be a factor with younger children. A more sophisticated measure was also developed which could be applied to both 7- and 11-year-old age groups. Children learn at an early age that a variety of representations can be used for the same sound and, as SCAA recognized, the difficulty is less knowing the patterns than knowing which pattern to use in each individual word. In order to establish the range of representations of the phonemes that make up the English language, and the frequency of each representation, the 3,500 most common words from the LOB Corpus (Hofland and Johansson, 1982) were analysed.

The phonetic representation of each of the 3,500 words was determined from the Oxford English Dictionary (Second Edition, CD-ROM version) enabling the standard alphabetic representation of each phoneme to be determined for each word. With each phoneme coded, tables showing the various forms of representation for each phoneme were extracted. The average number of representations per phoneme was 5.95 (a total of 262 for the 44 English phonemes used in the O.E.D.). Of course, some phonemes have relatively few forms of representation, while others have many more. As for frequency, the common phoneme /ɩ/ (as in hill /hɩl/) represents 9.64% of the sample phonemes (total = 20,197); and the infrequent /ʒ/ (as in visual /vɩʒʋəl/) represents only 0.13%.

Knowing the different representations of each phoneme allows two measures of the frequency for each form of representation. The first is the proportion of the particular representation for that phoneme (PhR); the second is the frequency of the particular representation in relation to the total number of phonemes in the LOB corpus (PhT), thus showing how often it occurs in running text.

2.i Representation as a proportion of all representations of the phoneme (PhR).

This measure considers a particular representation of a phoneme only in relation to other representations of that phoneme. Percentage values for all representations of the phoneme total 100%.

Table 3 shows the values for the /e/ phoneme (as in den /den/). This phoneme represents 3.36% of all the phonemes in the sample. The most common alphabetic representation is E, and this is found in 90.6% of cases (PhR value of E representation for /e/ phoneme). The rare form AI has a PhR value of 0.6%.

2.ii Representation as a proportion of all phonemes in the LOB sample (PhT).

This measure is necessary because phonemes occur at different frequencies, and the difference between the most common phonemes (/ɩ/ at 10%) and least common (/ʒ/ at 0.1%) is considerable. In terms of the total number of phonemes in the sample, an infrequent form of a common phoneme may be encountered more often than the usual representation of a less common phoneme. The percentage values for PhT for a particular phoneme will add up to the frequency of that phoneme in the total sample. Table 3 illustrates this: the total for PhT is 3.36%, which represents the frequency of the /e/ phoneme in the total sample.
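
The two measures appear to be linked in a simple way: a representation's PhT value is its PhR value scaled by the share of its phoneme in the whole sample. The sketch below checks this reading against the /e/ figures in Table 3. The variable and function names are illustrative assumptions, not terms from the study.

```python
# Figures for the /e/ phoneme, taken from Table 3.
PHONEME_SHARE = 3.36  # /e/ as a percentage of all phonemes in the sample
PHR_E = {"e": 90.6, "ea": 7.0, "a": 1.2, "ie": 0.6, "ai": 0.6}


def pht(phr_percent: float, phoneme_share_percent: float) -> float:
    """Scale a within-phoneme percentage (PhR) to a whole-sample percentage (PhT)."""
    return phr_percent * phoneme_share_percent / 100.0


pht_e = {spelling: round(pht(value, PHONEME_SHARE), 2)
         for spelling, value in PHR_E.items()}
print(pht_e)

# The unrounded PhT values add up to the share of the phoneme itself.
total_pht = sum(pht(value, PHONEME_SHARE) for value in PHR_E.values())
print(round(total_pht, 2))
```

The rounded results agree with the PhT column of Table 3, which supports this reading of the definitions.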

Knowing the frequency of each form of representation for each phoneme allows an average phonetic value to be calculated for each word. This value can be calculated for both PhR and PhT (see Tables 1 & 2).

In addition, particularly unusual phonemic representations can be identified. In the case of the data for 7-year-olds, the most infrequent form of representation was determined within each word, giving a value for the 'tricky' phonemes, in terms of PhR values, eg, /ɒ/ as represented by AU occurs in only 0.32% of cases and is the trickiest phoneme representation in the word 'because', since all the other representations in that word have a higher value than this.
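
As a rough illustration, the word-level measures reduce to a mean and a minimum over the per-phoneme values. In the sketch below only the T value for hat (95.9%) comes from the article. The values of 100.0 for H and A are placeholders assumed for the example, and the function names are likewise hypothetical.

```python
def average_value(values: list[float]) -> float:
    """Mean of the per-phoneme representation values in one word."""
    return sum(values) / len(values)


def trickiest_value(values: list[float]) -> float:
    """Value of the least frequent representation in one word."""
    return min(values)


# PhR values for h-a-t. Only T = 95.9 is quoted in the article;
# the two 100.0 entries are assumed placeholders.
hat_phr = [100.0, 100.0, 95.9]

print(round(average_value(hat_phr), 2))
print(trickiest_value(hat_phr))
```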

4.3 Frequency of the word.


The frequency of word occurrence also seems likely to influence the spelling and reading of words: the more common a word, the more likely it is that the form will be internalized by the learner. The LOB corpus provides an ordered list of the most common 7,000 words. The total number of occurrences of a word within the entire corpus (1,000,000 words) is also given; this absolute frequency was used as a factor in the analysis.

4.4 Spelling scores and probabilities.


The spelling test for 7-year-olds (Key Stage 1) was taken by 1,184 children working at level 2 from SCAA's Schools Sampling Project, a national representative sample of schools taking part in a longitudinal monitoring survey. The test for 11-year-olds (Key Stage 2) was taken by 1,500 pupils from the University of Bath's sample for the 1996 Standard Assessment Tasks (SATs). The data available from SCAA were in the form of percentage correct scores for each word. This score was converted to a probability value for use in the regression analysis. The following formula was used:

log10(probability right / probability wrong).
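
A minimal sketch of this conversion follows, with the inverse added for turning a predicted value back into a percentage. The function names are assumptions. The two checks use scores that appear in Table 1: 35% for because and 97% for hat.

```python
import math


def logprob(percent_correct: float) -> float:
    """Base-10 log of the odds of spelling a word correctly."""
    p = percent_correct / 100.0
    if not 0.0 < p < 1.0:
        raise ValueError("score must lie strictly between 0 and 100")
    return math.log10(p / (1.0 - p))


def percent_from_logprob(value: float) -> float:
    """Inverse transform: recover the percentage correct from a logprob score."""
    odds = 10.0 ** value
    return 100.0 * odds / (1.0 + odds)


print(round(logprob(35), 2))  # 'because' in Table 1
print(round(logprob(97), 2))  # 'hat' in Table 1
```

A score of 50% maps to 0, so positive values mark words that most children spelled correctly and negative values mark words that most did not.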

5. Analysis of data.

The analysis of the data was undertaken with the multiple regression module in the Statistical Package for the Social Sciences (SPSS, version 6.1.1 for Macintosh computers). Regression methods utilize the presence of an association between two variables to predict the values of one from those of another. The regression analysis attempts to predict the spelling behaviour of the two age groups from characteristics (frequency, length, phoneticity) of the words.

5.1 Results of the Multiple Regression Analysis for 7-year-olds.


Initial consideration was given to the more obvious factors that are likely to affect the spelling performance of 7-year-olds: the number of letters in the word (LETTERS) and the simple phonic ratio (PHONIC), as given in Table 1. The results are given in Table 4, which shows highly significant correlations between the standardized spelling score (LOGPROB) and the two factors. The regression analysis shows that the more powerful predictor is the number of letters in the word. When the absolute frequency of words was included in the analysis, no significant correlations were found for that factor; for 7-year-olds, frequency of the selected words does not appear to influence spelling-accuracy.
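
The correlation between word length and spelling score can be recomputed from the rounded figures in Table 1. The sketch below uses a hand-written Pearson coefficient rather than SPSS, so small rounding differences from the published value are to be expected. The function name is an assumption.

```python
import math

# Number of letters and logprob score for the 20 words in Table 1, in table order.
LETTERS = [7, 4, 6, 6, 4, 4, 7, 4, 3, 7, 5, 7, 3, 8, 4, 5, 5, 4, 5, 4]
LOGPROB = [-0.27, 0.09, -0.52, -0.39, 0.72, 0.69, -0.48, 0.75, 1.51, -0.18,
           0.21, -0.16, 1.0, -0.83, 0.07, -0.19, -0.33, 0.19, -0.19, -0.43]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_letters = pearson(LETTERS, LOGPROB)
print(round(r_letters, 2))
```

The result is strongly negative: longer words were spelled correctly by fewer children.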

A second analysis, using more detailed information about the phonetic structure of the words (Table 1: Average PhR; Average PhT; and Tricky phonemes), was conducted. Significant correlations were not found for either PhR or PhT, but the so-called "tricky" phonemes factor was highly correlated (0.77) with the standardized spelling score (Table 4). The analysis demonstrated that the 'tricky' phoneme factor was a more powerful predictor than the simple phonic ratio used in the initial analysis. Both factors contribute in an equal but opposite way in the prediction of spelling scores. The words selected for the 7-year-old tests are not as complex as those for the 11-year-olds; and the 'tricky' phonemes measure identifies those words with particularly unusual spelling features. This factor is exemplified in the contrast between the word hat, in which the greatest uncertainty is in the T representation (T=95.90%; TT=3.6%; ED=0.5%), and friends, in which the IE is a unique representation (E=90.6%; EA=7%; A=1.2%; IE=0.6%; AI=0.6%). The greater the uncertainty in the representation of the phoneme, the lower the spelling score. The results of the test for 7-year-olds show that the predictive model has 2 factors: number of letters in the word and degree of difficulty of representation (as measured by relative frequency of occurrence) of key phonemes.
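
The two-factor model can be approximated with ordinary least squares on the Table 1 figures. This is a sketch under stated assumptions, not a reproduction of the SPSS analysis: the solver is a small hand-written routine, the function names are hypothetical, and the fitted coefficients are not claimed to equal those obtained in the study.

```python
# Values transcribed from Table 1 (7-year-old test), in table order.
LETTERS = [7, 4, 6, 6, 4, 4, 7, 4, 3, 7, 5, 7, 3, 8, 4, 5, 5, 4, 5, 4]
TRICKY = [0.32, 5.48, 5.2, 3.4, 25.08, 84.44, 0.6, 91.68, 95.9, 3.4,
          4.41, 48.53, 90.4, 2.8, 5.48, 25.08, 32.89, 5.2, 48.22, 14.97]
LOGPROB = [-0.27, 0.09, -0.52, -0.39, 0.72, 0.69, -0.48, 0.75, 1.51, -0.18,
           0.21, -0.16, 1.0, -0.83, 0.07, -0.19, -0.33, 0.19, -0.19, -0.43]


def fit_ols(rows, ys):
    """Return [intercept, b1, b2, ...] minimising squared error.

    Solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, ys))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


intercept, b_letters, b_tricky = fit_ols(list(zip(LETTERS, TRICKY)), LOGPROB)


def predict_logprob(n_letters: int, tricky_phr: float) -> float:
    """Predicted logprob score for a word under the fitted two-factor model."""
    return intercept + b_letters * n_letters + b_tricky * tricky_phr


print(b_letters < 0, b_tricky > 0)
```

On these figures the coefficient for word length is negative and the coefficient for the 'tricky' phoneme value is positive, in line with the opposite directions described above.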

5.2 Results of the Multiple Regression Analysis for 11-year-olds.


The words used in the 11-year-old test (Table 2) are more complex than those in the 7-year-old test: they have, on average, 2 additional letters; and some words have several phonemes with rare forms of representation. Table 4 shows those factors which have significant correlations with the standardized spelling scores (LOGPROB) for 11-year-olds: absolute frequency of occurrence (FREQABS) in the LOB corpus; number of letters (LETTERS); and the average frequency of phonetic representations as a function of the total number of phonemes (PhT). The predictive value is almost identical to that obtained with the model for the 7-year-olds. The regression equation shows all factors contributing to spelling behaviour in a similar manner, but with number of letters acting in the opposite direction. This model suggests that the spelling behaviour of older pupils, when responding to more complex words, will deteriorate for less common words that are longer and use unusual forms of phonemic representations.

6. Discussion.

Working from data collected by SCAA for more than 2,500 children in 1996, factors have been identified which predict the percentage of pupils likely to correctly spell the given words at ages 7 and 11. The factors identified are those which are arrived at by any common sense view of the level of difficulty that words present to pupils: number of letters, frequency of usage, and the presence of unusual forms of phonemic representation.

There is often criticism of poor spelling in schools and even at University level. This study has clearly indicated that a major factor in poor spelling, which will also be reflected in poor reading, is the failure of English spelling to conform to specific rules for the representation of phonemes. For 7-year-olds, words with unusual written forms are much more difficult, and the more unusual the written form, the more difficult they are to spell. For 11-year-olds, the words tested were longer, less frequently used, and more likely to have several unusual forms of representation. In this case, because 11-year-olds have acquired much greater experience with words, unusual representations may be mitigated by more frequent use. Even bizarre representations are learned by 11-year-olds if they are frequently encountered.

The analyses of data presented here clearly indicate that a major cause of poor spelling is to be found in the form of representation of the words, and not solely in the students. The main problem is that for many words the form has to be known and remembered, because the imperfect patterns which govern English cannot always be applied to give the correct result. Instead of using coherent patterns that always give correct answers, written English has developed as a system which requires a great deal of rote learning. This takes time and energy that could be better employed in other educational activities.

By the age of 11 years, most students are able to deal successfully with all but the most unusual written forms for word sounds. By regularizing such highly irregular forms (eg, friends, stretched), spelling, reading and the self-confidence of these students would be greatly enhanced.

If we do not develop a rational system of English spelling, we must accept the consequences: to eradicate poor spelling and reading at a national level, much more time must be devoted to learning the idiosyncratic written forms. The extra time needed will be at the expense of other subjects such as maths, science and technology.

Table 1: Word values for 7-year-old test

Word      Phonetic     No. of   Phonic  Absolute   %      Logprob  Average  Average  Tricky
          rendering    letters  ratio   frequency  Score  score    PhR      PhT      phonemes
because   bɩkɒz        7        0.29    777        35     -0.27    42.84    1.55     0.32
boat      bo:t         4        0.5     72         55      0.09    66.94    3.03     5.48
bucket    bʌkιt        6        0.67    6          23     -0.52    58.6     2.51     5.2
family    fæməli       6        0.83    281        29     -0.39    72.8     2.15     3.4
fish      fιʃ          4        0.5     121        84      0.72    60.06    2.97     25.08
flag      flæg         4        1       8          83      0.69    91.87    2.45     84.44
friends   frendz       7        0.71    177        25     -0.48    73.05    3.14     0.6
hand      hænd         4        1       460        85      0.75    97.17    3.52     91.68
hat       hæt          3        1       56         97      1.51    98.63    3.29     95.9
holiday   hɒlədeι      7        0.57    74         40     -0.18    64.4     1.94     3.4
house     haυs         5        0.4     571        62      0.21    59.32    0.42     4.41
morning   mɔ:(r)nιŋ    7        0.71    233        41     -0.16    72.87    3.65     48.53
net       net          3        1       53         91      1       94.43    5.97     90.4
pictures  pιktʃə(r)z   8        0.38    83         13     -0.83    57.14    2.47     2.8
road      ro:d         4        0.5     205        54      0.07    64.15    2.9      5.48
shout     ʃaυt         5        0.2     10         39     -0.19    64.85    2.67     25.08
smile     smaιl        5        0.4     76         32     -0.33    70.24    3.23     32.89
sock      sɒk          4        0.75    1          61      0.19    53.92    1.96     5.2
spade     speιd        5        0.6     2          39     -0.19    74.1     3.07     48.22
wait      weιt         4        0.5     82         27     -0.43    58.18    2.75     14.97


Table 2: Word values for 11-year-old test

Word       Phonetic     No. of   Absolute   %      Logprob  Average  Average
           rendering    letters  frequency  Score  score    PhR      PhT
beautiful  bju:tιfəl    9        85         48     -0.03    60.83    2.94
crept      krept        5        5          71      0.39    88.92    4.34
disturbed  dιstз:(r)bd  9        26         63      0.23    63.93    3.48
echoed     eko:d        6        12         55      0.09    38.21    1.76
heard      hз:d         5        239        74      0.45    67.05    1.55
honest     onιst        6        33         89      0.91    56.21    5.18
notice     no:tιs       6        103        83      0.69    66.06    4.63
piece      pi:s         6        63         64      0.25    36.14    1.38
remained   rιmeιnd      8        103        64      0.25    55.67    3.05
replace    rιpleιs      7        20         84      0.72    47.65    2.9
shook      ʃυk          5        53         54      0.07    20.96    0.37
silence    saιləns      7        92         68      0.33    50.68    3.25
slipped    slιpt        7        32         65      0.27    46.2     3.3
sneeze     sni:z        6        1          67      0.31    43.63    2.99
sprawling  sprɔ:lιŋ     9        4          39     -0.19    67.31    3.51
still      stιl         5        823        97      1.51    60.05    4.72
stretched  stretʃt      9        23         34     -0.29    58.91    3.22
tallest    tɔ:lest      7        1          85      0.75    60.37    3.75
uncoiled   ʌnkɔιld      8        1          57      0.12    65.19    2.91
visitors   vιzιtəz      8        37         71      0.39    63.8     3.44


Table 3: Representation for /e/ phoneme
/e/ phoneme = 3.36% of total phonemes in sample

Word        Phonetic   Spelling  % of /e/ phoneme  % of total phonemes
            rendering            (PhR)             (PhT)
dental      dentl      e         90.60             3.04
heather     heðə(r)    ea        7.00              0.24
anybody     enιbɒdι    a         1.20              0.04
friendship  frendʃιp   ie        0.60              0.02
against     əgenst     ai        0.60              0.02
Total                            100.00            3.36


Table 4: Correlation values for 7- & 11-year-old tests

Correlation, 2-tailed significance, 7-year-olds:

          LETTERS  PHONIC   TRICKY
LOGPROB   -0.76*    0.16*    0.77*
LETTERS            -0.402   -0.56*
PHONIC                       0.70*
* p<0.01

Correlation, 2-tailed significance, 11-year-olds:

          LETTERS  FREQABS  PhT
LOGPROB   -0.55*    0.66*    0.51*
LETTERS            -0.37     0.13
FREQABS                      0.22
* p<0.01


References.

Bloom, B.S. (1981) All our children learning. Chapter 8: Learning for mastery, pp. 153-177. McGraw-Hill: London.

Carroll, J. (1963) 'A model of school learning', Teachers College Record, 64, 723-733.

Frith, U. (1985) Beneath the surface of developmental dyslexia. In K.E. Patterson, J.C. Marshall & M. Coltheart (Eds.), Surface Dyslexia (pp. 301-330). London: Erlbaum.

Hofland, K. and Johansson, S. (1982) Word Frequencies in British and American English. Bergen: The Norwegian Computing Centre for the Humanities.

Lazo, M.G., Pumfrey, P.D. & Peers, L. (1997) 'Metalinguistic awareness, reading and spelling: roots and branches of literacy', Journal of Research in Reading, 20(2), 85-104.

School Curriculum & Assessment Authority (1997) Standards at Key Stage 1 English & Mathematics. Report on the 1996 National Curriculum Assessments for 7-year-olds. London: QCA.

School Curriculum & Assessment Authority (1997) Standards at Key Stage 2 English & Mathematics. Report on the 1996 National Curriculum Assessments for 11-year-olds. London: QCA.

Rego, L.B. (1991) The role of early linguistic awareness in children's reading and spelling. University of Oxford, unpublished DPhil Thesis.

Spencer, K.A. (1996) 'Recovering Reading Using Computer Mastery Programmes', British Journal of Educational Technology, 27(3), 191-203.

