This is a list of formulas which predict textual difficulty.

## Overview

These are ways of predicting how hard a piece of writing will be to understand (its textual difficulty). Research has shown that two main factors affect the ease with which texts are read.[1]

1. How difficult the words are: this is lexical difficulty. Rare words are less well known than common words. Rare, difficult words are often longer than common, easy words.
2. How difficult the sentences are: this is syntactical difficulty. Long, complicated sentences cause more difficulty than short, simple sentences.

Formulae for predicting how difficult a sample of prose will be for readers are called "readability formulae". Some measure only the difficulty of the vocabulary: they are one-variable measures. Others include a measure of syntax such as sentence length.
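Most of the formulae in this list start from the same raw counts: sentences, words, and letters. The following is a minimal sketch of how those counts might be gathered, assuming plain English text. The function name `text_stats` and the simple regular-expression rules for splitting sentences and words are illustrative assumptions, not part of any published formula.

```python
import re

def text_stats(text):
    """Return (sentence count, word count, letter count) for a passage.

    Assumption: a sentence ends at '.', '!' or '?', and a word is a run
    of letters and apostrophes. Real tools use more careful rules.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    letters = sum(len(re.sub(r"[^A-Za-z]", "", w)) for w in words)
    return len(sentences), len(words), letters

sample = "The cat sat on the mat. It was happy."
n_sentences, n_words, n_letters = text_stats(sample)
print(n_words / n_sentences)  # average sentence length in words
print(n_letters / n_words)    # average word length in letters
```

Average sentence length stands in for syntactical difficulty and average word length for lexical difficulty.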

## Validity of the formulae

Validity of formulae can be judged by comparing them to each other, which is a kind of consistency check. More important is a check for how well they predict an independent ("outside") criterion of readability.

One way to do this is to use a set of graded test passages. There the correlation coefficient of the better formulae "hovers around 70%".[1]p113 There have been many dozens of experimental tests, summarised by Klare.[1]p121–156 Correlations between readability measures and comprehension scores on passages are usual, as are correlations between readability scores and grade levels as chosen by experienced teachers. An important result was obtained by Murphy, who increased the readability of a farm journal and found that its readership increased considerably.[2][3]

## One-variable formulae

### SMOG

The SMOG formula uses one variable to predict the difficulty of a passage of prose. It was developed by G. Harry McLaughlin in 1969 to make calculations as simple as possible. Like Gunning Fog, the formula uses words of 3 or more syllables as an indicator of difficulty; these words are called polysyllabic.

The original formula was given for samples of 30 sentences. It is:

${\displaystyle {\mbox{SMOG grade}}=1.0430{\sqrt {\mbox{hard words in 30 sentences}}}\ +3.1291}$

This can be adjusted to work with any number of sentences:

${\displaystyle {\mbox{SMOG grade}}=1.0430{\sqrt {{\mbox{hard words}}\times {\frac {30}{\mbox{sentences}}}}}\ +3.1291}$
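The adjusted formula can be sketched in a few lines of Python. The function name `smog_grade` and its parameter names are illustrative assumptions; counting the hard (polysyllabic) words is left to the caller.

```python
import math

def smog_grade(hard_words, sentences):
    """SMOG grade for a sample with any number of sentences.

    hard_words: count of words with 3 or more syllables
    sentences:  number of sentences in the sample
    """
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    # Scale the hard-word count to a 30-sentence sample, as in the formula.
    return 1.0430 * math.sqrt(hard_words * 30 / sentences) + 3.1291

print(smog_grade(25, 30))  # 25 hard words in 30 sentences
print(smog_grade(50, 60))  # same density over 60 sentences gives the same grade
```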

McLaughlin also created directions for an approximate version which can be done with just mental calculation.

1. Count the number of words with 3 or more syllables, excluding names, in a set of 30 sentences.
2. Take the square root of the nearest perfect square.
3. Add 3.
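The mental-arithmetic version can also be written out as a short sketch. It assumes the count comes from exactly 30 sentences and that 3 is added to the root, which is what the full formula gives when its constants 1.0430 and 3.1291 are rounded. The name `smog_quick` is an illustrative assumption.

```python
import math

def smog_quick(hard_words_in_30_sentences):
    """Approximate SMOG grade: root of the nearest perfect square, plus 3."""
    n = hard_words_in_30_sentences
    root = math.isqrt(n)  # largest integer whose square is <= n
    lower, upper = root ** 2, (root + 1) ** 2
    # Pick whichever neighbouring perfect square is closer to n.
    nearest_root = root if n - lower <= upper - n else root + 1
    return nearest_root + 3

print(smog_quick(27))  # nearest perfect square is 25, so 5 + 3
print(smog_quick(32))  # nearest perfect square is 36, so 6 + 3
```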

## Two-variable formulae

### The Dale–Chall formula

Edgar Dale, a professor of education at Ohio State University, was one of the first critics of Thorndike's vocabulary-frequency lists. He claimed that they did not distinguish between the different meanings that many words have. He created two new lists of his own. One, his "short list" of 769 easy words, was used by Irving Lorge in his formula. The other was his "long list" of 3,000 easy words, which were understood by 80 percent of fourth-grade students. In 1948, he incorporated this list in a formula which he developed with Jeanne S. Chall, who was to become the founder of the Harvard Reading Laboratory.

To apply the formula:

1. Select several 100-word samples throughout the text.
2. Compute the average sentence length in words (divide the number of words by the number of sentences).
3. Compute the percentage of words NOT on the Dale–Chall word list of 3,000 easy words.
4. Compute this equation:

${\displaystyle {\mbox{Raw Score}}=0.1579\times {\mbox{PDW}}+0.0496\times {\mbox{ASL}}+3.6365}$

Where:

• Raw Score is the uncorrected reading grade of a student who can answer one-half of the test questions on a passage
• PDW is the Percentage of Difficult Words, that is, words not on the Dale–Chall word list
• ASL is the Average Sentence Length
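As a minimal sketch, the raw score can be computed from three counts. The function name `dale_chall_raw` and its parameters are illustrative assumptions; deciding which words are "difficult" requires the Dale–Chall word list, which is not reproduced here.

```python
def dale_chall_raw(difficult_words, words, sentences):
    """Dale-Chall raw score from word and sentence counts.

    difficult_words: number of words NOT on the 3,000-word easy list
    words:           total number of words in the samples
    sentences:       total number of sentences in the samples
    """
    pdw = 100 * difficult_words / words  # percentage of difficult words
    asl = words / sentences              # average sentence length
    return 0.1579 * pdw + 0.0496 * asl + 3.6365

# 100 words, 5 sentences, 10 words off the list: PDW = 10, ASL = 20
print(dale_chall_raw(10, 100, 5))
```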

Finally, to compensate for the "grade-equivalent curve," apply the following chart for the Final Score:

| Raw Score | Final Score |
|---|---|
| 4.9 and below | Grade 4 and below |
| 9.0 to 9.9 | Grades 13–15 (college) |
| 10 and above | Grades 16 and above[4] |

In 1995, Dale and Chall published a new version of their formula with an upgraded word list.[5]

### Flesch reading ease

The formula for the Flesch reading-ease score is

${\displaystyle 206.835-1.015\left({\frac {\text{total words}}{\text{total sentences}}}\right)-84.6\left({\frac {\text{total syllables}}{\text{total words}}}\right).}$[6]

Scores can be interpreted as shown in the table below.

| Score | Notes |
|---|---|
| 90.0–100.0 | easily understood by an average 11-year-old student |
| 60.0–70.0 | easily understood by 13- to 15-year-old students |
| 0.0–30.0 | best understood by university graduates |
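The reading-ease formula can be sketched as follows. The function name `flesch_reading_ease` is an illustrative assumption, and syllable counting is left to the caller because it needs a dictionary or a heuristic.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch reading-ease score; higher scores mean easier text."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# 100 words in 5 sentences with 150 syllables
print(flesch_reading_ease(100, 5, 150))
```

Unlike the grade-level formulae, this score falls as the text gets harder, because both terms are subtracted from the constant.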

The US Department of Defense uses the reading ease test as the standard test of readability for its documents and forms.[7] Florida requires that life insurance policies have a Flesch reading ease score of 45 or greater.[8]

Use of this scale is so ubiquitous that it is bundled with popular word processing programs and services such as KWord, IBM Lotus Symphony, Microsoft Office Word, WordPerfect, and WordPro.

### Gunning Fog

The Gunning Fog, sometimes called the Fog index, is a formula developed by Robert Gunning. It was first published in his book The Technique of Clear Writing in 1952. It became popular because the score is easy to calculate.

The formula has been criticized because it relies mainly on sentence length. The critics argue that texts written to satisfy the formula will use shorter sentences without using simpler words. However, this criticism confuses prediction of difficulty with production of prose (writing). The role of readability tests is to predict difficulty; writing better prose is quite another matter. As noted in the overview, sentence length is an index of syntactical difficulty.[1]

${\displaystyle {\mbox{Gunning Fog grade}}=0.4\times \left[{\frac {\mbox{words}}{\mbox{sentences}}}+\left(100\times {\frac {\mbox{hard words}}{\mbox{words}}}\right)\right]}$

Where:

• words is number of words
• sentences is number of sentences
• hard words is the number of words with 3 or more syllables (excluding endings) which are not names or compound words
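A minimal sketch of the calculation, using the three counts defined above. The function name `gunning_fog` is an illustrative assumption.

```python
def gunning_fog(words, sentences, hard_words):
    """Gunning Fog grade from word, sentence and hard-word counts."""
    average_sentence_length = words / sentences
    percent_hard_words = 100 * hard_words / words
    return 0.4 * (average_sentence_length + percent_hard_words)

# 100 words in 5 sentences with 10 hard words: 0.4 * (20 + 10)
print(gunning_fog(100, 5, 10))
```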

### Spache

The Spache method compares words in a text to a list of words which are familiar in everyday writing. The words that are not on the list are called unfamiliar. The number of words per sentence is counted. This number and the percentage of unfamiliar words are put into a formula. The result is a reading grade. Someone at this grade should be able to read the text. It is designed to work on texts for children in primary education, or grades 1 to 7.

${\displaystyle {\mbox{Spache grade}}=\left(0.141\times {\frac {\mbox{words}}{\mbox{sentences}}}\right)+\left(0.086\times 100\times {\frac {\mbox{unfamiliar words}}{\mbox{words}}}\right)+0.839}$

In 1974 Spache revised his formula to:

${\displaystyle {\mbox{Spache grade (revised)}}=\left(0.121\times {\frac {\mbox{words}}{\mbox{sentences}}}\right)+\left(0.082\times 100\times {\frac {\mbox{unfamiliar words}}{\mbox{words}}}\right)+0.659}$
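Both versions can be sketched with one function. This assumes, following the description above, that the unfamiliar-word term is a percentage (0 to 100) rather than a fraction. The function name `spache_grade` and the `revised` flag are illustrative assumptions.

```python
def spache_grade(words, sentences, unfamiliar_words, revised=False):
    """Spache grade; revised=True uses the 1974 coefficients."""
    average_sentence_length = words / sentences
    # Assumption: the formula expects a percentage, not a fraction.
    percent_unfamiliar = 100 * unfamiliar_words / words
    if revised:
        return 0.121 * average_sentence_length + 0.082 * percent_unfamiliar + 0.659
    return 0.141 * average_sentence_length + 0.086 * percent_unfamiliar + 0.839

# 100 words in 10 sentences with 5 unfamiliar words
print(spache_grade(100, 10, 5))
print(spache_grade(100, 10, 5, revised=True))
```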

### Coleman-Liau Index

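The calculations are performed in two steps. The first step finds the Estimated Cloze Percentage (ECP). The second step calculates the actual grade. In the first step, characters and sentences are the numbers of letters and sentences per 100 words.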

${\displaystyle {\begin{array}{lcl}{\mbox{ECP}}=141.8401-\left(0.214590\times {\mbox{characters}}\right)+\left(1.079812\times {\mbox{sentences}}\right)\\{\mbox{CLI}}=\left(-27.4004\times {\frac {\mbox{ECP}}{100}}\right)+23.06395\end{array}}}$

A simple version also exists that is not as accurate:

${\displaystyle {\mbox{CLI}}=\left(5.88\times {\frac {\mbox{characters}}{\mbox{words}}}\right)-\left(29.5\times {\frac {\mbox{sentences}}{\mbox{words}}}\right)-15.8}$
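The two versions can be compared in a short sketch. It assumes that, in the two-step formula, characters and sentences are counted per 100 words; with that reading the simple version follows from the two-step one up to rounding. The function names are illustrative assumptions.

```python
def coleman_liau(letters, words, sentences):
    """Two-step Coleman-Liau grade from raw counts."""
    # Assumption: the formula's inputs are rates per 100 words.
    letters_per_100 = 100 * letters / words
    sentences_per_100 = 100 * sentences / words
    ecp = 141.8401 - 0.214590 * letters_per_100 + 1.079812 * sentences_per_100
    return -27.4004 * ecp / 100 + 23.06395

def coleman_liau_simple(letters, words, sentences):
    """Simplified Coleman-Liau grade using per-word ratios."""
    return 5.88 * letters / words - 29.5 * sentences / words - 15.8

# 450 letters and 5 sentences in 100 words
print(coleman_liau(450, 100, 5))
print(coleman_liau_simple(450, 100, 5))
```

For this sample the two results differ only in the second decimal place.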

### Automated Readability Index

The Automated Readability Index was designed for real-time computing of readability for the electric typewriter.[9]

${\displaystyle {\mbox{ARI}}=4.71\times {\frac {\mbox{letters}}{\mbox{words}}}+0.50\times {\frac {\mbox{words}}{\mbox{sentences}}}-21.43}$
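Because it needs only letter, word and sentence counts, the index is simple to compute. A minimal sketch, with the function name `automated_readability_index` as an illustrative assumption:

```python
def automated_readability_index(letters, words, sentences):
    """Automated Readability Index from letter, word and sentence counts."""
    letters_per_word = letters / words
    words_per_sentence = words / sentences
    return 4.71 * letters_per_word + 0.50 * words_per_sentence - 21.43

# 450 letters and 5 sentences in 100 words
print(automated_readability_index(450, 100, 5))
```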

## References

1. Klare G. 1963. The measurement of readability. Ames, Iowa:Iowa State University Press.
2. Murphy D.R. 1947. Tests prove short words and sentences get best readership. Printer's Ink 218: 61–64.
3. Murphy D.R. 1947. How plain talk increases readership 45 to 66 per cent. Printer's Ink 220: 35–37.
4. Dale E. & J.S. Chall. 1948. "A formula for predicting readability". Educational Research Bulletin, Jan. 21 and Feb. 17, 27: 1–20, 37–54.
5. Chall J.S. & E. Dale. 1995. Readability revisited: The new Dale–Chall readability formula. Cambridge, MA: Brookline Books.
6. Flesch [1]
7. Luo Si (2001). "A statistical model for scientific readability". Atlanta, GA, USA: CIKM '01.
8. "Readable Language in Insurance Policies"
9. Senter R.J. (1967). "Automated Readability Index". Wright-Patterson Air Force Base: iii. AMRL-TR-6620. Retrieved 2012-03-18.

• Coleman M. (1975). "A computer readability formula designed for machine scoring". Journal of Applied Psychology. 60 (2): 283–284.