Readability and Word Length
September 14, 2012
As students know, some things are harder to read than others. The spectrum of writing extends from the single-syllable words in the Dick and Jane books of my youth to the heady reading found in college textbooks and articles in scholarly journals.
In one memorable episode of The Bob Newhart Show, Bob's dentist friend, Jerry, writes a children's book. He submits his manuscript as many pages with one word per page. His rejection letter arrives in the same format, one word per page.
Many word processors have lexicographic analysis functions, such as word count, which is an important metric for student submissions. They also have a readability analysis designed to estimate the target audience for your text. The most popular of these is the Flesch-Kincaid readability test that presents its results as either a percentage reading ease, or as a grade level index; viz.,
Flesch Reading Ease = 206.835 - (1.015*(words/sentences)) - (84.6*(syllables/words))
Flesch-Kincaid Grade = (0.39*(words/sentences)) + (11.8*(syllables/words)) - 15.59
in which words, sentences and syllables are the total counts for these objects in the manuscript. The grade level is designed to track the school grade levels in the US. The reading ease corresponds to text being understood by a particular age group (90-100 = 11-year-olds, 60-70 = 13- to 15-year-olds, and 0-30 = college graduates).
When my children were in elementary school and high school, we would apply the grade test to their school reports to see how they fared. The object there was to make the grade level as high as possible. The purpose of these tests is actually the opposite. Development of the grade level test was funded by the US military to ensure that their training materials and maintenance manuals were understood. It's also used by some publishers to "dumb down" their content to make it more salable. Next time you're in the supermarket, scan the tabloids at the checkout.
I don't try to dumb down anything in this blog, but its reading level is not that extreme. A recent, relatively low-tech article (Work, September 3, 2012) has a Flesch Reading Ease of 62% and a Flesch-Kincaid Grade Level of 8.9. The previous, more technical article (Harder than Diamond, August 31, 2012) scores 49.9% and grade 9.5. There's no reason why your high-schooler shouldn't be reading this blog!
These scores were calculated by a C language program I developed just for this purpose. You can grab the source code here. Looking at the above formulas, you would think that such a program is easy to write. Counting words and sentences is fairly easy, but the syllable count is the hard part. A more thorough program might use a dictionary for this, but many of the words in this blog would not be found there. Instead, we use a simple method that's accurate enough for our purpose.
The vowels (a,e,i,o,u,y) are the key. The number of syllables in a word is almost always equal to the number of vowels, with two exceptions. When vowels appear in pairs (diphthongs), they have a single sound, so we eliminate any vowel that follows another. Also, there are certain silent endings that must be addressed. We simply eliminate -e, -es and -ed from our count. This syllable count is not 100% accurate, but how accurate are the readability scores themselves? All scientists know that approximation is allowed in certain cases.
As can be seen in the readability formulas, the number of syllables per word is the most important factor. This is no surprise to children who complain about "big words," so word length is an important linguistic concept. An article about word length has recently been posted on the arXiv preprint server.[3]
The authors used the Google Books corpus for their analysis of the temporal trend in word length. I wrote about linguistic analysis using Google Books and the Google Ngram Viewer in two previous articles (Culturomics, January 13, 2011 and Word Extinction, August 17, 2011). Their results are shown in the graph, below.
Note the recent "dumbing-down" of American English. The authors of the arXiv paper associate the decrease in average word length with a shifting political environment. I prefer my dumbing-down hypothesis. Word length is an easily understood concept, but linguistics can get into more complicated areas, as another just-published paper demonstrates.[4-5]
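The vowel-counting rule and the two readability formulas described in this article can be sketched in a few lines of C. This is only an illustration, not the program linked above: the function names are my own, the rule that every word has at least one syllable is an added assumption, and the constants are the standard published ones (206.835, 1.015 and 84.6 for reading ease; 0.39, 11.8 and 15.59 for grade level).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Vowels for this heuristic; 'y' is counted as a vowel, as in the text. */
static int is_vowel(char c)
{
    c = (char)tolower((unsigned char)c);
    return c != '\0' && strchr("aeiouy", c) != NULL;
}

/* Do the first n characters of w end with the given lowercase suffix? */
static int ends_with(const char *w, size_t n, const char *suffix)
{
    size_t k = strlen(suffix);
    if (n < k)
        return 0;
    for (size_t i = 0; i < k; i++)
        if (tolower((unsigned char)w[n - k + i]) != suffix[i])
            return 0;
    return 1;
}

/* Approximate syllable count for one word (hypothetical helper). */
int count_syllables(const char *word)
{
    size_t n = strlen(word);

    /* Ignore a silent ending: check -es and -ed before the bare -e. */
    if (ends_with(word, n, "es") || ends_with(word, n, "ed"))
        n -= 2;
    else if (ends_with(word, n, "e"))
        n -= 1;

    int count = 0;
    int prev_vowel = 0;
    for (size_t i = 0; i < n; i++) {
        int v = is_vowel(word[i]);
        if (v && !prev_vowel)   /* a vowel that follows another is skipped */
            count++;
        prev_vowel = v;
    }
    /* Assumption not stated in the article: no word has zero syllables. */
    return count > 0 ? count : 1;
}

/* Flesch Reading Ease from total counts. */
double flesch_reading_ease(long words, long sentences, long syllables)
{
    assert(words > 0 && sentences > 0);
    return 206.835 - 1.015 * ((double)words / sentences)
                   - 84.6 * ((double)syllables / words);
}

/* Flesch-Kincaid Grade Level from total counts. */
double flesch_kincaid_grade(long words, long sentences, long syllables)
{
    assert(words > 0 && sentences > 0);
    return 0.39 * ((double)words / sentences)
         + 11.8 * ((double)syllables / words) - 15.59;
}
```

For a text of 100 words, 5 sentences and 150 syllables, these functions give a reading ease of about 59.6 and a grade level of about 9.9. The heuristic does miss some words ("apple" counts as one syllable because its final -e is dropped), which is the kind of approximation the article accepts.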
1. J. Peter Kincaid, Richard Braby and John E. Mears, "Electronic authoring and delivery of technical information," Journal of Instructional Development, vol. 11, no. 2 (June, 1988), pp. 8-13.
2. "The Life and Times of Frank Purdue" (Unpublished).
3. Vladimir V. Bochkarev, Anna V. Shevlyakova and Valery D. Solovyev, "Average word length dynamics as indicator of cultural changes in society," arXiv Preprint Server, August 30, 2012.
4. How language change sneaks in, Linguistic Society of America Press Release, September 4, 2012 (PDF File).
5. Hendrik De Smet, "The Course of Actualization," Preprint of Language paper, to appear, September, 2012 (PDF File).
6. Phonics on the Web.