Readability and the Common Core’s Staircase of Text Complexity
Text Matters Article
Elfrieda H. Hiebert
Text Project & University of California, Santa Cruz
For a long time, educators have asked questions about what makes a text complex. Why is it harder for students to read some books than others? How are we to help students select texts that will challenge them without frustrating them? What type of texts will increase their reading achievement most effectively?
By adding text complexity as a dimension of literacy, the Common Core State Standards for English Language Arts (CCSS/ELA; Common Core State Standards Initiative, 2010) bring these questions to the fore. To establish text complexity, the standards propose a three-pronged system:
- qualitative analyses of features such as levels of meaning (e.g., readers need to make inferences to understand a character’s motive);
- reader-task variables such as readers’ background knowledge of a text’s topic and ways in which teachers and situations influence readers’ interactions with a text (e.g., an audio of a book or the level of teacher guidance); and
- quantitative indices such as information on the number of infrequent words and length of sentences (e.g., word indexes, sentence-length formulas, or automatic readability programs).
When the new standards were released in the spring of 2010, however, the guidance for the first two components was vague and ill defined. Educators were encouraged to use qualitative information and reader-task variables in selecting texts and teaching with them, but the examples and rubrics for how to do this were few and limited to middle- and high-school exemplars. By contrast, the guidance for the third form of measurement of text complexity was highly prescriptive. For the first time in a standards document, whether from a state or national organization, specific text levels were given for grades. These levels were identified on a specific readability measure, the Lexile Framework. Since the release of the Standards, members of the CCSS writing team (Nelson, Perfetti, Liben, & Liben, 2012) have been involved in a project to establish levels on a broader group of readability measures, including ATOS (Milone, 2009) and DRP (Koslin, Zeno, & Koslin, 1987).
Readability formulas such as Lexiles, ATOS, and DRP have been used in many school districts and states for guidance on the complexity of texts and tests. In the CCSS, however, the typical readability levels that had long been associated with grade levels have been readjusted to create a staircase of text complexity. As Figure 1 shows, Lexile scores have been accelerated, starting at the beginning of second grade. This acceleration is intended to ensure that high school graduates are able to read the texts they will encounter in college and/or their careers. This goal is a laudable one: high school graduates should be ready to move seamlessly into the next phase of their lives. But the building of a staircase on readability levels poses potential challenges.
Readability formulas themselves are not to blame for the CCSS writers’ choice to provide explicit guidelines on readability levels and to accelerate text levels. As Chall (1985) makes clear, it is the interpretation of readability formulas, not the formulas themselves, that is the source of potential misuse. Indeed, it is the inappropriate uses of readability formulas that may subvert the CCSS’s laudable goal of increasing students’ ability to read increasingly complex text over their school careers. Reading educators have had a long history of using and interpreting readability formulas. This knowledge now needs to be revisited and acted upon to offset potential problems as the CCSS become more widely implemented. To help in this effort, this article briefly describes the history, uses, contributions, and limitations of readability formulas. It then describes how teachers and publishers can use information on reading levels of texts that come from readability and guided reading systems as the first step in identifying elements of text that can provide the focus of instruction that supports students’ reading capacity.
What Do Readability Formulas Mean?
Since Lively and Pressey (1923) created the first readability formula nearly 100 years ago, well over 200 additional formulas have been developed (Klare, 1984). Early readability formulas were created to provide a reliable way to control text complexity and so make it easier to communicate important messages clearly to their intended audiences. Government agencies, especially the military, not schools, were the driving force in this undertaking. The need for finding ways to control text complexity in materials used by military personnel is clear: The inability to understand a manual on evacuation procedures or on how to handle ammunition, for example, can lead to serious consequences. But educators quickly began using readability formulas to choose texts for schools. Before long, publishers began to use readability formulas to create texts (Davison & Kantor, 1982).
The Components of Readability Formulas
Nearly all readability formulas, regardless of small differences, analyze two main features of texts: (a) syntax and (b) vocabulary. The first component is almost always measured in number of words per sentence, although a handful of formulas count the number of syllables instead. With regard to vocabulary, some formulas (e.g., Spache, 1953) compare words in a text to an index of words that have been keyed to different grade levels, while others (e.g., Fry, 1968) use the number of syllables in words as an indicator of complexity.
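To make the two components concrete, the following minimal Python sketch computes the kind of surface counts these formulas start from: words per sentence and syllables per word. It does not implement any published formula. The function names, the sentence splitter, and the vowel-run syllable heuristic are all assumptions made for illustration, and the heuristic is known to be rough (for example, it counts a silent final e as a syllable).

```python
import re

def split_sentences(text):
    """Naively split on ., !, or ? followed by whitespace (assumption:
    no abbreviations or quoted punctuation need special handling)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(text):
    """Return word tokens made of letters, allowing internal apostrophes."""
    return re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text)

def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def surface_features(text):
    """Compute the two surface measures described in the text."""
    sentences = split_sentences(text)
    words = [w for s in sentences for w in tokenize(s)]
    if not sentences or not words:
        raise ValueError("text contains no words")
    return {
        "words_per_sentence": len(words) / len(sentences),
        "syllables_per_word": sum(count_syllables(w) for w in words) / len(words),
    }

# Invented sample passage, used only to exercise the functions.
sample = "The bat flies home. She sleeps all day in her roost."
features = surface_features(sample)
print(features)
```

A formula would then combine counts like these into a single score, which is why two texts with very different ideas can receive the same level when their sentences and words look alike on the surface.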
Until recently, readability formulas had to be applied manually by counting words or syllables and consulting word indices. Because of this, what the formulas measured was clear to all who used them. Digitized readability systems such as Lexiles, ATOS, and DRP reduced this transparency. Digitized versions of texts could be analyzed quickly. Further, the contents of texts could be retained in databanks, making it possible to establish the frequency of words in a text relative to all of the words in the vast library of texts represented in a databank (Smith, Stenner, Horabin, & Smith, 1989). The readability levels of tens of thousands of texts could be determined readily, making it unnecessary for users to do hands-on analyses of texts.
Contributions of and Problems with Readability Formulas
Educators, like professionals in other domains, need a variety of diagnostic tools at their disposal. Just as thermometers can give a reading of body heat, readability formulas can give teachers data about a book’s level of complexity. A doctor wouldn’t depend on temperature alone to diagnose an illness, however, and a reading teacher should not depend on a readability score alone to measure text complexity. But like temperature readings, Lexile scores are a good first source of information. The Lexile scores of the following three texts indicate that each is likely to be progressively more difficult than the previous one: Silverman’s (2006) Cowgirl Kate and Cocoa (240L), Simon’s (2006) Volcanoes (930L), and Lincoln’s (1865) Second Inaugural Address (1540L). Readability scores such as Lexiles, ATOS, and DRP are a good beginning to the process, but they cannot do the whole job.
The challenge posed by the use of readability formulas is illustrated in the information about 10 texts that appears in Table 1. According to the guidelines given in the CCSS for grade bands (see Figure 1), the use of readability scores alone would lead to the conclusion that these texts are all appropriate for third graders. In Appendix B of the CCSS where these texts are offered as exemplars, however, six are on the list for the grades 2–3 band, one for grades 4–5, and three for grades 6–8.
Figure 1. The “Staircase”: Text Complexity Grade Bands and Associated Lexile Ranges (in Lexiles)
Source: Common Core State Standards, Appendix A (2010b), p. 8
One factor that can skew simple readability scores is genre. Take, for example, two titles with similar scores from Table 1: Bat Loves the Night (Davies, 2001) and Roll of Thunder, Hear My Cry (Taylor, 1976). The first book, an informational text intended for young children, gives information about a pipistrelle bat. Roll of Thunder, the 1977 Newbery Medal winner, is a novel that explores issues of land ownership and racism in Depression-era Mississippi through the eyes of a young African-American girl named Cassie. Even these cursory summaries demonstrate the vast difference in complexity you can expect from each title. Yet these two texts have the same Lexile score: 720.
|Title||Grade Band||Text Type||Lexile|
|Letter on Thomas Jefferson|
|The Stories Julian Tells|
|Where do Polar Bears Live|
|Bat Loves the Night|
|Roll of Thunder|
|From Seed to Plant|
|Travels with Charley|
|So You Want to Be President (St. George, 2000)|
Critics of readability formulas have long pointed out such genre-based discrepancies. One explanation (and criticism) has been that the short sentences and high-frequency vocabulary used in the dialogue of narratives can artificially skew the readability formula downward. As is typical of narratives, substantial portions of Roll of Thunder consist of dialogue, as in the following statement by Papa to Cassie: “It don’t give up. It give up, it’ll die. There’s a lesson to be learned from that little tree, Cassie girl, ’cause we’re like it.” The vocabulary and syntax of these sentences are not complex, but the ideas are.
In contrast, Bat Loves the Night (Davies, 2001) has sentences that are fairly consistent in length and longer than the dialogue-heavy sentences found in narratives. However, the sentences convey all the information the reader needs to make meaning. Unlike in the narrative, there is nothing to read between the lines.
A second way that genre distorts readability scores is the presence of rare words in informational text. Such is the case with Bat Loves the Night (Davies, 2001), in which specific but rare words (e.g., roost, batlings) appear repeatedly. Repetitions of these words increase the readability score, but in reality such repetitions may have the opposite effect: research shows that readers become more facile with vocabulary after several repetitions (Finn, 1978). By simply equating infrequent words with complexity, readability formulas can overestimate the complexity of informational texts.
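The effect of repeated rare words can be sketched with a toy calculation. The Python example below averages a log word frequency over every running word in a passage, which is one plausible way a digitized system might summarize vocabulary; it is not the actual Lexile, ATOS, or DRP computation. The frequency table, its values, and the function names are hypothetical and invented purely for illustration.

```python
import math

# Hypothetical occurrences per million words; invented values, not real norms.
FREQ_PER_MILLION = {
    "the": 60000.0, "to": 25000.0, "is": 10000.0, "her": 3000.0,
    "bat": 20.0, "flies": 15.0, "roost": 1.0,
}

def mean_log_frequency(tokens, freq, floor=0.1):
    """Average log10 frequency over every running word (repeats included).
    Words missing from the table get the assumed floor value."""
    logs = [math.log10(freq.get(t, floor)) for t in tokens]
    return sum(logs) / len(logs)

def rare_types(tokens, freq, threshold=5.0):
    """Distinct words whose frequency falls below an assumed rarity cutoff."""
    return {t for t in tokens if freq.get(t, 0.0) < threshold}

base = "the bat flies to her roost".split()
repeated = base + "the roost is her roost".split()

base_mean = mean_log_frequency(base, FREQ_PER_MILLION)
repeated_mean = mean_log_frequency(repeated, FREQ_PER_MILLION)

# A lower mean suggests harder vocabulary, yet both passages
# ask the reader to learn the same single rare word.
print(base_mean, repeated_mean)
print(rare_types(base, FREQ_PER_MILLION), rare_types(repeated, FREQ_PER_MILLION))
```

In this sketch the longer passage receives a lower average frequency, and so looks harder, even though the only rare word in both passages is roost. A reader who has met the word once is likely to find each later occurrence easier, which is the opposite of what the average implies.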
The unequal distribution of frequently and infrequently used words in written English creates further problems with readability systems such as Lexiles, ATOS, and DRP that use large, digital databases. These systems use mathematical formulas to establish the average frequency of the words in a text. The words in written English, however, are distributed in a skewed manner. A set of 4,000 simple word families (e.g., help, helped, helping, helps, helper) accounts for about 90% of all of the words in many texts, regardless of the level or content (Hiebert, 2011). In one database of words from K–12 schoolbooks, 61% of all the words (93,900) account for approximately .5% of the total words in texts (Zeno, Ivens, Millard, & Duvvuri, 1995). Among this latter group of rare words are many of the concrete words that can interest young children, including hippo, peacock, honeybees, and gerbil.
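The skew described above can be examined with a simple coverage count. The following Python sketch reports what share of a passage's running words comes from a core list and what share of its distinct words falls outside that list. The core list here is a tiny stand-in, not the 4,000 word families discussed above, and the passage, names, and function are assumptions made for illustration.

```python
from collections import Counter

def coverage_report(tokens, core_words):
    """Compare core and non-core words by running words and by distinct words."""
    counts = Counter(tokens)
    if not counts:
        raise ValueError("no tokens to analyze")
    total_tokens = sum(counts.values())
    rare = [w for w in counts if w not in core_words]
    rare_tokens = sum(counts[w] for w in rare)
    return {
        "core_share_of_running_words": (total_tokens - rare_tokens) / total_tokens,
        "rare_share_of_running_words": rare_tokens / total_tokens,
        "rare_share_of_distinct_words": len(rare) / len(counts),
    }

# Tiny hypothetical core list, including one word family written out in full.
CORE_WORDS = {"the", "and", "a", "help", "helped", "helping", "helps", "helper"}

# Invented passage built around the concrete rare words mentioned in the text.
tokens = "the hippo helps the gerbil and the gerbil helps the peacock".split()
report = coverage_report(tokens, CORE_WORDS)
print(report)
```

Even in this toy passage the words outside the core list make up a larger share of the distinct words than of the running words. In a large corpus the gap is far wider, as the figures from Zeno et al. (1995) indicate.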
Because so many words receive similar ratings, sentence length carries more weight within the readability formulas (Deane, Sheehan, Sabatini, Futagi, & Kostin, 2006). What we do know about syntax runs counter to assumptions made by readability formulas. According to research, shorter sentences do not always make text easier. Short sentences tend to have fewer context clues and fewer links between ideas, requiring the reader to make more inferences (Beck, McKeown, Omanson, & Pople, 1984).
How Can Teachers Use Information From Readability Systems?
One way in which teachers can use information from readability systems is in recommending books to students for independent reading in the classroom or at home, especially in book selection programs with lists of books based on designated reading levels (e.g., Renaissance Learning, 2012). With a Lexile of 850 and an ATOS grade level of 5.2, texts such as Captain Underpants and the Big, Bad Battle of the Bionic Booger Boy (Pilkey, 2003) may appear on reading lists suitable for fifth graders. Knowing how readability formulas work, however, a teacher is aware that unusual and rare vocabulary (e.g., bionic, naughty) could be a primary reason for this assignment. A teacher might suggest that students instead read texts such as Holes (Sachar, 1999), even though it has a lower readability level (660 Lexile; 4.6 ATOS), or A Beautiful Game (890 Lexile; 5.7 ATOS; Watt, 2010), a wonderful collection of stories by world-level soccer players.
But how can the information from readability systems support teachers in designing instruction? And, in designing this instruction, how does information from readability systems fit with information from other systems for establishing text complexity, especially the guided reading levels that are already in place in many schools? Both questions have a similar answer, which lies in recognizing that text complexity systems, whether qualitative or quantitative, all have the same aim: describing features of texts that challenge or support readers in successfully comprehending a text. Quantitative information such as the length of sentences or the frequency of vocabulary describes one set of features. Qualitative analysis, as occurs in guided reading levels, attends to additional features of text such as the content and its connection to readers. The line between quantitative and qualitative analysis is becoming increasingly blurred as scientists develop ways of describing patterns in discourse that can be captured digitally (see, e.g., Graesser, McNamara, & Kulikowich, 2011). Whatever the system for assessing text complexity, the focus is on features that can contribute to the ease or difficulty readers have in comprehending text.
To increase students’ capacity with complex text, teachers want as much information as they can get to understand the features of texts that might “grow” their students’ reading and thinking. The overall levels that have been assigned to texts by readability or guided reading systems provide an initial step in establishing the direction for instruction. For example, consider two texts from the Common Core Exemplar list: The Treasure (Shulevitz, 1978) and Tops and Bottoms (Stevens, 1995). The Lexiles for the two texts are the same: 650 (the end of second grade, according to the new Lexiles in Figure 1). Guided reading levels based on the Fountas and Pinnell (2009) system are H (end of first grade) for The Treasure and M (end of second grade) for Tops and Bottoms. The discrepancy in ratings for The Treasure suggests the need for a closer examination of the text. But even for the text that the two systems place at a similar point (Tops and Bottoms), the evaluations do not give teachers guidance on what features of the texts may be the source of challenge for their students or the focus of lessons that grow students’ capacity with complex text.
Readability systems that yield a single measure, whether from computer analysis (e.g., Lexiles, ATOS, DRP) or human judgment (e.g., guided reading levels), give an estimation of the range within which a text falls. This information does not indicate the particular features of texts that may challenge readers or provide the growing edge for readers. Fountas and Pinnell (2009) describe Level H texts as having longer and more literary stories than previous levels and less repetition in episodic structure, and Level M texts as having even more sophisticated language structures and complex stories. But the dimensions on which the complexity of narratives can vary are many, including unique text types (e.g., fables with a moral versus trickster tales where conventional norms are broken) and motives of characters (e.g., tricksters) (McAndrews, 2008).
A system called the Text Complexity Multi-Index (Hiebert, 2012b) illustrates how qualitative analyses and analyses of tasks and readers can build on the overall information provided by text complexity systems. As shown in Table 2, the first piece of information comes from text complexity systems like Lexiles and guided reading levels. Information on another quantitative feature—average sentence length—can also be obtained with a Lexile. Tops and Bottoms has shorter sentences, on average, than The Treasure. This information is a signal that there is likely more dialogue in the former than the latter.
|Step||The Treasure||Tops and Bottoms|
|1: Quantitative Indices|
|Guided Reading Level||H||M|
|2: Qualitative Benchmarks [not considered in this analysis]|
|3: Qualitative Dimensions|
|Levels of Meaning||As a fable, story has several levels of meaning—story and its universal theme||Trickster tale has several levels of meaning—explicit & underlying; trickster goes against conventional norms which may be a challenging concept|
|Knowledge demands—content, cultural, literary||Fairly straightforward fable with moral stated explicitly at end||Trickster genre may be new to primary-level students; helpful to know differences in vegetables to anticipate tricks but this knowledge is also given in text|
|Language conventions & clarity||Conventional language of fairy tales (“There once was… “).||Contemporary language with a traditional text structure (e.g., “It’s a done deal!”)|
|Structure||Clear structure of fables||Clear structure of trickster tale|
|4: Reader and Tasks|
|Reader Level||Students who are proficient with highly frequent words and developing capacity with moderately frequent words—likely around 3rd trimester of second grade||Students who are proficient with highly and moderately frequent words—likely end of second grade or beginning of third|
|Social Configuration||Scaffolded silent reading in small groups with teacher||Scaffolded silent reading in small groups with teacher|
|Type of response||Discussion & written response||Discussion & written response|
|Allocation of time||Potentially as many as three small group sessions with independent/peer reading of additional fables & time for written response||Potentially as many as three small group sessions with independent/peer reading of additional trickster tales & time for written response|
MSL = Mean Sentence Length
MLWF = Mean Log Word Frequency
A Lexile analysis also gives an average for word frequency. A lower number indicates more challenging vocabulary. The averages in Table 2 suggest that The Treasure has easier vocabulary than Tops and Bottoms, although an examination of the text is required to interpret the nature of the rare words in the two texts. The conclusion from the quantitative information is that, while having slightly longer sentences, The Treasure has somewhat less taxing vocabulary than Tops and Bottoms.
Background Knowledge and Text Features
The nature of text structure and background knowledge may become increasingly “quantified” as large-scale systems are developed but, for now, numerous aspects of the overall text require human evaluation. Careful and systematic review of texts can occur with rubrics that describe particular dimensions, much like the evaluation of students’ writing. The aim is not to classify a text as simple or complex but to determine what it is that readers need to know to be successful with a text or, if the goal is an instructional one, the opportunities that a text provides for guidance. Further, as the description of dimensions will show, a text is not necessarily “all complex” or “all simple.” For example, a text can be clear in its structure (a fable) but, when the author adds irony or absurdity, the levels of meaning can be highly complex (see, e.g., Pinkwater, 1992).
The TCMI qualitative analysis uses the four dimensions identified within the CCSS (levels of meaning, knowledge demands, language, and structure) but with an increasingly detailed database of how the four features are expressed in texts. The descriptions in Table 2 summarize elements from the reviews of three educators who have taught or are teaching in the primary grades. This review suggests that, for second graders who know about fables, The Treasure will be fairly straightforward. For those who don’t, the nature of a “moral” may need to be emphasized in a lesson.
The content of Tops and Bottoms follows the structure of a trickster tale closely but, for some second graders, the genre may be fairly new. This may especially be the case with second language learners who are fairly literal in their understanding of English and school. Further, for students who don’t know certain vegetables, the trick that the Hare and his family are playing on the Bear may need to be explained.
Readers and Tasks
The teacher now takes the knowledge that has come from the quantitative and qualitative analyses to determine with whom and in what context the texts are appropriately used. As with text-level features such as background knowledge and levels of meaning, the features of tasks and contexts, and the ways they interact with readers, are many and difficult to quantify. As with the analysis of text features, however, rubrics are available for describing at least some of the most fundamental features of tasks and contexts. The Reading Space, which appears in Figure 2, is the rubric used in the TCMI process to attend to features of tasks and contexts. With knowledge about their students and the texts, teachers can make informed choices.
Figure 2. The Reading Space (figure not reproduced; surviving labels: Types of Responses; Allocation of Time, with the entries “Fixed, short, immediate” and “(e.g., month-long units)”)
For second graders who have basic proficiency with the requisite core vocabulary, a teacher may decide that The Treasure can be the grist for a lesson on fables. The information gained on Tops and Bottoms might lead a teacher to use the text with students who are quite proficient with highly and moderately frequent words. A short presentation of the vegetables in the story might precede a lesson on tricksters in stories. Unlike the fable, which describes the consequences of human behavior, the trickster tale features characters who disobey normal rules and flout conventional behavior. The short lesson might focus on the difference between the trickster tale and the fable (referring to The Treasure, which may have been read earlier in the school year).
What the TCMI process shows is that numerous sources of information need to be used for making instructional decisions about the features of texts to emphasize in lessons. Quantitative information is part of the process, and this information is likely to get richer and more extensive as linguists and cognitive psychologists become more and more adept at digitized systems. Also needed are descriptions of the sort that appear in Table 2, which give teachers knowledge about the qualitative features of texts as well as recommendations for tasks and contexts. Publishers and teacher leaders can help by providing descriptive information on qualitative and task-context features but, ultimately, teachers are the ones who need to understand how particular text features can influence comprehension. Starting with the guided reading levels and Lexiles, the process of digging into the features of texts will be the means whereby teachers guide their students up the staircase of increasingly complex text.
References
Beck, I.L., McKeown, M.G., Omanson, R.C., & Pople, M.T. (1984). Improving the comprehensibility of stories: The effects of revisions that improve coherence. Reading Research Quarterly, 19(3), 263–277.
Betts, E.A. (1946). Foundations of reading instruction. New York, NY: American Book Company.
Chall, J.S. (1985). Afterword. In R.C. Anderson, E.H. Hiebert, J.A. Scott, & I.A.G. Wilkinson (Eds.), Becoming a nation of readers (pp. 123–125). Champaign, IL: The Center for the Study of Reading, National Institute of Education, National Academy of Education.
Common Core State Standards Initiative. (2010). Common Core State Standards for English language arts and literacy in history/social studies, science, and technical subjects. Washington, DC: National Governors Association Center for Best Practices and the Council of Chief State School Officers.
Davison, A., & Kantor, R.N. (1982). On the failure of readability formulas to define readable texts: A case study from adaptations. Reading Research Quarterly, 17(2), 187–209.
Deane, P., Sheehan, K.M., Sabatini, J., Futagi, Y., & Kostin, I. (2006). Differences in text structure and its implications for assessment of struggling readers. Scientific Studies of Reading, 10(3), 257–275.
Finn, P.J. (1978). Word frequency, information theory, and cloze performance: A transfer theory of processing in reading. Reading Research Quarterly, 13(4), 508–537.
Fountas, I.C., & Pinnell, G.S. (2009). The Fountas and Pinnell Leveled Book List: K-8+. Portsmouth, NH: Heinemann.
Fry, E.B. (1968). A readability formula that saves time. Journal of Reading, 11(7), 513–516, 575–578.
Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40(5), 223–234.
Hiebert, E.H. (2011). The 90-10 rule of vocabulary in increasing students’ capacity for complex text. Retrieved from http://textproject.org/frankly-freddy/the-90-10-rule-of-vocabulary-in-increasing-students-capacity-for-complex-text/
Hiebert, E.H. (2012a). The Common Core State Standards and text complexity. In M. Hougen & S. Smartt (Eds.), Becoming an effective literacy teacher: Integrating research and the new Common Core State Standards (pp. 111–120). Baltimore, MD: Paul Brookes Publishing.
Hiebert, E.H. (2012b). The Text Complexity Multi-Index (Text Matters series). Santa Cruz, CA: TextProject. Retrieved from http://textproject.org/teachers/text-matters/the-text-complexity-multi-index/
Klare, G. (1984). Readability. In P.D. Pearson, R. Barr, M.L. Kamil, & P. Mosenthal (Eds.), Handbook of reading research (Vol. 1, pp. 681–744). New York, NY: Longman.
Koslin, B.L., Zeno, S., & Koslin, S. (1987). The DRP: An effective measure in reading. New York, NY: College Entrance Examination Board.
Lively, B.A., & Pressey, S.L. (1923). A method for measuring the vocabulary burden of textbooks. Educational Administration and Supervision, 9, 389–398.
McAndrews, S.L. (2008). Diagnostic literacy assessments and instructional strategies: A literacy specialist’s resource. Newark, DE: International Reading Association.
Mesmer, H.A., Cunningham, J.W., & Hiebert, E.H. (2012). Toward a theoretical model of primary-grade text complexity: Learning from the past, anticipating the future. Reading Research Quarterly, 47(3), 235–258.
Milone, M. (2009). The development of ATOS: The Renaissance readability formula. Wisconsin Rapids, WI: Renaissance Learning, Inc.
Nelson, J., Perfetti, C., Liben, D., & Liben, M. (2012). Measures of text difficulty: Testing their predictive value for grade levels and student performance. New York, NY: Student Achievement Partners.
Renaissance Learning. (2012). What kids are reading. Wisconsin Rapids, WI: Author.
Smith, D., Stenner, A.J., Horabin, I., & Smith, M. (1989). The Lexile scale in theory and practice (Final report). Washington, DC: MetaMetrics. (ERIC Document Reproduction Service No. ED 307 577)
Spache, G. (1953). A new readability formula for primary-grade reading materials. The Elementary School Journal, 53, 410–413.
Watt, T. (2010). A beautiful game: The world’s greatest players and how soccer changed their lives. New York, NY: HarperCollins.
Zeno, S.M., Ivens, S.H., Millard, R.T., & Duvvuri, R. (1995). The educator’s word frequency guide. New York, NY: Touchstone Applied Science Associates.
Adams, J. (1776/2004). Adams on Adams. Lexington, KY: University Press of Kentucky.
Aliki. (1960). A medieval feast. New York, NY: Harper Collins.
Berger, M. (1992). Discovering Mars: The amazing story of the red planet. New York, NY: Scholastic.
Cameron, A. (1981). The stories Julian tells. New York, NY: Random House.
Davies, N. (2001). Bat loves the night. Cambridge, MA: Candlewick.
Gibbons, G. (1993). From seed to plant. New York, NY: Holiday House.
Lincoln, A. (1865). Second inaugural address. Retrieved from http://memory.loc.gov/cgi-bin/query/r?ammem/mal:@field(DOCID+@lit(d4361300))
Pilkey, D. (2003). Captain Underpants and the big, bad battle of the Bionic Booger Boy. New York, NY: Scholastic.
Pinkwater, D.M. (1992). Borgel. New York, NY: Aladdin Paperbacks.
Sachar, L. (1999). Holes. New York, NY: Dell Yearling.
Shulevitz, U. (1978). The treasure. New York, NY: Farrar, Straus, and Giroux.
Silverman, E. (2006). Cowgirl Kate and Cocoa. San Anselmo, CA: Sandpiper.
Simon, S. (2006). Volcanoes. New York, NY: HarperCollins.
St. George, J. (2000). So you want to be president? New York, NY: Philomel.
Steinbeck, J. (1962). Travels with Charley: In search of America. New York, NY: Penguin.
Stevens, J. (1995). Tops and bottoms. Orlando, FL: Harcourt.
Taylor, M.D. (1976). Roll of thunder, hear my cry. New York, NY: Phyllis Fogelman Books.
Thomson, S.L. (2010). Where do polar bears live? New York, NY: HarperCollins.