Looking “Within” the Lexile for More Guidance: Word Frequency and Sentence Length

    by Freddy Hiebert | January 24, 2011

    Freddy Hiebert
    TextProject, Inc.

    Standard 10 defines a grade-by-grade “staircase” of increasing text complexity that rises from beginning reading to the college and career readiness level. Whatever they are reading, students must also show a steadily growing ability to discern more from and make fuller use of text, including making an increasing number of connections among ideas and between texts, considering a wider range of textual evidence, and becoming more sensitive to inconsistencies, ambiguities, and poor reasoning in texts (Common Core State Standards Initiative, 2010, p. 8).

    A standard that emphasizes students’ capacity to read increasingly complex text is a first for a national or state standards document. Text complexity, according to the CCSS/ELA, is a function of three factors: qualitative (e.g., levels of meaning, structure, knowledge demands), quantitative (e.g., readability measures and other scores of text complexity), and the match of reader to text and task (e.g., reader variables such as motivation, knowledge, and experiences; task variables such as purpose and questions). Of the measures that the CCSS proposes for establishing text complexity, only data on one type of quantitative measure, Lexiles, are explicitly presented and easily obtained.

    Similar to the readability formulas that have been used in American schools for almost a century, the Lexile of a text is established through an algorithm that considers sentence length and word frequency. The computation produces a Lexile that can be placed on a scale spanning 0 (easiest texts) to 2000 (most complex texts). A single number is typically reported as the Lexile for an entire text, even a full-length book. For example, the Lexile for a well-loved children’s book, Sarah, Plain and Tall, is 430, while that of The Stories Julian Tells is 700. As a single number, a Lexile gives a general indicator of difficulty. Green Eggs and Ham has a Lexile of 30, while that of Pride and Prejudice is 1030. These texts fall in an order that makes sense to most educators acquainted with them: Green Eggs and Ham is easy; Sarah, Plain and Tall is somewhat harder; and Pride and Prejudice is the most complex of the three.

    When an individual text is examined for instruction or independent reading, however, particular features of the text can make its difficulty hard to predict from the Lexile alone. For example, Harry Potter and the Chamber of Secrets and The Old Man and the Sea have the same Lexile: 940. While the Rowling book is by no means a simple one, its style and content likely make it more palatable to a fifth grader than the Hemingway text.

    Information on sentence length and word frequency gives more specific insight into a Lexile rating. Often, the Lexiles of texts vary considerably because of big differences in sentence length. When authors use complex sentence structures, students’ comprehension can be affected. But sometimes authors simply have a style of joining ideas with the word “and.” That’s the case with Example 1 in Table 1. That text and Example 2 have the same average word frequency: 3.8. But the first text averages 3.5 more words per sentence than the second. The difference in sentence length affects the Lexile: 700 for The Stories Julian Tells and 430 for Sarah, Plain and Tall.
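    Average sentence length is simple arithmetic: total words divided by total sentences. The Python sketch below applies it to the two excerpts in Table 1. It is only an illustration: the sentence splitter and word pattern are simplified assumptions, not the rules used to compute published Lexiles, and because each excerpt is only a sentence or a few sentences long, the results will not match averages computed over the full books.

```python
import re

# Naive sentence splitter (an assumption): a sentence ends at ".", "!" or "?",
# optionally followed by a closing quotation mark, then whitespace.
SENTENCE_END = re.compile(r'(?<=[.!?])["\u201d]?\s+')
# A "word" is a run of letters, allowing internal apostrophes or hyphens,
# so "wasn't" and "Every-single-day" each count as one word.
WORD = re.compile(r"[A-Za-z]+(?:['\u2019-][A-Za-z]+)*")


def average_sentence_length(text: str) -> float:
    """Mean number of words per sentence in a passage."""
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    words_per_sentence = [len(WORD.findall(s)) for s in sentences]
    return sum(words_per_sentence) / len(sentences)


julian = ("My father said he wasn't sure he wanted either giant corn or a flower "
          "house, and if we wanted them, we would have to take care of them all "
          "summer by pulling weeds.")
sarah = ('"Every-single-day," I told him for the second time this week. '
         'For the twentieth time this month. The hundredth time this year? '
         'And the past few years?')

print(average_sentence_length(julian))  # 33.0 (one sentence of 33 words)
print(average_sentence_length(sarah))   # 6.5 (26 words over 4 sentences)
```

    The Julian excerpt is a single 33-word sentence held together by “and,” while the Sarah excerpt breaks 26 words into four short sentences.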

    Short sentences do not necessarily make a text easy to read. In the segment from Sarah, Plain and Tall, Caleb keeps begging his older sister to retell the story of his birth (which was followed by their mother’s death). This text is more complex conceptually than the description of what Julian and his brother have chosen to plant in their garden.

    Average word frequency is even more critical to consider than average sentence length. A low average word frequency means that the text likely contains a number of words that many students have not seen before. Teachers should be especially alert to big differences in average word frequency. The One-Eyed Giant (Example 3) has a Lexile of 680, slightly lower than the 700 of The Stories Julian Tells, but its average word frequency is 3.47, compared with 3.8 for the first two examples. Its sentences are about the same length as those in The Stories Julian Tells, but it contains more infrequent words. Vocabulary such as Cyclopes, savage, and devour will likely make The One-Eyed Giant more challenging for third graders than The Stories Julian Tells.
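    Scores such as 3.8 and 3.47 appear to be averages on a logarithmic scale: each word’s frequency in a large reference corpus is converted to a logarithm, and the logarithms are averaged over the words of the text. Assuming that approach (the corpus and exact transformation behind Lexile scores are not described here), the Python sketch below shows why swapping in a single rarer word lowers the average. The frequency table and the function name are hypothetical, invented purely for illustration.

```python
import math
import re

# Word tokens: letters with optional internal apostrophes or hyphens.
WORD = re.compile(r"[a-z]+(?:['\u2019-][a-z]+)*")


def average_word_frequency(text: str, freq_per_million: dict[str, float],
                           unseen: float = 0.1) -> float:
    """Mean of log10(frequency per million) over every word token.

    Words missing from the table get a small assumed frequency (`unseen`),
    so unfamiliar words still pull the average down."""
    tokens = WORD.findall(text.lower())
    logs = [math.log10(freq_per_million.get(t, unseen)) for t in tokens]
    return sum(logs) / len(logs)


# Invented frequencies (per million words), NOT real corpus data.
TOY_FREQUENCIES = {"the": 10000, "giant": 10, "ate": 100,
                   "devoured": 1, "sailors": 10}

plain = "The giant ate the sailors."
rarer = "The giant devoured the sailors."
print(round(average_word_frequency(plain, TOY_FREQUENCIES), 2))  # 2.4
print(round(average_word_frequency(rarer, TOY_FREQUENCIES), 2))  # 2.0
```

    Both sentences have five words, so sentence length is unchanged; the only difference is one less frequent word, and the average drops accordingly.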

    Children’s reading performance is heavily influenced by the vocabulary in a text. Typical word frequency ranges for different grades are given in Table 2. When a text’s average word frequency is substantially lower than the typical range for a grade, teachers should expect that students might need extra vocabulary support.

    And always remember: there are big differences in the styles and vocabulary of stories (narratives) and informational texts (content-area texts). Readability formulas like the Lexile often underestimate the difficulty of stories and overestimate the difficulty of informational texts. Why is that? In stories, authors often use dialogue. Statements in conversation are typically short, and short sentences lead to lower Lexiles.

    In informational texts, authors often use fairly infrequent words (e.g., degrees, frigid, Arctic, and blubber in a text on polar bears). These infrequent words have lower frequency ratings than the more common words found in stories, and because they are repeated often in an informational text, each repetition lowers the text’s average word frequency again. But the repetition of infrequent words can actually aid comprehension. Further, the words in an informational text usually relate to a single theme, which can also make them easier to understand.
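    The effect of repetition can be made concrete by counting rare words in two ways: every occurrence (which is what an average taken over all the words in a text reflects) and each distinct word (closer to the number of new words a reader actually has to learn). The Python sketch below does both for a short polar-bear passage. The passage, the list of “rare” words, and the function name are invented for illustration and do not come from a frequency corpus.

```python
import re
from collections import Counter

# Word tokens: letters with optional internal apostrophes or hyphens.
WORD = re.compile(r"[a-z]+(?:['\u2019-][a-z]+)*")


def rare_word_profile(text: str, rare_words: set[str]) -> tuple[int, int]:
    """Return (rare word occurrences, distinct rare words) for a passage."""
    counts = Counter(t for t in WORD.findall(text.lower()) if t in rare_words)
    return sum(counts.values()), len(counts)


# Invented passage and word list, for illustration only.
passage = ("Polar bears live in the Arctic. The Arctic is frigid, often forty "
           "degrees below zero. Blubber keeps the bears warm in the frigid "
           "Arctic air, and thick blubber also helps them float.")
rare = {"arctic", "frigid", "degrees", "blubber"}

print(rare_word_profile(passage, rare))  # (8, 4)
```

    Eight occurrences weigh on the average, but a reader meets only four unfamiliar words, each reinforced by its repetitions and by the shared topic.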

    When the average sentence length is substantially above the typical range, teachers should check the author’s style. Usually, long sentences won’t be much of a problem in stories. However, long sentences that carry important ideas in phrases or clauses can be a problem for students reading content-area texts.

    Teachers should use the Lexile rating as an initial piece of information, much like a check of someone’s temperature. A temperature can be high or low for many different reasons. Average sentence length and average word frequency give teachers more specific information that is useful for decision-making.
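    The temperature-check idea can be written as a simple screening rule: the Lexile starts the conversation, and the two underlying averages point to what to look at next. The Python sketch below assumes a teacher already has a text’s average sentence length and average word frequency, plus typical ranges for the grade band. The function, the range type, and the messages are hypothetical, and every number in the example is a placeholder, not a value from Table 2 or from any Lexile report.

```python
from dataclasses import dataclass


@dataclass
class TypicalRange:
    """Typical range for one measure in one grade band (values assumed)."""
    low: float
    high: float


def screen_text(avg_sentence_length: float, avg_word_frequency: float,
                sentence_range: TypicalRange, frequency_range: TypicalRange,
                informational: bool) -> list[str]:
    """Return follow-up notes for a teacher; an empty list means no flags."""
    notes = []
    # Low average word frequency suggests many unfamiliar words.
    if avg_word_frequency < frequency_range.low:
        notes.append("Word frequency is low: plan extra vocabulary support.")
    # Long sentences matter more in content-area texts than in stories.
    if avg_sentence_length > sentence_range.high:
        if informational:
            notes.append("Long sentences in a content-area text: check whether "
                         "key ideas are packed into phrases or clauses.")
        else:
            notes.append("Long sentences in a story: check the author's style.")
    return notes


# Placeholder values for a hypothetical story, NOT data from Table 2.
story_notes = screen_text(9.0, 3.5, TypicalRange(7.0, 11.0),
                          TypicalRange(3.6, 3.9), informational=False)
print(story_notes)  # ['Word frequency is low: plan extra vocabulary support.']
```

    Keeping the two checks separate mirrors the advice above: a vocabulary flag calls for word work, while a sentence-length flag calls for a closer look at the author’s style and the genre.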


    Common Core State Standards Initiative. (2010). Common Core State Standards for English Language Arts & Literacy in History/Social Studies, Science, and Technical Subjects. Washington, DC: CCSSO & National Governors Association.

    Table 1. Examples of Texts & General and Specific Lexile Information

    Example 1
    Text: My father said he wasn’t sure he wanted either giant corn or a flower house, and if we wanted them, we would have to take care of them all summer by pulling weeds.
    Source & Author: The Stories Julian Tells, Ann Cameron
    General Lexile: 700L
    Average Word Frequency: 3.8

    Example 2
    Text: “Every-single-day,” I told him for the second time this week. For the twentieth time this month. The hundredth time this year? And the past few years?
    Source & Author: Sarah, Plain and Tall, Patricia MacLachlan
    General Lexile: 430L
    Average Word Frequency: 3.8

    Example 3
    Text: He was the most savage of all the Cyclopes, a race of fierce one-eyed giants who lived without laws or leader. The Cyclopes were ruthless creatures who were known to capture and devour any sailors who happened near their shores.
    Source & Author: The One-Eyed Giant, Mary Pope Osborne
    General Lexile: 680L
    Average Word Frequency: 3.47



    Table 2. Typical Averages for Word Frequency and for Sentence Length (note 1)
    Grade Band | Narrative Texts: Word Frequency | Narrative Texts: Sentence Length | Informational Texts: Word Frequency | Informational Texts: Sentence Length
    1 Based on an analysis of the exemplars presented in Appendix B of the Common Core State Standards, from Elfrieda H. Hiebert (December 9, 2010), The view of text complexity within the Common Core Standards: What does it mean for struggling readers? Plenary address at the annual conference of the American Reading Forum, Sanibel, FL.

    This post was edited on February 8, 2011.