The 90-10 Rule of Vocabulary in Increasing Students’ Capacity for Complex Text

Posted by Freddy Hiebert on 7 June 2011

Elfrieda (Freddy) Hiebert

TextProject & University of California, Santa Cruz

The English language has an incredibly rich vocabulary, and yet we use only about 2% of it in the bulk of our typical written texts. This core vocabulary accounts for about 90% of our narrative texts (including literary texts) and at least 85% of our informational texts (including scientific and technical texts).

The Teacher’s Word Book first brought this disparity to the attention of educators (Thorndike, 1921). Yet, there is an additional disparity even within the core vocabulary. The 25 most frequent words (e.g., the, of, to, a) alone account for 33% of all the words in typical texts. This realization led to the creation of Dick-and-Jane-style readers (Gray, Baruch, & Montgomery, 1940). However, many of these most frequently used words are function words, the glue that holds our thoughts together.

Figure 1. Distributions of the Core and Extended Vocabularies

Beyond the first 100-200 basic words, the core vocabulary is stocked with general concept words--words such as mysteries, property, and interior. These words are highly versatile--many of them are polysemous, or multiple-meaning words. Many of them can also function as different parts of speech. Approximately 4,000 root words in this core group form approximately 5,600 unique words (Zeno, Ivens, Millard, & Duvvuri, 1995). When simple endings are added to these words (inflected endings, possessives, plurals, -ly, -y, -er, -est), their numbers approach 9,000 words (see Figure 1).

Words outside the core vocabulary--words such as zebra, zipper, and dirigible--make up the remaining 10% of narrative texts and 15% of informational texts. Approximately 300,000 to 600,000 words belong to the pool of infrequent words, and I refer to them as the extended vocabulary. The English language is a trove of these rare words because the vast majority of our vocabulary (98%) is infrequently used.

Extended vocabulary words tend to stand out in texts because they add specificity and because they are beloved by people who value the richness of the English language. These words also make up the grist of the Common Core State Standards (CCSS; CCSS, 2010) with regard to content area and literary vocabulary. So attention will continue to be paid to them. But for educators, a sole focus there may be misplaced. A student who learns dirigible as part of a vocabulary lesson may not encounter it for years, if ever again. Words such as chief, condition, and resource, however, can be expected to appear frequently in many different subject areas and with a variety of meanings.

The extended vocabulary needs to be a focus of elementary classrooms, too. Such words are the essence of literary and content-area instruction. Lessons attend to what it means for a character to be enigmatic and how this trait may influence the outcome of the story. Inquiry into the meaning of terms such as radiation and convection drives science instruction.

But, in addition to strategic and intensive instruction that develops extended vocabulary1, an elementary program also needs to ensure that students are facile with the core vocabulary that forms the foundation of text. For decades the general rule of thumb in reading pedagogy has been that a fundamental grasp of approximately 90% of the words in a text is needed for the reading experience to be meaningful for students. If students can understand 90%, they can figure out the other 10% without a breakdown in meaning. This makes a strong argument for using valuable classroom time on the heavy-lifting words in our core vocabulary.
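
To make the 90% rule of thumb concrete, the share of a passage covered by a known word list can be computed directly. The sketch below is only an illustration: the function name `core_coverage`, the eight-word `toy_core` list, and the sample sentence are all hypothetical stand-ins, not the actual core vocabulary or any tool used for the figures in this post.

```python
import re


def core_coverage(text, core_words):
    """Return the fraction of running words in `text` that appear in `core_words`."""
    # Lowercase the text and pull out alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in core_words)
    return known / len(tokens)


# Hypothetical toy list standing in for the core vocabulary (assumption).
toy_core = {"the", "of", "a", "to", "chief", "saw", "over", "city"}

# Hypothetical sample sentence; "dirigible" is the only word outside the toy list.
sample = "The chief saw a dirigible over the city"

coverage = core_coverage(sample, toy_core)
print(f"Share of running words covered: {coverage:.3f}")
```

Here seven of the eight running words are on the list, so the one extended-vocabulary word is surrounded by known words. A real analysis would need a full core word list and a more careful treatment of inflected forms.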

In fact, students need to be facile with the core vocabulary by the end of fourth grade. This is the point at which students begin reading to learn. Those who cannot read well enough to do so are quickly left behind. Those who have learned the root words in the core vocabulary can use them to unlock 85%-90% of all written text. Because the core vocabulary is based on essential concepts, knowledge of these words is also tied to content knowledge. So there is a large advantage beyond reading to be gained.

Table 1 illustrates how bands of core words can systematically be emphasized over second through fourth grade2. Extensive reading of accessible texts helps in building a strong foundation, in addition to lessons that attend to the shared meaning across a morphological family (e.g., develop, developing, developed) and the unique meanings of core words with multiple meanings (e.g., force, energy).

Despite the lure of attending exclusively to infrequently used words because of their specificity and richness, we must remember what is at stake. In the face of demands for higher reading levels from the CCSS (CCSS, 2010), we must be clear about what it means to be a proficient reader. Proficient readers need to apprehend 90% of the text. This is attainable when students are facile with the meanings of this highly versatile group of 4,000 words that make up the core vocabulary. Let’s work to give all students this important foundation before dressing their vocabularies up in the frippery of rare and specialized terminology.



Table 1.  Creating the Foundation with the Core Vocabulary: Grades 2-4
| WordZone3 | Associated grade level | Predicted occurrence per million running words of text | Number of words in WordZone (or portion thereof) (#) | Number of words with morphological relatives (simple endings & words with frequencies of 1 or more per million) (#) | Examples |
|---|---|---|---|---|---|
| 1 | | 68,000 - 1,000 | 107 | 177 | do, then, which |
| 2 | | 999 - 300 | 203 | 478 | example, word, united |
| 3.1 | 2 | 299 - 200 | 143 | 316 | public, possible, surface |
| 3.2 | 2 | 199 - 100 | 477 | 1022 | develop, service, necessary |
| 4.1 | 3 | 99 - 65 | 439 | 844 | determine, influence, evidence |
| 4.2 | 3 | 64 - 44 | 553 | 924 | function, standard, quality |
| 4.3 | 3 | 43 - 30 | 685 | 1059 | conflict, internal, maintain |
| 5.1 | 4 | 29 - 20 | 936 | 1296 | severe, confidence, resistance |
| 5.2 | 4 | 19 - 14 | 974 | 1272 | tendency, accompany, recommend |
| 5.3 | 4 | 13 - 10 | 1070 | 1212 | precede, adjustment, component |
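
As a quick check on how Table 1 relates to the figures given earlier, the two word-count columns can be totaled. The short sketch below copies the counts from the table; the variable names are hypothetical, and the label "1" for the first row is taken from note 3 rather than from the row itself. This is a reader's tally, not the calculation behind the original figures.

```python
# (WordZone, words in zone, words with morphological relatives), copied from Table 1.
# The label "1" for the first row is an assumption based on note 3.
table_1 = [
    ("1", 107, 177),
    ("2", 203, 478),
    ("3.1", 143, 316),
    ("3.2", 477, 1022),
    ("4.1", 439, 844),
    ("4.2", 553, 924),
    ("4.3", 685, 1059),
    ("5.1", 936, 1296),
    ("5.2", 974, 1272),
    ("5.3", 1070, 1212),
]

# Sum each count column across all ten zones.
total_words = sum(words for _, words, _ in table_1)
total_with_relatives = sum(relatives for _, _, relatives in table_1)

print(total_words)           # 5587
print(total_with_relatives)  # 8600
```

The first total sits close to the "approximately 5,600 unique words" cited above, and the second is in line with the statement that the numbers "approach 9,000 words" once simple endings are added.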



1 Strategic instruction of the extended vocabulary takes different forms with the words prominent in narratives and those in informational texts. These different treatments for these words have been described elsewhere (Hiebert & Cervetti, 2011; Hiebert, 2011).

2 The words of the first two zones include many function words that have small morphological families and are best learned through extensive reading.

3 This version of the WordZones represents a modification of the original presentation in Hiebert (2005). The first WordZone is now 1, rather than 0, which affects the numbering of all subsequent zones.