Identifying Principles for the Creation of Texts in a Variety of Languages for Beginning Readers

    by Freddy Hiebert | May 25, 2011

    Elfrieda (Freddy) Hiebert

    TextProject & University of California, Santa Cruz

    At the outset, I want to make it clear that my expertise lies in the texts that facilitate the reading development of a particular group of students in American schools: the students who depend on schools to become literate. In the U.S., approximately a third of an age cohort can be described as “depending on schools to become literate.” The remaining students may learn to read in school, but they have at least a modicum of literacy experiences/criterion knowledge when they arrive at school (according to the Early Childhood Longitudinal Study). Unfortunately, our national assessment indicates that we have not been particularly successful in bringing the third of our population that depends on schools to become literate to the levels needed in the global-digital economy. The answers, of course, are not simple, but I would argue that the texts we have been providing to our most vulnerable students have not been as supportive as they can and should be.

    Differences in the orthographies of languages are critical and, while I myself came to English as a second-language learner, my work has been with children learning to read in English (including but not limited to children who speak English as a second language). Even so, I believe that some principles can be generalized from our work to work with children in other cultures whose languages differ substantially from English.

    1. With young children who have not been immersed in print, meaningfulness is critical. There is a small but reputable literature on the role of concreteness in the word learning of children and adults. In my model of TExT (Text Elements by Task), the design of beginning texts involves simultaneous attention to decodability, frequency, and concreteness. The lists of concrete words will differ by culture, but in the TExT model, weight is given to words that have common orthographic patterns AND are concrete. In an American venue, for example, words such as mom, dad, grandma, and grandpa would be evaluated as appropriate (at least if repeated), even though only one of these words, dad, has consistent and regular grapheme-phoneme correspondences. I have identified a list of 1,000 concrete words for children in the U.S. that is part of the TExT analysis of what makes a set of texts appropriate. This list would NOT be appropriate in particular parts of Africa or India. The construct, however, is applicable. Children new to literacy, as Sylvia Ashton-Warner argued after working with Maori children in New Zealand 70 years ago, need to know from the get-go that written language is about meaning. Within the TExT model, concreteness is an early scaffold. It is not the picture-text match of the Reading Recovery/Guided Reading perspective. Concrete words need to be used frequently, and a core group of them should have patterns that support decoding. The weight of this factor in the model is gradually released.
    2. It is also critical to give weight to words that are highly frequent in a language. Zipf’s (1935) law appears to apply to many languages beyond the European languages that he originally studied. Tian (2006) has stated that experiments prove that the word frequency distribution in Chinese complies with Zipf’s law. I have yet to locate data on African languages, but it appears that a small group of words typically accounts for a large share of the words in the texts of a language. Because the orthographies of some of these languages are newer than that of English, I hope that some African languages will not have the idiosyncratic high-frequency words that English does. I don’t know this, but I suspect that it may be the case. Words that are highly frequent, highly decodable, and highly concrete are the ideal. But there are function words that won’t be highly concrete (although they are highly frequent and, hopefully, in languages other than English, highly decodable). Weight needs to be given to words that are part of highly populated “word” neighborhoods.
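
    One way to check the frequency pattern described in point 2 for a given language is to measure how much of a text sample is covered by its few most frequent words. What follows is only a minimal sketch in Python, not part of the TExT analysis: the function name token_coverage, the sample sentence, and the simple letter-based tokenizer are all assumptions made for illustration. The tokenizer suits space-delimited alphabetic scripts only; languages written without spaces between words (such as Chinese) would need a proper word segmenter.

```python
import re
from collections import Counter


def token_coverage(text, top_n):
    """Share of running words accounted for by the top_n most frequent word types.

    Hypothetical helper for illustration; the tokenizer below simply pulls out
    runs of letters, which is an assumption that will not suit every script.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)


# Made-up sample sentence (13 running words, 8 distinct word types).
sample = "the cat sat on the mat and the dog sat on the rug"

# The three most frequent types (the, sat, on) cover 8 of the 13 running words.
print(round(token_coverage(sample, 3), 2))  # 0.62
```

    Run on a large sample of children's text in the target language, a high coverage value for a small top_n would be consistent with the pattern described above; a toy sentence like this one only shows the mechanics.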

    While other orthographies may not have the strange history that English does (and thus not its erratic orthography), I want to caution against too much nonsensical text for children who are new to text and are the children of poverty. The U.S. has a genre I call “extreme decodables.” These texts contain many archaic Anglo-Saxon words that are rarely used in conversation or even in text (e.g., vex, wrench, tack). Beginning readers need substantial and consistent data about the code. At the same time, we need to keep in mind both purpose (meaning) and function (frequency). Children of poverty are likely to treat school tasks seriously and without the humor that otherwise characterizes their lives. School is a “serious” and literal place. Texts that are silly may not be an appropriate point of departure. They have NOT proven to be so with American children who enter schools with languages and cultures other than those of the mainstream.


    Tian, X. (2006). Statistic analysis of the papers published in Chinese journals of computers. Journal of Library and Information Sciences in Agriculture. doi: 1002-1248.0.2006.003-050

    Zipf, G. K. (1935). The Psychobiology of Language. Boston, MA: Houghton-Mifflin.