This commentary represents the opinions of the author and does not necessarily reflect the views of the National Center for Education Statistics.
What Happens in Classrooms? Instructional Practices in Elementary and Secondary Schools: 1994-95 is a timely response to policymakers' increasing interest in improving education by reforming teaching practices or strategies (Blank and Pechman 1995). Measuring teaching practices using survey data, however, is still in its "infancy" (Brewer and Stasz 1996). To date, very few studies have used teacher surveys to describe the instructional strategies used throughout the country, and of these, none provides information as detailed as What Happens in Classrooms? This is because, historically, education reforms have tinkered at the edges of the educational process (Marshall, Fuhrman, and O'Day 1994, 12). Even the extensive reform efforts of the 1970s and 1980s remained aloof from teaching practices. During those decades, policymakers tried to improve schooling by adjusting resource allocations (e.g., striving for racial balance and financial equity) and by setting outcome goals (e.g., setting minimum course requirements and implementing minimum competency tests). Arguably, the perceived inadequacies of these policies have led to the country's current enthusiasm for educational standards aimed at influencing teaching practices.

To monitor the impact of these unprecedented reform efforts, the country needs accurate and nationally representative teaching practice data. The push for the routine collection of nationally representative data of this type only began in the late 1980s (e.g., Murnane and Raizen 1988; Office of Educational Research and Improvement 1988; Porter 1991; Shavelson et al. 1987). But a perceived inability of surveys to measure instructional practices, combined with policymakers' and researchers' historical emphasis on input-output studies, helps explain why much of what the country currently knows about the instructional process comes from in-depth studies in a handful of classrooms. A major limitation of in-depth studies is that their generalizability to other classrooms is unknown. Unfortunately, as reform initiatives increasingly focus on instructional processes, demand for accurate instructional practice data will remain high and the generalizability limitations of in-depth studies will become increasingly problematic. In turn, surveys will grow in appeal since they are a cost-effective way to include large numbers of classrooms in studies.

Alternative study models that straddle these two approaches for gathering teacher practice data are being tried. The Third International Mathematics and Science Study (TIMSS) supplemented teacher surveys with a "video survey" of 231 eighth-grade math classrooms in three countries. The video survey, like classroom observations, promises objectivity and specificity and has the added advantage of being available for wider and more systematic scrutiny. The TIMSS approach does not, however, surmount the primary hurdle associated with conducting classroom observations, namely, cost. Regularly conducting video surveys in a nationally representative sample of classrooms across different grade levels and subject areas would undoubtedly be cost prohibitive. Consequently, teacher self-reports of the sort collected in large national surveys such as the 1994-95 Teacher Follow-up Survey (TFS:94-95), the data source for the findings reported in What Happens in Classrooms?, remain the most viable means for obtaining information about the status of teaching practices in the United States.
The TFS:94-95 findings reported in What Happens in Classrooms? are unique because they provide national estimates1 of the proportion of teachers from all grade levels and major subject areas (English, mathematics, history, and science) who use various teaching strategies. Using data that are slightly dated but are unfortunately the most recent available, the report examines the degree to which teaching practices vary by grade level and subject area; how instructional approaches vary with the characteristics of teachers, students, and schools; and the degree to which teachers use the reform instructional approaches advocated by the National Board for Professional Teaching Standards and several voluntary national curriculum standards.

The report presents some surprising findings. For example, one would expect that because older students have more knowledge and skills, their teachers would tend to put more emphasis on higher order thinking skills than the teachers of younger students. But What Happens in Classrooms? finds that, in several instances, the opposite is the case. Also surprisingly, while several other studies (Mayer 1998; Metz 1978; Oakes 1985; Raudenbush, Rowan, and Cheong 1993) have found that teachers of high-achieving students are more likely to use reform teaching practices (those emphasizing application, reasoning, and conceptual understanding) than traditional practices (those emphasizing memorization of facts and the mastery of routine skills), What Happens in Classrooms? finds that, in many instances, the opposite is true.

While the country needs information of the sort gathered by the TFS:94-95 and presented in What Happens in Classrooms?, many educators and researchers are skeptical about the ability of surveys to truly capture what goes on in classrooms. And given that national data collection efforts that use teacher surveys to describe teaching practices are in their infancy, researchers and policymakers want to know how much faith they can have in this type of data.
Studies that have investigated the reliability and validity of using surveys to gather information on teaching practices have produced both encouraging and discouraging findings. The reliability of a survey describes whether its use in repeated trials will yield the same results. Low reliability could be the result of teachers finding the questions difficult to interpret or inaccurately recalling what they do in their classrooms. But knowing that an instrument is reliable does not justify the assumption that it is valid. Validity describes the extent to which an instrument accurately measures the phenomena of interest. One of the chief concerns about teaching practice survey data is that they may not provide an accurate depiction of what goes on in classrooms, for several possible reasons: (1) the teaching process consists of complex interactions between students and teachers that a survey cannot accurately depict, (2) teachers provide biased responses to a survey because they feel that they should (for a variety of reasons) respond to the questions in an "acceptable" or "socially desirable" way, and (3) teachers unknowingly provide misleading responses to the survey questions. Research suggests that teachers sometimes truly believe they are embracing pedagogical reforms, but in practice their teaching comes nowhere near the vision of the reformers (Cohen 1990).

To date, efforts to evaluate the reliability of the TFS items on teaching practices have raised questions but not resolved them. The National Center for Education Statistics (NCES) contracted with the U.S. Census Bureau to examine the reliability of selected TFS:94-95 survey questions. Twenty-two of the teaching practice questions used in the analyses in What Happens in Classrooms? were included in this study, and the reliability of all 22 was found to be "problematic" (Henke, Chen, and Goldman 1999). Though the analyses used in the report try to account for this, the authors note that the findings should be interpreted with caution. Using a much smaller sample, but similar survey questions, I conducted an exploratory study that also found the items to be unreliable (Mayer 1999). On the other hand, Smithson and Porter (1994) and Burstein et al. (1995) conducted studies that led them to conclude that these types of instructional practice questions can be quite reliable.2

In my study, I did find that when variables representing similar pedagogical philosophies were grouped together to give a portrait of the preferred pedagogical style of teachers, the reliability of that composite variable was quite high3 (and was, in this case, significantly related to middle school algebra learning) (Mayer 1998). Combining items makes sense because a single item cannot "provide a coherent picture of instruction" (Burstein et al. 1995, 36). Other composites, such as academic aptitude test scores and the Consumer Price Index (CPI), provide a good analogy. Aptitude tests always consist of multiple questions that measure an underlying characteristic, such as mathematics ability. Likewise, the CPI, which tracks inflation, is created by monitoring the cost of a "basket" of goods that consumers might purchase in a given month. Tracking the cost of only one product, such as canned soup, would not provide an accurate or informative picture of inflation. And answering one algebra question would not provide an accurate measure of mathematics aptitude.
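The payoff from combining items can be made concrete with a small simulation. The sketch below is purely illustrative: it uses invented data rather than TFS:94-95 responses and assumes a simple model in which each hypothetical item reflects the same underlying construct plus random noise. It compares the approximate reliability of a single item with the reliability of the summed composite, estimated by Cronbach's alpha.

```python
import numpy as np

# Illustrative sketch (invented data, not NCES data): simulate responses from
# 500 teachers to 6 hypothetical survey items that all tap the same underlying
# construct (e.g., a preference for reform-oriented instruction).
rng = np.random.default_rng(0)
n_teachers, n_items = 500, 6

true_score = rng.normal(size=(n_teachers, 1))               # latent pedagogical style
noise = rng.normal(scale=1.5, size=(n_teachers, n_items))   # item-level measurement error
items = true_score + noise                                  # observed item responses

# Under a parallel-items model, the average inter-item correlation approximates
# the reliability of any single item.
corr = np.corrcoef(items, rowvar=False)
mean_r = corr[np.triu_indices(n_items, k=1)].mean()

# Cronbach's alpha estimates the reliability of the summed composite.
item_var = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = (n_items / (n_items - 1)) * (1 - item_var / total_var)

print(f"approximate single-item reliability: {mean_r:.2f}")
print(f"composite reliability (Cronbach's alpha): {alpha:.2f}")
```

Even though each simulated item is individually noisy, the six-item composite is substantially more reliable than any single item, which is the general pattern the composite-based analyses above exploit.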
In What Happens in Classrooms?, interesting summary variables were created, but unfortunately they were discussed only briefly, and the relationships between these variables and other variables were not presented in the report.

The validity of teaching practice items has also been investigated, with similarly mixed conclusions. Burstein et al. (1995, 45) compare classroom artifacts (i.e., textbooks and assignments) with teacher survey responses concerning the characteristics of their exams and homework assignments. They conclude: "To the extent that we were able to validate the survey data on teachers' instructional strategies, we found that those data report accurately the instructional strategies used most often by teachers." In another study (Mayer 1999), a survey-based composite representing the amount of time spent using reform mathematics teaching practices correlated highly (r = .85) with a parallel composite based on classroom observations. Despite this encouraging finding, the same classroom observations also revealed that the survey did not adequately capture the quality of the teachers' use of various practices.
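The validation strategy described above is computationally simple: pair each teacher's survey-based composite with the parallel observation-based composite and correlate the two. The sketch below uses invented values, not the Mayer (1999) data, purely to show the calculation.

```python
import numpy as np

# Illustrative sketch (made-up values): each position pairs one teacher's
# survey-based estimate of time on reform practices with the estimate built
# from observing that teacher's classroom.
survey_composite = np.array([120, 45, 200, 80, 150, 30, 95, 170])     # minutes per 2 weeks (survey)
observed_composite = np.array([110, 60, 185, 70, 160, 40, 105, 150])  # minutes per 2 weeks (observed)

r = np.corrcoef(survey_composite, observed_composite)[0, 1]
print(f"survey vs. observation correlation: r = {r:.2f}")
```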
Studies that have investigated the reliability and validity of using surveys to gather information on teaching practices also suggest important ways in which this effort can be improved. The teaching practice items on the upcoming 1999-2000 Schools and Staffing Survey (SASS:1999-2000) will reflect some of these strategies. For example, on the TFS:94-95, teachers were asked to describe their teaching over the past semester, but research by Mullens and Gayler (1999) suggests that teachers cannot accurately recall an entire semester. Therefore, SASS:1999-2000 will ask teachers to refer to their last 2 weeks of typical instruction when describing their teaching practices.

On the TFS:94-95, teachers were also asked to report whether they used teaching practices "almost every day," "once or twice a week," "once or twice a month," "once or twice a semester," or "never." These response options are limited in at least two ways. First, Burstein et al. (1995) found that because "almost every day" and "once or twice a week" are such similar response options, teachers could not reliably distinguish between them, thereby reducing the items' reliability. Second, because these response options ask teachers only how often they use particular teaching approaches and not how much time they spend on each approach, the results can be uninformative and misleading. For example, What Happens in Classrooms? reports that at least 85 percent of teachers stated they used numerous practices at least once a week (e.g., working in small groups, providing whole group instruction, and having students answer open-ended questions), but this result inappropriately lumps together teachers who use a given approach for only a few minutes a week with those who use it for several hours (the sketch following this paragraph illustrates the distinction with hypothetical numbers). To remedy these problems, SASS:1999-2000 will ask teachers to estimate both how often and for how many minutes they use each instructional technique over a 2-week period. In addition to the improvements that will likely result from the new SASS items, NCES is sponsoring a 4-year research and development effort through the Education Statistics Services Institute (ESSI) aimed explicitly at creating more accurate teaching practice indicators.
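The sketch below uses made-up numbers, not TFS or SASS data: two teachers who would both check "once or twice a week" on a frequency-only item can devote very different amounts of time to a practice, and only a report of sessions and minutes, of the kind SASS:1999-2000 will collect, distinguishes them.

```python
# Illustrative sketch with invented numbers (not TFS:94-95 or SASS:1999-2000 data).
# Both teachers would fall in the same frequency category, yet the time they
# actually devote to small-group work differs by a factor of nine.
teachers = {
    "Teacher A": {"sessions_per_2_weeks": 2, "minutes_per_session": 10},
    "Teacher B": {"sessions_per_2_weeks": 4, "minutes_per_session": 45},
}

for name, report in teachers.items():
    total_minutes = report["sessions_per_2_weeks"] * report["minutes_per_session"]
    print(f"{name}: {report['sessions_per_2_weeks']} sessions and "
          f"{total_minutes} minutes of small-group work over 2 weeks")
```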
The TFS:94-95 findings reported in What Happens in Classrooms? provide important information about the instructional practices being used throughout the country, but they also offer an opportunity to further our understanding of how to use surveys to measure instructional practice. Carefully used, surveys are the most cost-efficient means of gathering such data. To move instructional practice surveys into the next stage of development, NCES has been refining the teaching practice measures used on its surveys. The fruits of this labor should help policymakers and reformers as they attempt to assess the degree to which new policies aimed at influencing teaching practices are taking hold and having their desired effect.
2 I do not think their findings are as encouraging as they suggest. For a discussion of why, see Mayer (1999).
3 This is not unexpected: when multiple items measuring the same underlying characteristic (e.g., a reform instructional approach) are grouped together, the reliability of the resulting construct will always be greater than the reliability of the individual items (Carmines and Zeller 1979).
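As a sketch of the statistical logic behind this footnote, under the classical test theory assumption of parallel items (my simplification, not something stated in the report), the Spearman-Brown formula gives the reliability of a composite of $k$ items that each have reliability $\rho$ as

$$\rho_{\text{composite}} = \frac{k\rho}{1 + (k - 1)\rho},$$

which exceeds $\rho$ whenever $k > 1$ and $0 < \rho < 1$. For example, six items each with reliability .40 would yield a composite reliability of about .80.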
Brewer, D., and Stasz, C. (1996). Enhancing Opportunity to Learn Measures in NCES Data. In From Data to Information: New Directions for the National Center for Education Statistics (NCES 96-901) (pp. 3-1-3-28). U.S. Department of Education. Washington, DC: U.S. Government Printing Office.
Burstein, L., McDonnell, L., Van Winkle, J., Ormseth, T., Mirocha, J., and Guitton, G. (1995). Validating National Curriculum Indicators. Santa Monica, CA: The RAND Corporation.
Carmines, E.G., and Zeller, R.A. (1979). Reliability and Validity Assessment. Newbury Park, CA: Sage Publications.
Cohen, D. (1990). A Revolution in One Classroom: The Case of Mrs. Oublier. Educational Evaluation and Policy Analysis 14: 327-345.
Henke, R.R., Chen, X., and Goldman, G. (1999). What Happens in Classrooms? Instructional Practices in Elementary and Secondary Schools: 1994-95 (NCES 1999-348). U.S. Department of Education. Washington, DC: U.S. Government Printing Office.
Marshall, S., Fuhrman, S., and O'Day, J. (1994). National Curriculum Standards: Are They Desirable and Feasible? In R. Elmore and S. Fuhrman (Eds.), The Governance of Curriculum: 1994 Yearbook of the Association for Supervision and Curriculum Development. Alexandria, VA: Association for Supervision and Curriculum Development.
Mayer, D. (1998). Do New Teaching Standards Undermine Performance on Old Tests? Educational Evaluation and Policy Analysis 20(2): 53-73.
Mayer, D. (1999). Measuring Instructional Practice: Can Policy Makers Trust Survey Data? Educational Evaluation and Policy Analysis 21(1): 29-45.
Metz, M. (1978). Classrooms and Corridors: The Crisis of Authority in Desegregated Secondary Schools. Berkeley, CA: University of California Press.
Mullens, J., and Gayler, K. (1999). Measuring Classroom Instructional Processes: Using Survey and Case Study Field Test Results to Improve Item Construction (NCES 1999-08). U.S. Department of Education. Washington, DC: NCES Working Paper.
Murnane, R., and Raizen, S. (1988). Improving Indicators of the Quality of Science and Mathematics Education in Grades K-12. Washington, DC: National Academy Press.
Oakes, J. (1985). Keeping Track: How Schools Structure Inequality. New Haven, CT: Yale University Press.
Office of Educational Research and Improvement. (1988). Creating Responsible and Responsive Accountability Systems. U.S. Department of Education. Washington, DC: U.S. Government Printing Office.
Porter, A. (1991). Creating a System of School Process Indicators. Educational Evaluation and Policy Analysis 13(1): 13-29.
Raudenbush, S., Rowan, B., and Cheong, Y. (1993). Higher Order Instructional Goals in Secondary Schools: Class, Teacher, and School Influences. American Educational Research Journal 30: 523-553.
Shavelson, R., McDonnell, L., Oakes, J., Carey, N., and Picus, L. (1987). Indicator Systems for Monitoring Mathematics and Science Education. Santa Monica, CA: The RAND Corporation.
Smithson, J., and Porter, A. (1994). Measuring Classroom Practice: Lessons Learned From the Efforts to Describe the Enacted Curriculum: The Reform Up-Close Study. Madison, WI: Consortium for Policy Research in Education.