Introduction

 

Measurement for the Social Sciences: Classical Insights into Modern Approaches

 

Over the last century, educational and psychological measurement has grown into an international discipline with applications in educational achievement and aptitude testing, personnel evaluation, and psychological testing. Since the mid-1960s, "classical" test theory and its accompanying methodologies have been overtaken in practice by "modern" approaches such as item response theory. The modern approaches take advantage of computing resources that have become increasingly available over the past few decades.

The seven articles in this theme issue examine the success of both classical and modern approaches in solving measurement problems. They were selected from papers presented at a conference, Measurement for the Social Sciences: Classical Insights into Modern Approaches, held in December 2002 in Toronto.

The first article proposes a theory of measurement. Roderick McDonald argues that measurement of educational or psychological constructs should be thought of in relation to a domain of behaviors or "items." He explores how psychometric indices can be constructed using this framework, as well as the practical implications for test construction.

The five articles that follow examine specific problems in measurement. For example, how should we find items that are differentially difficult for groups of students? Randall Penfield investigates this problem, comparing the Breslow-Day test of trend in odds ratio heterogeneity and the Mantel-Haenszel chi-square approaches. He concludes that the most accurate identification of items that exhibit differential item functioning (DIF) results from using decision rules based on both approaches.

Shizuhiko Nishisato asks whether we are extracting as much information as possible from multivariate data and proposes a different way of thinking of the information available in data based on the Dual Scaling approach. The argument he presents has implications for analyses of data from educational and psychological tests, as well as data from other sources.

Multilevel modeling is increasingly popular for analyzing data from educational settings. Richard Wolfe and Jennifer Dunn suggest that the estimates produced by multilevel modeling could be improved by applying the jackknife technique. As they illustrate in the second of two studies, this approach may also be useful in analyzing test items.

Test items may become easier or more difficult over time, particularly as school curricula and teaching practices change. What implications does this have for analyses of test data? André Rupp and Bruno Zumbo examine the effects of item-parameter drift when various item response theory models are applied to examinees' responses.

Student motivation can also affect models of test data. Comparing models of low examinee motivation and its effect on estimates of item difficulty, discrimination, and a pseudo-guessing parameter, Christina van Barneveld considers the possible effect of biased estimates, particularly on item selection in computerized adaptive testing.

The final article addresses test development. Todd Rogers, Mark Gierl, Claudette Tardif, Jie Lin, and Christina Rinaldi tackle the problem of developing equivalent tests in two languages, in this case English and French.

The conference for which these papers were written was held in honor of Ross Traub on his retirement from the Measurement and Evaluation faculty at the Ontario Institute for Studies in Education. Ross has contributed to and critiqued the changes in measurement methodology over almost 40 years, stimulating the thinking of many colleagues and students along the way. As Ronald Hambleton, one of Ross's first students and now a distinguished professor at the University of Massachusetts at Amherst, comments, "Professor Ross Traub was there at the Educational Testing Service in Princeton when the first thoughts about transitioning from classical to modern measurement were taking shape in the early 1960s, and he has contributed to and monitored the transition throughout his career." Bruno Zumbo captures the spirit with which this issue was prepared when he writes of his contribution with André Rupp, "As is fitting for this theme issue in honor of Ross Traub's contributions to measurement theory, this work was inspired by, and is a tribute to, the scholarly tradition fostered by him and others who have both shone a spotlight on the selection of models that are robust and faithful to the construct under study and who have critically examined the practice of and potential biases in model selection." This issue is a small sample of the work Ross's teaching and writings have inspired.

Ruth A. Childs




Copyright © AJER, the Faculty of Education, and the University of Alberta, 2003.
Last revised: November 17, 2003.
