Over the last century, educational and psychological measurement has grown into an international discipline with applications in educational achievement and aptitude testing, personnel evaluation, and psychological testing. Since the mid-1960s, "classical" test theory and the accompanying methodologies have been overtaken in practice by "modern" approaches such as item response theory. The modern approaches take advantage of computing resources that have become increasingly available over the past few decades.
The seven articles in this theme issue examine the success of both classical and modern approaches in solving measurement problems. They were selected from papers presented at a conference, Measurement for the Social Sciences: Classical Insights into Modern Approaches, held in December 2002 in Toronto.
The first article proposes a theory of measurement. Roderick McDonald argues that measurement of educational or psychological constructs should be thought of in relation to a domain of behaviors or "items." He explores both how psychometric indices can be constructed using this framework and the practical implications for test construction.
The five articles that follow examine specific problems in measurement. For example, how should we find items that are differentially difficult for groups of students? Randall Penfield investigates this problem, comparing the Breslow-Day test of trend in odds ratio heterogeneity and the Mantel-Haenszel chi-square approaches. He concludes that the most accurate identification of items that exhibit differential item functioning (DIF) results from using decision rules based on both approaches.
Shizuhiko Nishisato asks whether we are extracting as much information as possible from multivariate data and proposes a different way of thinking of the information available in data based on the Dual Scaling approach. The argument he presents has implications for analyses of data from educational and psychological tests, as well as data from other sources.
Multilevel modeling is increasingly popular for analyzing data from educational settings. Richard Wolfe and Jennifer Dunn suggest that the estimates produced by multilevel modeling could be improved by applying the jackknife technique. As they illustrate in the second of two studies, this approach may also be useful in analyzing test items.
Test items may become easier or more difficult over time, particularly as school curricula and teaching practices change. What implications does this have for analyses of test data? André Rupp and Bruno Zumbo examine the effects of item-parameter drift when various item response theory models are applied to examinees' responses.
Student motivation can also affect models of test data. Comparing models of low examinee motivation and its effect on estimates of item difficulty, discrimination, and a pseudo-guessing parameter, Christina van Barneveld considers the possible effect of biased estimates, particularly on item selection in computerized adaptive testing.
The final article addresses test development. Todd Rogers, Mark Gierl, Claudette Tardif, Jie Lin, and Christina Rinaldi tackle the problem of developing equivalent tests in two languages, in this case English and French.
The conference for which these papers were written was held in honor of Ross Traub on his retirement from the Measurement and Evaluation faculty at the Ontario Institute for Studies in Education. Ross has contributed to and critiqued the changes in measurement methodology over almost 40 years, stimulating the thinking of many colleagues and students along the way. As Ronald Hambleton, one of Ross's first students and now a distinguished professor at the University of Massachusetts at Amherst, comments, "Professor Ross Traub was there at the Educational Testing Service in Princeton when the first thoughts about transitioning from classical to modern measurement were taking shape in the early 1960s, and he has contributed to and monitored the transition throughout his career." Bruno Zumbo captures the spirit with which this issue was prepared when he writes of his contribution with André Rupp, "As is fitting for this theme issue in honor of Ross Traub's contributions to measurement theory, this work was inspired by, and is a tribute to, the scholarly tradition fostered by him and others who have both shone a spotlight on the selection of models that are robust and faithful to the construct under study and who have critically examined the practice of and potential biases in model selection." This issue is a small sample of the work Ross's teaching and writings have inspired.
Ruth A. Childs
Copyright © AJER, the Faculty of Education, and the University of Alberta, 2003.
Last revised: November 17, 2003.