A. Gelman, D. Rubin (1992)
Inference from Iterative Simulation Using Multiple Sequences. Statistical Science, 7
F. Lord (1952)
A theory of test scores.
H. Wainer (2000)
Computerized Adaptive Testing: A Primer
A. Gelfand, Adrian Smith (1990)
Sampling-Based Approaches to Calculating Marginal Densities. Journal of the American Statistical Association, 85
Christine DeMars (2010)
Item Response Theory. Assessing Measurement Invariance for Applied Research
H. Wainer (1995)
Precision and Differential Item Functioning on a Testlet-Based Test: The 1991 Law School Admissions Test as an Example. Applied Measurement in Education, 8
Michael Levine, F. Drasgow (1988)
Optimal appropriateness measurement. Psychometrika, 53
M. Tanner, W. Wong (1987)
The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82
H. Wainer, Eric Bradlow, Z. Du (2000)
Testlet Response Theory: An Analog for the 3PL Model Useful in Testlet-Based Adaptive Testing
Eric Bradlow, A. Zaslavsky (1999)
A hierarchical latent variable model for ordinal data from a customer satisfaction survey with no answer responses. Journal of the American Statistical Association, 94
S. Reise (2001)
Computerized Adaptive Testing: A Primer (Second Edition). Howard Wainer (Ed.) (with Neil Dorans, Donald Eignor, Ronald Flaugher, Bert Green, Robert Mislevy, Lynne Steinberg, and David Thissen) [book review]. Applied Psychological Measurement, 25
A. Woods, R. Baker (1985)
Item response theory. Language Testing, 2
(1991)
MULTILOG user’s guide (Version 6)
Adrian Smith, G. Roberts (1993)
Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion)
W. van der Linden, C. Glas (2000)
Computerized Adaptive Testing: Theory and Practice
H. Wainer, G. Kiely (1987)
Item Clusters and Computerized Adaptive Testing: A Case for Testlets. Journal of Educational Measurement, 24
Jinming Zhang, W. Stout (1999)
The theoretical DETECT index of dimensionality and its application to approximate simple structure. Psychometrika, 64
F. Lord, M. Novick, A. Birnbaum (1966)
Some latent trait models and their use in inferring an examinee's ability
B. Winer (1972)
Statistics: A Guide to the Unknown. Psyccritiques, 17
Eric Bradlow, H. Wainer, Xiaohui Wang (1999)
A Bayesian random effects model for testlets. Psychometrika, 64
F. Lord (1980)
Applications of Item Response Theory To Practical Testing Problems
W. Hastings (1970)
Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika, 57
D. Thissen, L. Steinberg, Joan Mooney (1989)
Trace Lines for Testlets: A Use of Multiple-Categorical-Response Models. Journal of Educational Measurement, 26
J. Albert, S. Chib (1993)
Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88
W. Stout (1987)
A nonparametric approach for assessing latent trait unidimensionality. Psychometrika, 52
H. Wainer (2000)
Computerized adaptive testing: A primer, 2nd ed.
(1995)
TSE score user’s manual
M. Tatsuoka, F. Lord, M. Novick, A. Birnbaum (1971)
Statistical Theories of Mental Test Scores. Journal of the American Statistical Association, 66
F. Samejima (1968)
Estimation of latent ability using a response pattern of graded scores. Psychometrika, 34
(1988)
Appropriateness measurement. Psychometrika
H. Braun, H. Wainer (1982)
Making Essay Test Scores Fairer with Statistics. ETS Research Report Series, 1982
F. Drasgow, Michael Levine, E. Williams (1984)
Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38
R. Bock (1972)
Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37
H. Hartley, J. Rao (1967)
Maximum-likelihood estimation for the mixed analysis of variance model. Biometrika, 54(1)
Eric Bradlow, H. Wainer (1998)
Some Statistical and Logical Considerations When Rescoring Tests
The need for more realistic and richer forms of assessment in educational tests has led to the inclusion (in many tests) of polytomously scored items, multiple items based on a single stimulus (a “testlet”), and the increased use of a generalized mixture of binary and polytomous item formats. In this paper, the authors extend earlier work on the modeling of testlet-based response data to the situation in which a test is composed, partially or completely, of polytomously scored items and/or testlets. The model they propose, a modified version of commonly employed item response models, is embedded within a fully Bayesian framework, and inferences under the model are obtained using Markov chain Monte Carlo techniques. The authors demonstrate its use in a designed series of simulations and by analyzing operational data from the North Carolina Test of Computer Skills and the Educational Testing Service’s Test of Spoken English. Their empirical findings suggest that the North Carolina Test of Computer Skills exhibits significant testlet effects, indicating substantial dependence among item scores obtained from common stimuli, whereas the Test of Spoken English does not.
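As a rough illustration of the model class described above, the sketch below gives one binary-item special case, assuming the probit (normal-ogive) form used in the earlier testlet work of Bradlow, Wainer, and Wang (1999) cited in the references; the notation (item slope a_j, difficulty b_j, person-by-testlet effect gamma_{i d(j)}) is assumed here for illustration rather than taken from the paper itself.

% Minimal sketch, not the paper's own specification: a 2PL-type item
% response model with an added person-by-testlet random effect; items in
% the same testlet d(j) share gamma_{i d(j)}, which induces dependence
% among responses to items based on a common stimulus.
\[
P\bigl(y_{ij} = 1 \mid \theta_i\bigr)
  = \Phi\bigl(a_j(\theta_i - b_j - \gamma_{i\,d(j)})\bigr),
\qquad
\gamma_{i\,d(j)} \sim \mathcal{N}\bigl(0,\ \sigma^2_{d(j)}\bigr).
\]

Under this sketch, a testlet variance \(\sigma^2_{d(j)}\) near zero recovers the usual conditionally independent item response model, consistent with the finding reported above for the Test of Spoken English, while a large variance signals that items sharing a stimulus are not conditionally independent given \(\theta\), as reported for the North Carolina Test of Computer Skills. The paper's extension to polytomously scored items would replace the binary probit term with a graded-response formulation.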
Applied Psychological Measurement – SAGE
Published: Mar 1, 2002