Bilgisayar Ortamında Bireye Uyarlanmış Test Uygulamalarında Ölçme Kesinliğinin ve Test Uzunluğunun Farklı Koşullar Altında İncelenmesi / Investigation of Measurement Precision and Test Length in Computerized Adaptive Testing Under Different Conditions

Ebru Balta; Arzu Uçar

doi:10.19160/e-ijer.1023098

Abstract

Computerized adaptive tests (CAT) are attracting more attention than ever from institutions, especially those that draw students worldwide, because a CAT does not present the same items to every examinee. This study investigated measurement precision and test length in CAT under different conditions. The research was carried out as a Monte Carlo simulation study. In line with the purpose of the study, 500 items whose response probabilities were modeled with the three-parameter logistic (3PL) model were generated. Two types of termination rule were used: fixed length (15 or 20 items) and standard error (SE < .30, SE < .50). In comparing the termination rules, two starting rules (θ = 0, -1 < θ < 1), three ability estimation methods (Maximum Likelihood Estimation (MLE), Expected a Posteriori (EAP), and Maximum a Posteriori (MAP)), and two item selection methods (Kullback-Leibler Information (KLI) and Maximum Fisher Information (MFI)) were also varied, since these are critical components of CAT algorithms. Twenty-five replications were performed for each condition on the generated data. The results were evaluated using root mean square error (RMSE), bias, and fidelity values. R software was used for data generation and analysis. The test starting rule (θ = 0 or -1 < θ < 1) did not produce a significant difference in measurement precision or test length. The termination rule with the lowest RMSE and bias values was the SE < .30 rule. The EAP ability estimation method yielded lower RMSE and bias values than MLE. The KLI item selection method yielded lower RMSE and bias values than MFI.
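The simulation loop summarized in the abstract can be sketched in a few lines. The study itself used R; the Python below is only an illustrative sketch, not the authors' code. It covers one cell of the design (θ = 0 start, MFI item selection, EAP estimation, SE < .30 stopping). The item-parameter distributions, the number of simulated examinees, the 40-item cap, the quadrature grid, and the omission of the 1.7 scaling constant are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (scaling constant D omitted)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b, c):
    """Fisher information of a single 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

# Quadrature points from -4 to 4 used for EAP (an assumed grid).
GRID = [-4.0 + 0.1 * k for k in range(81)]

def eap(responses, items):
    """EAP estimate and posterior SD under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for u, (a, b, c) in zip(responses, items):
            p = p_3pl(t, a, b, c)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(true_theta, pool, rng, se_stop=0.30, max_items=40, start=0.0):
    """Administer items by MFI until SE < se_stop or max_items is reached."""
    theta_hat, se = start, float("inf")
    used, items, responses = set(), [], []
    while len(items) < max_items and se >= se_stop:
        # Maximum Fisher Information: most informative unused item.
        best = max((i for i in range(len(pool)) if i not in used),
                   key=lambda i: fisher_info(theta_hat, *pool[i]))
        used.add(best)
        items.append(pool[best])
        # Simulate the response from the examinee's true ability.
        correct = rng.random() < p_3pl(true_theta, *pool[best])
        responses.append(1 if correct else 0)
        theta_hat, se = eap(responses, items)
    return theta_hat, len(items)

def rmse_and_bias(estimates, truths):
    """Root mean square error and mean signed error of the estimates."""
    errors = [e - t for e, t in zip(estimates, truths)]
    n = len(errors)
    return math.sqrt(sum(d * d for d in errors) / n), sum(errors) / n

rng = random.Random(2022)
# Assumed generating distributions for the 500-item pool (a, b, c).
pool = [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0), rng.uniform(0.05, 0.25))
        for _ in range(500)]
truths = [rng.gauss(0.0, 1.0) for _ in range(20)]  # assumed examinee sample
results = [run_cat(t, pool, rng) for t in truths]
rmse, bias = rmse_and_bias([r[0] for r in results], truths)
mean_length = sum(r[1] for r in results) / len(results)
print(f"RMSE={rmse:.3f}  bias={bias:.3f}  mean test length={mean_length:.1f}")
```

Passing `se_stop=0.50` gives the looser standard-error rule, and setting `se_stop=0.0` with `max_items=15` or `max_items=20` approximates the fixed-length rules. KLI selection, MLE and MAP estimation, and the random start in (-1, 1) are not implemented in this sketch.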

Öz (Turkish Abstract)

This study aimed to examine how measurement precision and test length in computerized adaptive testing (CAT) vary across test termination rules, depending on the test starting rule, the item selection method, and the ability estimation method. The research was carried out as a Monte Carlo simulation study. In line with the purpose of the study, 500 items whose response probabilities were modeled with the three-parameter logistic (3PL) model were generated. A total of 48 (2x2x3x4) conditions, each with 25 replications, were examined: test starting rule (θ = 0, -1 < θ < 1), item selection method (Maximum Fisher Information (MFI), Kullback-Leibler Information (KLI)), ability estimation method (Maximum Likelihood Estimation (MLE), Expected a Posteriori (EAP), and Maximum a Posteriori (MAP)), and termination rule (fixed length (15, 20), standard error of the ability estimate (SE < .30, SE < .50)). To assess measurement precision, the error indicators RMSE, bias, and fidelity values were examined. R software was used for data generation and analysis. The results showed that the test starting rule did not produce differences in measurement precision or test length across conditions. The termination rule that yielded the lowest RMSE and bias values was the SE < .30 rule. The EAP ability estimation method yielded lower RMSE and bias values than MLE. The KLI item selection method had lower RMSE and bias values than MFI. A similar study could be carried out with different item pool sizes. Termination rules could also be compared after changing the characteristics of the item pool. Item exposure rates were not taken into account in this study; similar studies that account for item exposure could be conducted.
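The 48 (2x2x3x4) conditions with 25 replications each can be enumerated as a full factorial grid. The short Python sketch below only illustrates that count; the string labels are hypothetical shorthand for the factor levels named in the abstract, not identifiers from the authors' R code.

```python
from itertools import product

# Factor levels as listed in the abstract (labels are illustrative shorthand).
start_rules = ["theta=0", "-1<theta<1"]
item_selection = ["MFI", "KLI"]
estimators = ["MLE", "EAP", "MAP"]
stop_rules = ["fixed-15", "fixed-20", "SE<.30", "SE<.50"]
REPLICATIONS = 25

# Full crossing of the four factors: 2 x 2 x 3 x 4 cells.
conditions = list(product(start_rules, item_selection, estimators, stop_rules))

print(len(conditions))                 # 48 design cells
print(len(conditions) * REPLICATIONS)  # 1200 simulation runs in total
```

Each tuple in `conditions` identifies one design cell, so a simulation driver would loop over this list and repeat each cell `REPLICATIONS` times.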

Details

Primary Language

Turkish

Subjects

Studies on Education

Journal Section

Research Article

Authors

Ebru Balta*
0000-0002-2173-7189
Türkiye

Arzu Uçar
0000-0002-0099-1348
Türkiye

Publication Date

February 28, 2022

Submission Date

November 13, 2021

Acceptance Date

January 26, 2022

Published in Issue

Year 2022 Volume: 13 Number: 1

DOI

https://doi.org/10.19160/e-ijer.1023098

IZ

https://izlik.org/JA82LG36YA

Cite

APA

Balta, E., & Uçar, A. (2022). Bilgisayar Ortamında Bireye Uyarlanmış Test Uygulamalarında Ölçme Kesinliğinin ve Test Uzunluğunun Farklı Koşullar Altında İncelenmesi / Investigation of Measurement Precision and Test Length in Computerized Adaptive Testing Under Different Conditions. E-Uluslararası Eğitim Araştırmaları Dergisi, 13(1), 51-68. https://doi.org/10.19160/e-ijer.1023098

Cited By

The Effects of Different Item Selection Methods on Test Information and Test Efficiency in Computer Adaptive Testing

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi

https://doi.org/10.21031/epod.1140757

Detection of aberrant testing behaviour in unproctored CAT via a verification test

International Journal of Assessment Tools in Education

https://doi.org/10.21449/ijate.1598330