Scoring Methods for Multiple Choice Tests: How does the Item Difficulty Weighted Scoring Change Student’s Test Results?

Metin Yaşar; Seval Kartal; Eren Can Aybek

doi:10.14686/buefad.878504

Research Article

Year 2021, Volume: 10 Issue: 2, 309 - 324, 05.06.2021

Cited By: 1

Abstract

This study compared students' test scores and the item and test statistics calculated from unweighted (1-0) scores and item difficulty weighted (Qj-0) scores. It also proposes a way to convert the weighted scores to a 100-point scale. A teacher-made 34-item multiple-choice achievement test was administered to a group of 431 people. McDonald's omega internal consistency coefficients were .725 for the 1-0 method and .721 for the Qj-0 method. Between the student scores obtained with the two methods, Pearson's product-moment correlation coefficient was .916 and Spearman's rank-order correlation coefficient was .926. Furthermore, a criterion-referenced evaluation was made using two cutoff scores (50 and 60), and the numbers of students who were successful and unsuccessful in the course were determined under both scoring methods. More students would be considered unsuccessful under the Qj-0 scoring method; however, this method appeared to reveal differences among individuals better than unweighted scoring.
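
The abstract does not spell out how Qj is computed or how the weighted scores are rescaled, so the following Python sketch is only an illustration under stated assumptions. It assumes the common classical-test-theory convention that Qj = 1 - pj, where pj is the proportion of examinees answering item j correctly, and it rescales by dividing each weighted score by the maximum attainable weighted score. Neither assumption is confirmed by the text above, and all function names and the toy data are hypothetical.

```python
from typing import List


def item_difficulty_weights(responses: List[List[int]]) -> List[float]:
    """Return an assumed weight q_j = 1 - p_j for each item.

    p_j is the proportion of examinees who answered item j correctly.
    ASSUMPTION: this definition of Q_j is not taken from the abstract.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    weights = []
    for j in range(n_items):
        p_j = sum(row[j] for row in responses) / n_students
        weights.append(1.0 - p_j)
    return weights


def unweighted_scores_100(responses: List[List[int]]) -> List[float]:
    """1-0 scoring: percentage of items answered correctly."""
    n_items = len(responses[0])
    return [100.0 * sum(row) / n_items for row in responses]


def weighted_scores_100(responses: List[List[int]],
                        weights: List[float]) -> List[float]:
    """Q_j-0 scoring rescaled to 0-100.

    ASSUMPTION: the rescaling divides by the total weight (the maximum
    attainable weighted score); the authors' own proposal may differ.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("all items have zero weight; cannot rescale")
    return [100.0 * sum(w * x for w, x in zip(weights, row)) / total
            for row in responses]


def count_passing(scores: List[float], cutoff: float) -> int:
    """Number of scores at or above the cutoff."""
    return sum(1 for s in scores if s >= cutoff)


# Toy data (hypothetical): 4 examinees x 4 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

weights = item_difficulty_weights(responses)
plain = unweighted_scores_100(responses)
weighted = weighted_scores_100(responses, weights)

for cutoff in (50, 60):
    print(cutoff, count_passing(plain, cutoff), count_passing(weighted, cutoff))
```

In this toy data, fewer examinees reach either cutoff under the weighted scores, because correct answers to easy items earn little credit. Note that under the assumed weighting, an item that everyone answers correctly receives a weight of zero and does not contribute to the weighted score at all.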

Keywords

teacher-made test, multiple choice tests, scoring methods


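Turkish title: Çoktan Seçmeli Testlerde Puanlama Yöntemleri: Madde Güçlüğüne Dayalı Ağırlıklandırma Öğrencilerin Test Sonuçlarını Nasıl Değiştirir? (Scoring Methods for Multiple-Choice Tests: How Does Item Difficulty-Based Weighting Change Students' Test Results?)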

Abstract (translated from the Turkish)

In this study, a teacher-made 34-item multiple-choice achievement test was administered to a group of 431 people. The aim was then to compare item and test statistics, and students' pass or fail status in the course, based on unweighted (1-0) scores and scores weighted by item difficulty (Qj-0). A proposal for converting the weighted scores to a 100-point scale is also presented. The data analysis yielded McDonald's omega internal consistency coefficients of .725 and .721 for the 1-0 and Qj-0 methods, respectively. Between the student scores obtained with the two methods, the Pearson product-moment correlation coefficient was .916 and the Spearman rank-order correlation coefficient was .926. In addition, a criterion-referenced evaluation was carried out with cutoffs of 50 and 60 points, and the numbers of students deemed successful and unsuccessful in the course were determined for both methods. Accordingly, more students would be considered unsuccessful under Qj-0 scoring; nevertheless, this method was found to be better able to reveal differences among individuals.

Keywords

teacher-made test, multiple-choice tests, scoring methods

Details

Primary Language	English
Subjects	Other Fields of Education
Journal Section	Articles
Authors	Metin Yaşar 0000-0002-7854-1494 Seval Kartal 0000-0002-3018-6972 Eren Can Aybek 0000-0003-3040-2337
Publication Date	June 5, 2021
Published in Issue	Year 2021 Volume: 10 Issue: 2

Cite

APA	Yaşar, M., Kartal, S., & Aybek, E. C. (2021). Scoring Methods for Multiple Choice Tests: How does the Item Difficulty Weighted Scoring Change Student’s Test Results? Bartın University Journal of Faculty of Education, 10(2), 309-324. https://doi.org/10.14686/buefad.878504

Cited By

Effects of a teacher development program on teachers' knowledge and collaborative engagement, and students' achievement in computational thinking concepts

British Journal of Educational Technology

https://doi.org/10.1111/bjet.13256

All articles published in the journal are open access and distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.

Bartın University Journal of Faculty of Education