Research Article

Scoring Methods for Multiple Choice Tests: How does the Item Difficulty Weighted Scoring Change Student’s Test Results?

Year 2021, Volume: 10 Issue: 2, 309 - 324, 05.06.2021
https://doi.org/10.14686/buefad.878504

Abstract

This study aimed to compare students’ test scores, as well as item and test statistics, calculated with unweighted (1-0) scoring and with item difficulty weighted (Q_j-0) scoring. The study also proposes a way to convert the weighted scores to a 100-point scale. A teacher-made, 34-item multiple-choice achievement test was administered to a group of 431 people. McDonald’s omega internal consistency coefficients were .725 for the 1-0 method and .721 for the Q_j-0 method. Between the student scores obtained with the two methods, the Pearson product-moment correlation coefficient was .916 and the Spearman rank-order correlation coefficient was .926. In addition, a criterion-referenced evaluation was carried out with two cut scores (test scores of 50 and 60), and the numbers of students who passed and failed the course were determined under both scoring methods. More students would be classified as failing the course under the Q_j-0 method. However, this method appeared to reveal differences among individuals more clearly than the unweighted method.
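The abstract names two scoring rules and a 100-point conversion but does not spell out their formulas. The sketch below contrasts them under stated assumptions and is not the authors' exact procedure:

- **Weight definition (assumed):** Q_j = 1 - p_j, where p_j is the classical item difficulty (the proportion of examinees answering item j correctly). Harder items therefore earn more points.
- **100-point conversion (assumed):** weighted scores are divided by the maximum attainable weighted score.
- **Helper names:** `item_difficulty`, `weighted_scores`, `pass_fail` and the other function names are hypothetical.

```python
"""Sketch: unweighted (1-0) versus item-difficulty-weighted (Q_j-0) scoring.

Assumptions (not stated in the abstract): Q_j = 1 - p_j, where p_j is the
proportion of examinees answering item j correctly, and weighted scores are put
on a 100-point scale by dividing by the maximum attainable weighted score.
"""
from typing import List, Tuple

Matrix = List[List[int]]  # rows = examinees, columns = items, entries 1 or 0


def item_difficulty(responses: Matrix) -> List[float]:
    """Classical item difficulty p_j: proportion answering item j correctly."""
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_examinees for j in range(n_items)]


def unweighted_scores(responses: Matrix) -> List[float]:
    """1-0 scoring, rescaled to 100: percentage of items answered correctly."""
    n_items = len(responses[0])
    return [100 * sum(row) / n_items for row in responses]


def weighted_scores(responses: Matrix) -> List[float]:
    """Q_j-0 scoring: a correct answer on item j earns Q_j = 1 - p_j points."""
    q = [1 - p for p in item_difficulty(responses)]
    max_score = sum(q)  # assumed denominator for the 100-point conversion
    if max_score == 0:
        raise ValueError("every item was answered correctly by everyone")
    return [100 * sum(qj * x for qj, x in zip(q, row)) / max_score
            for row in responses]


def pass_fail(scores: List[float], cut: float) -> Tuple[int, int]:
    """Count examinees at or above (pass) and below (fail) a cut score."""
    passed = sum(1 for s in scores if s >= cut)
    return passed, len(scores) - passed


if __name__ == "__main__":
    # Hypothetical 4-examinee x 4-item response matrix (not the study's data).
    responses = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
    for cut in (50, 60):
        print(cut, pass_fail(unweighted_scores(responses), cut),
              pass_fail(weighted_scores(responses), cut))
```

Scores on this toy matrix:

- **1-0 scoring:** 100, 75, 50 and 25.
- **Q_j-0 scoring:** 100, 50, about 16.7 and 0.

At a cut score of 50, one more examinee therefore fails under the weighting. This mirrors the direction reported in the abstract without reproducing its data. Note also that an item everyone answers correctly gets Q_j = 0 and contributes nothing under this weighting.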

References

  • Akkuş, O., & Baykul, Y. (2001). Çoktan seçmeli test maddelerinin puanlamada, seçenekleri farklı biçimlerde ağırlıklandırmanın madde ve test istatistiklerine olan etkisinin incelenmesi [An investigation of the effects of different item-option scoring methods on item and test parameters]. Hacettepe University Journal of Education, 20, 9-15.
  • Bacon, D. R. (2003). Assessing learning outcomes: A comparison of multiple-choice and short answer questions in a marketing context. Journal of Marketing Education, 25, 31-36. doi: 10.1177/0273475302250570
  • Bar-Hillel, M., Budescu, D., & Attali, Y. (2005). Scoring and keying multiple choice tests: A case study in irrationality. Mind and Society, 4, 3-12. doi: 10.1007/s11299-005-0001-z
  • Bejar, I., & Weiss, D. J. (1977). A comparison of empirical differential of inter-item correlation. Educational and Psychological Measurement, 37, 335-340. doi: 10.1177/001316447703700207
  • Bereby-Meyer, Y., Meyer, Y., & Flascher, O. M. (2002). Prospect theory analysis of guessing in multiple choice tests. Journal of Behavioral Decision Making, 15, 313–327. doi: 10.1002/bdm.417
  • Buckles, S., & Siegfried, J. J. (2006). Using in-depth multiple-choice questions to evaluate in-depth learning of economics. Journal of Economic Education, 37, 48-57. doi: 10.3200/JECE.37.1.48-57
  • Budescu, D. V., & Bar-Hillel, M. (1993). To guess or not to guess: A decision-theoretic view of formula scoring. Journal of Educational Measurement, 30(4), 277–291. doi: 10.1111/j.1745-3984.1993.tb00427.x
  • Budescu, D. V. (1979). Differential weighting of multiple-choice items. Educational Testing Service, Princeton.
  • Burton, R. F. (2001). Quantifying the effects of chance in multiple choice and true/false tests: Question selection and guessing of answers. Assessment & Evaluation in Higher Education, 26(1), 41–50. doi: 10.1080/02602930020022273
  • Clark, D., & Linn, M. C. (2003). Designing for knowledge integration: The impact of instructional time. Journal of the Learning Sciences, 12, 451–493. doi: 10.1207/S15327809JLS1204_1
  • Choppin, B. H. (1988). Correction for guessing. In J. P. Keeves (Ed.), Educational research, methodology, and measurement: An international handbook (pp. 384–386). Pergamon Press.
  • DiBattista, D., & Kurzawa, L. (2011). Examination of the quality of multiple-choice items on classroom tests. The Canadian Journal for the Scholarship of Teaching and Learning, 2(2). doi: 10.5206/cjsotl-rcacea.2011.2.4
  • Donlon, T. F., & Fitzpatrick, A. R. (1978). The statistical structure of multiple choice items. In Proceedings of the Annual Meeting of the Northeastern Educational Research Association, October 1978, Ellenville, New York.
  • Echternacht, G. (1976). Reliability and validity of item option weighting schemes. Educational and Psychological Measurement, 36, 301-309. doi: 10.1177/001316447603600208
  • Gözen, G. (2006). Kısa cevaplı ve çoktan seçmeli maddelerin “0-1” ve ağırlıklı puanlama yöntemleri ile puanlanmasının testin psikometrik özellikleri açısından incelenmesi [Analysis of short-answer and multiple-choice items scored with “1-0” and weighted scoring methods in terms of the psychometric characteristics of tests]. Educational Science & Practice, 5(9), 35-52.
  • Frary, R. (1989). Partial-credit scoring methods for multiple-choice tests. Applied Measurement in Education, 2(1), 79-96. doi: 10.1207/s15324818ame0201_5
  • Hendrickson, G. (1971). The effect of differential option weighting on multiple choice objective test items (Report No. 93). The Johns Hopkins University.
  • Heubert, J. P., & Hauser, P. M. (1999). High-stakes testing for tracking, promotion, and graduation. National Academy Press.
  • Kubinger, K. D., Holocher-Ertl, S., Reif, M., Hohensinn, C., & Frebort, M. (2010). On minimizing guessing effects on multiple-choice items: Superiority of a two solutions and three distractors item format to a one solution and five distractors item format. International Journal of Selection and Assessment, 18(1), 111–115. doi: 10.1111/j.1468-2389.2010.00493.x
  • Prihoda, T. J., Pinckard, R. N., McMahan, C. A., & Jones, A. C. (2006). Correcting for guessing increases validity in multiple-choice examinations in an oral and maxillofacial pathology course. Journal of Dental Education, 70(4), 378-386. doi: 10.1002/j.0022-0337.2006.70.4.tb04092.x
  • Jaradat, D., & Tollefson, N. (1988). The impact of alternative scoring procedures for multiple-choice items on test reliability, validity and grading. Educational and Psychological Measurement, 48, 627-635. doi: 10.1177/0013164488483006
  • Mavis, B. E., Cole, B. L., & Hoppe, R. B. (2001). A survey of student assessment in U.S. medical schools: The balance of breadth versus fidelity. Teaching and Learning in Medicine, 13, 74-79. doi: 10.1207/S15328015TLM1302_1
  • McDougall, D. (1997). College faculty’s use of objective tests: State-of-the-practice versus state-of-the-art. Journal of Research and Development in Education, 30, 183–193.
  • Merwin, J. (1959). Rational and mathematical relationships of six scoring procedures applicable to three-choice items. Journal of Educational Psychology, 50(4). doi: 10.1037/h0045073
  • Özdemir, D. (2003). Çoktan seçmeli testleri puanlama yöntemlerine bir bakış [An overview of methods for scoring multiple choice tests]. Eğitim Araştırmaları Dergisi, 4(12), 121-122.
  • Özdemir, D. (2004). Çoktan seçmeli testlerin klasik test teorisi ve örtük özellikler teorisine göre hesaplanan psikometrik özelliklerinin iki kategorili ve ağırlıklandırılmış puanlaması yönünden karşılaştırılması [A comparison of the psychometric characteristics of multiple choice tests under binary and weighted scoring with respect to classical test theory and latent trait theory]. Hacettepe University Journal of Education, 26, 117-123.
  • Palmer, E. J., & Devitt, P. G. (2007). Assessment of higher order cognitive skills in undergraduate education: Modified essay or multiple choice questions? BMC Medical Education, 7, 49. doi: 10.1186/1472-6920-7-49
  • Ramsay, J. O. (1968). A scoring system for multiple choice test items. The British Journal of Mathematical and Statistical Psychology, 41, 249-262. doi: 10.1111/j.2044-8317.1968.tb00413.x
  • Reilly, R. R., & Jackson, R. (1973). Effects of empirical option weighting on reliability and validity of the GRE. Journal of Educational Measurement, 10(3), 185-193. doi: 10.1111/j.1745-3984.1973.tb00796.x
  • Roediger, H. L., III, & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155–1159. doi: 10.1037/0278-7393.31.5.1155
  • Rowley, G. L., & Traub, R. E. (1977). Formula scoring, number-right scoring, and test-taking strategy. Journal of Educational Measurement, 14, 15-22.
  • Sax, G. (1989). Principles of educational and psychological measurement and evaluation. Wadsworth.
  • Shepard, L. A. (2000). The role of assessment in a learning culture. Educational Researcher, 29, 4–14. doi: 10.3102/0013189X029007004
  • Kurz, T. B. (1999). A review of scoring algorithms for multiple choice tests. EDRS Publications, Report No. ED 428 076.
  • Walsh, C. M., & Seldomridge, L. A. (2006). Critical thinking: Back to square two. Journal of Nursing Education, 45, 212-219. doi: 10.3928/01484834-20060601-05
  • Weitzman, R. A. (1970). Ideal multiple choice items. Journal of the American Statistical Association, 65(329), 71-89. doi: 10.1080/01621459.1970.10481063
  • Wilson, M., & Wang, W. C. (1995). Complex composites: Issues that arise in combining different modes of assessment. Applied Psychological Measurement, 19, 51–71. doi: 10.1177/014662169501900107
  • Yurdugül, H. (2010). Farklı madde puanlama yöntemlerinin ve test puanlama yöntemlerinin karşılaştırılması [A comparison of different item scoring methods and test scoring methods]. Journal of Measurement and Evaluation in Education and Psychology, 1(1), 1-8.

Scoring Methods in Multiple-Choice Tests: How Does Weighting by Item Difficulty Change Students’ Test Results? (Turkish title, translated)


Abstract

In this study, a teacher-made, 34-item multiple-choice achievement test was administered to a group of 431 people. The study then aimed to compare item and test statistics, as well as students’ pass and fail status in the course, under unweighted (1-0) scores and under scores weighted by item difficulty (Q_j-0). A proposal for converting the weighted scores to a 100-point scale was also presented. McDonald’s omega internal consistency coefficients were .725 for the 1-0 method and .721 for the Q_j-0 method. Between the student scores obtained with the two methods, the Pearson product-moment correlation coefficient was .916 and the Spearman rank-order correlation coefficient was .926. In addition, a criterion-referenced evaluation was carried out with cut scores of 50 and 60, and the numbers of students passing and failing the course under each method were determined. More students would be counted as failing the course under the Q_j-0 scoring method. However, this method was found to reveal differences between individuals better.
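The abstract reports McDonald’s omega and Pearson and Spearman correlations between the two sets of scores. The sketch below shows how such statistics can be computed for hypothetical score vectors. It does not reproduce the authors' software or data, and it rests on three choices:

- **Spearman’s rho:** computed as Pearson’s r on average ranks, so tied scores share a rank.
- **Omega:** uses the textbook single-factor formula with standardized loadings taken as given. Estimating those loadings needs a factor-analysis step that is not shown here.
- **Function names:** all of them are hypothetical.

```python
"""Sketch: statistics of the kind reported when comparing 1-0 and Q_j-0 scores.

Assumptions: inputs are hypothetical; Spearman's rho is Pearson's r on average
ranks; omega uses the single-factor formula with standardized loadings that are
taken as given rather than estimated here.
"""
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def omega_single_factor(loadings: Sequence[float]) -> float:
    """McDonald's omega from standardized one-factor loadings (given, not fitted)."""
    common = sum(loadings) ** 2
    unique = sum(1 - lam ** 2 for lam in loadings)  # standardized error variances
    return common / (common + unique)


if __name__ == "__main__":
    # Hypothetical unweighted and weighted scores for the same four examinees.
    unweighted = [100.0, 75.0, 50.0, 25.0]
    weighted = [100.0, 50.0, 100 / 6, 0.0]
    print("Pearson r:", round(pearson(unweighted, weighted), 3))
    print("Spearman rho:", round(spearman(unweighted, weighted), 3))
```

In this toy example the weighted scores keep the examinees in the same order, so rho equals 1 while r falls below 1. The weighting changes the distances between examinees but not their ranking. This is one way a Spearman coefficient can exceed the Pearson coefficient, as in the reported .926 versus .916.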

There are 37 citations in total.

Details

Primary Language: English
Subjects: Other Fields of Education
Journal Section: Articles
Authors:
Metin Yaşar (ORCID 0000-0002-7854-1494)
Seval Kartal (ORCID 0000-0002-3018-6972)
Eren Can Aybek (ORCID 0000-0003-3040-2337)
Publication Date: June 5, 2021
Published in Issue: Year 2021, Volume 10, Issue 2

Cite

APA: Yaşar, M., Kartal, S., & Aybek, E. C. (2021). Scoring Methods for Multiple Choice Tests: How does the Item Difficulty Weighted Scoring Change Student’s Test Results? Bartın University Journal of Faculty of Education, 10(2), 309-324. https://doi.org/10.14686/buefad.878504

All articles published in the journal are open access and distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.



Bartın University Journal of Faculty of Education