Research Article

Examining the Achievement Test Development Process in the Educational Studies

Year 2023, Volume: 10, Issue: 1, 251-274, 30.01.2023
https://doi.org/10.17275/per.23.14.10.1

Abstract

A review of the literature shows that the achievement test development process has mainly been investigated in dissertations. Moreover, a form that sheds light on developing an achievement test is expected to guide those who will administer such tests. To this end, the current study aims to create an “Achievement Test Development Process Control Form” and to investigate mathematics achievement tests on the basis of this form. Document analysis was conducted within the framework of qualitative research, and the data were examined through descriptive analysis. Within the scope of the research, 1683 articles published in designated journals between 2015 and 2020 were reviewed. A mathematics achievement test was developed in 39 of these articles, which were coded on the control form. The articles included in the study were investigated in terms of the type of items used in the tests; the theory or practice on which the test was based; the use of a rubric for open-ended items; the number of items in the pilot and final forms; the features of the test form, the table of specifications, and the item pool; the evaluation of the pilot and main administrations; test validity and reliability; and the setting in which the tests were administered. The findings show that, in most of the articles, an item pool was not prepared; a pilot administration was either not conducted or not reported, and even when it was conducted, item analysis was not performed; test forms or example items were not included; and there were some deficiencies regarding validity. On the other hand, the articles mostly specified the test goal and reported a reliability coefficient. In light of these findings, suggestions are provided for test developers and those who will administer these tests.
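
As a purely illustrative aside (not part of the original article): the sketch below shows, under assumed data, how articles coded on such a control form could be tallied into the descriptive frequencies this kind of review reports. The criterion names and sample records are hypothetical placeholders, not the actual items of the Achievement Test Development Process Control Form.

    from collections import Counter

    # Hypothetical control-form criteria; the real form's items differ.
    CRITERIA = [
        "item_pool_prepared",
        "pilot_study_conducted",
        "item_analysis_performed",
        "test_goal_specified",
        "reliability_reported",
    ]

    def tally(coded_articles):
        """Count how many reviewed articles meet each criterion."""
        counts = Counter()
        for article in coded_articles:
            for criterion in CRITERIA:
                if article.get(criterion):  # True = criterion met/reported
                    counts[criterion] += 1
        return counts

    # Two mock records; a real review would hold one per coded article.
    sample = [
        {"item_pool_prepared": False, "pilot_study_conducted": True,
         "item_analysis_performed": False, "test_goal_specified": True,
         "reliability_reported": True},
        {"item_pool_prepared": False, "pilot_study_conducted": False,
         "item_analysis_performed": False, "test_goal_specified": True,
         "reliability_reported": True},
    ]

    counts = tally(sample)
    for criterion in CRITERIA:
        print(f"{criterion}: {counts[criterion]}/{len(sample)} articles")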

References

  • Acar-Güvendir, M., & Özer-Özkan, Y. (2015). The examination of scale development and scale adaptation articles published in Turkish academic journals on education. Electronic Journal of Social Sciences, 14(52), 23-33. doi: 10.17755/esosder.54872
  • AERA, APA, & NCME. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  • Boyraz, C. (2018). Investigation of achievement tests used in doctoral dissertations department of primary education (2012-2017). Inonu University Journal of the Faculty of Education, 19(3), 14-28. doi: 10.17679/inuefd.327321
  • Boztunç-Öztürk, N. B., Eroğlu, M. G., & Kelecioğlu, H. (2015). A review of articles concerning scale adaptation in the field of education. Education and Science, 40(178), 123-137. doi: 10.15390/EB.2015.4091
  • Brookhart, S. M. (2018). Appropriate criteria: Key to effective rubrics. Frontiers in Education, 3(22), 1-12. doi: 10.3389/feduc.2018.00022.
  • Büyükkıdık, S. (2012). Comparison of interrater reliability based on the classical test theory and generalizability theory in problem solving skills assessment. (Published master's thesis). Hacettepe University, Ankara.
  • Crocker, L., & Algina, J. (2006). Introduction to classical and modern test theory. Mason, OH: Cengage Learning.
  • Cronbach, L. J. (1990). Essentials of psychological testing (5th ed.). New York, NY: Harper & Row.
  • Çelen, Ü. (2008). Comparison of validity and reliability of two tests developed by classical test theory and item response theory. Elementary Education Online, 7(3), 758-768. Retrieved from https://dergipark.org.tr/en/download/article-file/90935
  • Çelen, Ü., & Aybek, E. C. (2013). Öğrenci başarısının öğretmen yapımı bir testle klasik test kuramı ve madde tepki kuramı yöntemleriyle elde edilen puanlara göre karşılaştırılması [Comparison of student achievement according to scores obtained from a teacher-made test through classical test theory and item response theory methods]. Journal of Measurement and Evaluation in Education and Psychology, 4(2), 64-75. Retrieved from https://dergipark.org.tr/en/download/article-file/65958
  • Çetin, B. (2019). Test geliştirme [Test development]. In B. Çetin (Ed.), Eğitimde ölçme ve değerlendirme [Measurement and assessment in education] (pp. 105-126). Ankara: Anı Publishing.
  • Çüm, S., & Koç, N. (2013). The review of scale development and adaptation studies which have been published in psychology and education journals in Turkey. Journal of Educational Sciences & Practices, 12(24), 115-135. Retrieved from https://www.idealonline.com.tr/IdealOnline/pdfViewer/index.xhtml?uId=5928&ioM=Paper&preview=true&isViewer=true#pagemode=bookmarks
  • de Ayala, R. J. (2009). The theory and practice of item response theory. New York, NY: The Guilford Press.
  • Delice, A., & Ergene, Ö. (2015). Investigation of scale development and adaptation studies: An example of mathematics education articles. Karaelmas Journal of Educational Sciences, 3(1), 60-75. Retrieved from https://dergipark.org.tr/tr/pub/kebd/issue/67216/1049114
  • DeMars, C. (2010). Item response theory. New York, NY: Oxford University Press.
  • Doğan, N., & Kılıç, A. F. (2017). Madde tepki kuramı yetenek ve madde parametre kestirimlerinin değişmezliğinin incelenmesi [An examination of the invariance of item response theory ability and item parameter estimates]. In Ö. Demirel & S. Dinçer (Eds.), Küreselleşen dünyada eğitim [Education in a globalizing world] (pp. 298-314). Ankara: Pegem Academy. doi: 10.14527/9786053188407.21
  • Downing, S. M., & Haladyna, T. M. (2011). Handbook of test development. Mahwah, NJ: Lawrence Erlbaum Associates.
  • Enago. (2021). Why is a pilot study important in research? Retrieved from https://www.enago.com/academy/pilot-study-defines-a-good-research-design/
  • Ergene, Ö. (2020). Scale development and adaptation articles in the field of mathematics education: Descriptive content analysis. Journal of Education for Life, 34(2), 360-383. doi: 10.33308/26674874.2020342207
  • Evrekli, E., İnel, D., Deniş, H., & Balım, A. G. (2011). Methodological and statistical problems in graduate theses in the field of science education. Elementary Education Online, 10(1), 206-218. Retrieved from https://dergipark.org.tr/tr/pub/ilkonline/issue/8593/106858
  • Goodrich Andrade, H. (2000). Using rubrics to promote thinking and learning. Educational Leadership, 57(5), 13-18. Retrieved from https://eric.ed.gov/?id=EJ609600
  • Goodrich Andrade, H. (2001). The effects of instructional rubrics on learning to write. Current Issues in Education, 4(4), 1-22. Retrieved from https://cie.asu.edu/ojs/index.php/cieatasu/article/view/1630
  • Goodrich Andrade, H. (2005). Teaching with rubrics: The good, the bad, and the ugly. College Teaching, 53(1), 27-31. doi: 10.3200/CTCH.53.1.27-31
  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
  • Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Dordrecht, The Netherlands: Kluwer-Nijhoff Publishing.
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory (Vol. 2). Newbury Park, CA: Sage.
  • Hunter, D. M., Jones, R. M., & Randhawa, B. S. (1996). The use of holistic versus analytic scoring for large-scale assessment of writing. The Canadian Journal of Program Evaluation, 11(2), 61-85. Retrieved from https://www.evaluationcanada.ca/secure/11-2-061.pdf
  • Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2(2), 130-144. doi: 10.1016/j.edurev.2007.05.002
  • Karadağ, E. (2011). Instruments used in doctoral dissertations in educational sciences in Turkey: Quality of research and analytical errors. Educational Sciences: Theory & Practice, 11(1), 311-334. Retrieved from https://silo.tips/download/eitim-bilimleri-doktora-tezlerinde-kullanlan-lme-aralar-nitelik-dzeyleri-ve-anal
  • Lane, S., Raymond, M. R., & Haladyna, T. M. (2016). Handbook of test development (2. ed.). New York, NY: Routledge.
  • Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Menlo Park, CA: Addison-Wesley.
  • Mertler, C. A. (2000). Designing scoring rubrics for your classroom. Practical Assessment, Research, and Evaluation, 7(25), 1-8. doi: 10.7275/gcy8-0w24
  • Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741-749. doi: 10.1037/0003-066X.50.9.741
  • Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2. ed.). Thousand Oaks, CA: Sage.
  • Mor-Dirlik, E. (2014). Ölçek geliştirme konulu doktora tezlerinin test ve ölçek geliştirme standartlarına uygunluğunun incelenmesi [Examination of the conformity of doctoral dissertations on scale development to test and scale development standards]. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi [Journal of Measurement and Evaluation in Education and Psychology], 5(2), 62-78. doi: 10.21031/epod.63138
  • Mor-Dirlik, E. (2021). Farklı test kuramlarından hesaplanan madde ayırt edicilik parametrelerinin karşılaştırılması [Comparison of item discrimination parameters estimated from different test theories]. Trakya Eğitim Dergisi [Trakya Journal of Education], 11(2), 732-744. doi: 10.24315/tred.700445
  • Moskal, B. M. (2000). Scoring rubrics: What, when and how? Practical Assessment, Research, and Evaluation, 7(3), 1-5. doi: 10.7275/a5vq-7q66
  • Moskal, B. M., & Leydens, J. A. (2000). Scoring rubric development: Validity and reliability. Practical Assessment, Research, and Evaluation, 7(4), 1-22. doi: 10.7275/q7rm-gg74
  • Mutluer, C., & Yandı, A. (2012, September). Türkiye’deki üniversitelerde 2010-2012 yılları arasında yayımlanan tezlerdeki başarı testlerin incelenmesi [Examination of achievement tests in theses published at universities in Turkey between 2010 and 2012]. Paper presented at the Eğitimde ve Psikolojide Ölçme ve Değerlendirme III. Ulusal Kongresi [3rd National Congress on Measurement and Evaluation in Education and Psychology], Bolu, Turkey. Abstract retrieved from https://www.epodder.org/wp-content/uploads/2020/07/EPOD-2012.pdf
  • Olgun, G., & Alatlı, B. (2021). The review of scale development and adaptation studies published for adolescents in Turkey. The Journal of Turkish Educational Sciences, 19(1), 568-592. doi: 10.37217/tebd.849954
  • Öksüzoğlu, M. (2022). The investigation of items measuring high-level thinking skills in terms of student score and score reliability. (Unpublished master's thesis). Hacettepe University, Ankara.
  • Özçelik, D. A. (1992). Ölçme ve değerlendirme [Measurement and assessment]. Ankara: ÖSYM Publishing.
  • Reznitskaya, A., Kuo, L., Glina, M., & Anderson, R. C. (2009). Measuring argumentative reasoning: What’s behind the numbers? Learning and Individual Differences, 19(2), 219-224. doi: 10.1016/j.lindif.2008.11.001
  • Şahin, M. G. (2019). Performansa dayalı değerlendirme [Performance-based assessment]. In B. Çetin (Ed.), Eğitimde ölçme ve değerlendirme [Measurement and assessment in education] (pp. 213-264). Ankara: Anı Publishing.
  • Şahin, M. G., & Boztunç-Öztürk, N. (2018). Scale development process in educational field: A content analysis research. Kastamonu Education Journal, 26(1), 191-199. doi: 10.24106/kefdergi.375863
  • Şanlı, E. (2010). Comparing reliability levels of scoring of the holistic and analytic rubrics in evaluating the scientific process skills. (Unpublished master's thesis). Ankara University, Ankara.
  • Tindal, G., & Haladyna, T. M. (2012). Large-scale assessment programs for all students: Validity, technical adequacy, and implementation. Mahwah, NJ: Lawrence Erlbaum.
  • Turgut, F. (1992). Eğitimde ölçme ve değerlendirme [Measurement and assessment in education] (8th ed.). Ankara: Saydam Publishing.
  • Yıldırım, A., & Şimşek, H. (2013). Sosyal bilimlerde nitel araştırma yöntemleri [Qualitative research methods in social sciences] (9th ed.). Ankara: Seçkin Publishing.
  • Yıldıztekin, B. (2014). The comparison of interrater reliability by using estimation techniques in classical test theory and generalizability theory. (Unpublished master's thesis). Hacettepe University, Ankara.
There are 50 references in total.

Details

Primary Language: English
Subjects: Field Education
Section: Research Articles
Authors

Melek Gülşah Şahin 0000-0001-5139-9777

Yıldız Yıldırım 0000-0001-8434-5062

Nagihan Boztunc Öztürk 0000-0002-2777-5311

Publication Date: January 30, 2023
Acceptance Date: December 18, 2022
Published in Issue: Year 2023, Volume: 10, Issue: 1

Cite

APA Şahin, M. G., Yıldırım, Y., & Boztunc Öztürk, N. (2023). Examining the Achievement Test Development Process in the Educational Studies. Participatory Educational Research, 10(1), 251-274. https://doi.org/10.17275/per.23.14.10.1