Examining the Differential Rater Functioning in the Process of Assessing Writing Skills of Middle School 7th Grade Students

Aslıhan Erman Aslanoğlu; Mehmet Şata

doi:10.17275/per.21.88.8.4

Research Article

Year 2021, Volume: 8 Issue: 4, 239 - 252, 01.12.2021

Cited By: 3

Abstract

When students complete writing tasks that require higher-order thinking skills, one of the most important problems is scoring those tasks objectively. When raters assign scores above or below a student's actual performance because of various environmental factors, the consistency of the measurements suffers. Inconsistent scoring undermines the validity and reliability of the resulting scores and calls them into question. To protect validity and reliability, it is important to identify problematic rater behavior and correct the sources of error. This study analyzes differential rater functioning (DRF), one such problematic rater behavior, in the evaluation of compositions written by middle school 7th-grade students in the Turkish course. Eighty-six students attending a public school participated in the study. Their compositions were rated with an analytic rubric by 8 teachers from different institutions. In this correlational study, the many-facet Rasch model was used, and five variables were examined: students, raters, student gender, student competence level, and evaluation criteria. Two-way interaction analyses (student gender x rater and student competence x rater) were used to examine whether the raters showed DRF at the individual and group levels. The findings revealed that DRF did not affect the measurements at the group level but did affect them at the individual level. DRF affected the measurements of successful students the least. Severe and lenient raters in particular were found to show DRF. The raters who showed DRF were also the most lenient raters, although these raters did not show DRF with respect to student gender.
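
For orientation, one common textbook form of the many-facet Rasch model with a rater-by-group interaction term is sketched below. This is an illustrative assumption, not the authors' exact specification: the symbols, the sign convention, and the choice of a rating-scale threshold structure are not taken from the abstract.

```latex
% Many-facet Rasch model (rating-scale form), an assumed textbook specification.
% P_{nijk}     : probability that student n receives category k from rater j on criterion i
% \theta_n     : writing ability of student n
% \beta_i      : difficulty of evaluation criterion i
% \alpha_j     : severity of rater j
% \tau_k       : threshold between categories k-1 and k
% \varphi_{jg} : interaction (bias) term for rater j and student group g,
%                where g indexes student gender or student competence level
\[
  \ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
    = \theta_n - \beta_i - \alpha_j - \tau_k - \varphi_{jg}
\]
```

Under this form, a rater shows no DRF for group g when the interaction term is statistically indistinguishable from zero. A clearly nonzero term means the rater is more severe or more lenient toward that group than the rater's overall severity would predict.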

Keywords

Writing assessment, differential rater functioning, many-facet Rasch measurement

References

Aslanoğlu, A. E., & Kutlu, Ö. (2003). Öğretimde sunu becerilerinin değerlendirilmesinde dereceli puanlama anahtarı (rubric) kullanılmasına ilişkin bir araştırma [Research on rubric in evaluating the presentation skills in education]. Ankara Üniversitesi Eğitim Bilimleri Fakültesi Dergisi [Ankara University Faculty of Educational Sciences Journal], 36(1-2), 25-36.
Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). New York: Routledge.
Collins, J. L. (2000). Review of key concepts in strategic reading and writing instruction. In J. L. Collins (Ed.), Cheektowaga-Sloan handbook of practical reading and writing strategies (pp. 5-10). Retrieved from http://gse.buffalo.edu/org/writingstrategies/PDFFiles/CHEEKTOWAGA-SLOAN.PDF
Du, Y., Wright, B. D., & Brown, W. L. (1996, April). Differential facet functioning detection in direct writing assessment. Paper presented at the Annual Meeting of the American Educational Research Association, New York.
Eckes, T. (2005). Examining rater effects in test of writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2(3), 197-221. https://doi.org/10.1207/s15434311laq0203_2
Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25(2), 155-185. https://doi.org/10.1177/0265532207086780
Eckes, T. (2019). Many-facet Rasch measurement: Implications for rater-mediated language assessment. In Quantitative Data Analysis for Language Assessment (1st ed.) (pp. 153-175). UK: Routledge.
Englert, C. S., & Mariage, T. (2003). The sociocultural model in special education interventions: Apprenticing students in higher-order thinking. In L. H. Swanson, K. Harris, & S. Graham (Eds.), Handbook of Learning Disabilities (pp. 450-467). New York: Guilford.
Erhardt, R. P., & Meade, V. (2005). Improving handwriting without teaching handwriting: The consultative clinical reasoning process. Australian Occupational Therapy Journal, 52(3), 199-210. https://doi.org/10.1111/j.1440-1630.2005.00505.x
Engelhard, G., Jr. (1994). Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement, 31(2), 93-112.
Engelhard, G., & Myford, C. M. (2003). Monitoring faculty consultant performance in the Advanced Placement English Literature and Composition program with a many-faceted Rasch model. ETS Research Report Series (1), i-60.
Engelhard, G., Jr. (2007). Differential rater functioning. Rasch Measurement Transactions, 21, 1124-1125.
Englert, C. S., Raphael, T. E., Anderson, H. M., Anthony, L. M., & Stevens, D. D. (1991). Making strategies and self-talk visible: Writing instruction in regular and special education classrooms. American Educational Research Journal, 28(2), 337-372. https://doi.org/10.3102/00028312028002337
Farrokhi, F., Esfandiari, R., & Vaez Dalili, M. (2011). Applying the many-facet Rasch model to detect centrality in self-assessment, peer-assessment and teacher assessment. World Applied Sciences Journal, 15, 70-77.
Feldman, M., Lazzara, E. H., Vanderbilt, A. A., & DiazGranados, D. (2012). Rater training to support high‐stakes simulation‐based assessments. Journal of Continuing Education in the Health Professions, 32(4), 279-286. https://doi.org/10.1002/chp.21156
Goodrich, H. (1997). Understanding rubrics. Educational Leadership, 54(4), 14-17.
Goodwin, L. D. (2001). Interrater agreement and reliability. Measurement in Physical Education and Exercise Science, 5(1), 13-34.
Gyagenda, I. S., & Engelhard, G. (2009). Using classical and modern measurement theories to explore rater, domain, and gender influences on student writing ability. Journal of Applied Measurement, 10(3), 225-246.
Haiyang, S. (2010). An application of classical test theory and many-facet Rasch measurement in analyzing the reliability of an English test for non-English major graduates. Chinese Journal of Applied Linguistics, 33(2), 87-102.
Hauenstein, N. M., & McCusker, M. E. (2017). Rater training: Understanding effects of training content, practice ratings, and feedback. International Journal of Selection and Assessment, 25(3), 253-266. https://doi.org/10.1111/ijsa.12177
Hoyt, W. T. (2000). Rater bias in psychological research: When is it a problem and what can we do about it? Psychological Methods, 5(1), 64. http://dx.doi.org/10.1037/1082-989X.5.1.64
Johnson, J. S., & Lim, G. S. (2009). The influence of rater language background on writing performance assessment. Language Testing, 26(4), 485-505. https://doi.org/10.1177/0265532209340186
Kondo-Brown, K. (2002). A FACETS analysis of rater bias in measuring Japanese second language writing performance. Language Testing, 19(1), 3-31. https://doi.org/10.1191/0265532202lt218oa
Jin, K.-Y., & Wang, W.-C. (2017). Assessment of differential rater functioning in latent classes with new mixture facets models. Multivariate Behavioral Research, 52(3), 391-402. https://doi.org/10.1080/00273171.2017.1299615
Lam, S. S. T., Au, R. K. C., Leung, H. W. H. & Li Tsang, C. W. P. (2011). Chinese handwriting performance of primary school children with dyslexia. Research in Developmental Disabilities, 32, 1745-1756. https://doi.org/10.1016/j.ridd.2011.03.001
Li Tsang, C. W. P., Au, R. K. C., Chan, M. H. Y., Chan, L. W. L., Lau, G. M. T., Lo, T. K. & Leung, H. W. H. (2011). Handwriting characteristics among secondary students with and without physical disabilities: A study with a computerized tool. Research in Developmental Disabilities, 32, 207-216. https://doi.org/10.1016/j.ridd.2010.09.015
Linacre, J. M. (2018). A user's guide to FACETS Rasch-model computer programs. Program manual 3.81.0. Chicago: MESA Press.
Marzano, R. J. (2001). Designing a new taxonomy of educational objectives. Experts in assessment. Thousand Oaks, CA: Corwin Press, Inc.
McDonald, M. B. (1999). Seed Deterioration: Physiology, Repair and Assessment. Seed Science and Technology, 27(1), 177-237. Retrieved from https://ci.nii.ac.jp/naid/10025267238/
McNamara, T. (1996). Measuring second language performance. New York: Longman.
Osburn, H.G. (2000). Coefficient alpha and related internal consistency reliability coefficients. Psychological Methods, 5(3), 343-355. https://doi.org/10.1037/1082-989X.5.3.343
Sata, M. (2019). Performans degerlendirme surecinde puanlayici egitiminin puanlayici davranislari uzerindeki etkisinin incelenmesi [The investigation of the effect of rater training on the rater behaviors in the performance assessment process]. Unpublished doctoral dissertation. Gazi University, Ankara.
Schaefer, E. (2008). Rater bias patterns in an EFL writing assessment. Language Testing, 25(4), 465-493. https://doi.org/10.1177/0265532208094273
Shi, L. (2001). Native- and nonnative-speaking EFL teachers' evaluation of Chinese students' English writing. Language Testing, 18, 303-325. https://doi.org/10.1191/026553201680188988
Tamanini, K. B. (2008). Evaluating differential rater functioning in performance ratings: Using a goal-based approach. Unpublished doctoral dissertation. Ohio University, Ohio.
Wesolowski, B. C., Wind, S. A., & Engelhard, G. (2015). Rater fairness in music performance assessment: Evaluating model-data fit and differential rater functioning. Musicae Scientiae, 19(2), 147-170. https://doi.org/10.1177/1029864915589014
Wolfe, E. W., & McVay, A. (2012). Application of Latent Trait Models to Identifying Substantively Interesting Raters. Educational Measurement: Issues and Practice, 31(3), 31-37. https://doi.org/10.1111/j.1745-3992.2012.00241.x

There are 36 citations in total.

Details

Primary Language	English
Subjects	Studies on Education
Journal Section	Research Articles
Authors	Aslıhan Erman Aslanoğlu 0000-0002-1364-7386 Mehmet Şata 0000-0003-2683-4997
Publication Date	December 1, 2021
Acceptance Date	April 17, 2021
Published in Issue	Year 2021 Volume: 8 Issue: 4

Cite

APA	Erman Aslanoğlu, A., & Şata, M. (2021). Examining the Differential Rater Functioning in the Process of Assessing Writing Skills of Middle School 7th Grade Students. Participatory Educational Research, 8(4), 239-252. https://doi.org/10.17275/per.21.88.8.4