Designing AI-Aware Assessment Models to Measure Students’ Genuine English Proficiency

Doni Hadi Irawan; Muhamad Alfi Khoiruman; Dewi Untari

Authors

Doni Hadi Irawan, Akademi Kelautan Banyuwangi
Muhamad Alfi Khoiruman, Akademi Kelautan Banyuwangi
Dewi Untari, Universitas dr. Soebandi

Keywords:

Artificial intelligence; AI-aware assessment; English proficiency.

Abstract

The rapid advancement of artificial intelligence (AI) has transformed language assessment practices, offering increased efficiency and consistency in scoring. However, concerns remain regarding the validity of AI-based assessment in measuring students’ genuine English proficiency, particularly in productive language skills. This study aims to design and evaluate an AI-aware assessment model that aligns technological innovation with communicative competence frameworks. Employing a design-based research approach, the study involved 120 secondary-level EFL students and six English teachers in an authentic classroom context. The assessment model comprised four performance-based tasks—two writing and two speaking—evaluated using shared multidimensional rubrics applied by both AI-assisted scoring and human raters. Quantitative data were analyzed through descriptive statistics and correlation analysis, while qualitative data were examined thematically. The findings indicate that AI-assisted scoring demonstrates moderate to high consistency with human ratings in linguistic accuracy, lexical range, and coherence. However, discrepancies were observed in assessing pragmatic and communicative effectiveness, underscoring the limitations of fully automated evaluation. The study concludes that AI-aware assessment models are most effective when implemented within a human–AI collaborative framework. Such an approach enhances assessment efficiency and diagnostic feedback while preserving construct validity and ethical accountability in measuring genuine English proficiency.
