Evaluating Various Large Language Models’ Scoring Characteristics of Open-Ended Responses and Essays
Keywords: grading reliability, large language models (LLMs), prompt engineering
Abstract
This study evaluates the grading effectiveness of large language models (LLMs), including ChatGPT-5, Claude, and DeepSeek, on open-ended responses and essays. In Phase 1, AI scores were compared with scores of human instructors, revealing differences in leniency, depth, and alignment, quantified using a normalized distance metric. In Phase 2, prompt engineering and few-shot learning improved alignment with human graders. Principal Component Analysis plots with Kernel Density Estimation contours supported these gains. A single middle-performing exemplar strategy consistently improved grading alignment across models and assignments without negative effects, while other strategies showed variable results. Careful design remains essential to ensure fairness and responsible AI-assisted assessment.
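The abstract does not define its normalized distance metric, so the following is only an illustrative sketch of one plausible form: the mean absolute gap between AI and human scores, divided by the maximum possible score so that values fall between 0 and 1. The function names, the signed "leniency" variant, and the sample scores are assumptions made for illustration. They are not taken from the study.

```python
from typing import Sequence


def _check_inputs(ai_scores: Sequence[float],
                  human_scores: Sequence[float],
                  max_score: float) -> None:
    """Validate that the score lists are paired and the scale is usable."""
    if len(ai_scores) != len(human_scores):
        raise ValueError("AI and human score lists must have the same length")
    if len(ai_scores) == 0:
        raise ValueError("at least one pair of scores is required")
    if max_score <= 0:
        raise ValueError("max_score must be positive")


def normalized_distance(ai_scores: Sequence[float],
                        human_scores: Sequence[float],
                        max_score: float) -> float:
    """Hypothetical metric: mean |AI - human| gap scaled by the maximum score.

    0.0 means the AI matched the human grader on every response;
    1.0 means the scores were at opposite ends of the scale every time.
    """
    _check_inputs(ai_scores, human_scores, max_score)
    total_gap = sum(abs(a - h) for a, h in zip(ai_scores, human_scores))
    return total_gap / (len(ai_scores) * max_score)


def normalized_leniency(ai_scores: Sequence[float],
                        human_scores: Sequence[float],
                        max_score: float) -> float:
    """Hypothetical signed variant: positive when the AI grades more leniently."""
    _check_inputs(ai_scores, human_scores, max_score)
    total_signed = sum(a - h for a, h in zip(ai_scores, human_scores))
    return total_signed / (len(ai_scores) * max_score)


# Made-up scores on a 10-point scale, for illustration only.
ai = [8, 7, 9, 6]
human = [7, 7, 8, 8]
print(normalized_distance(ai, human, max_score=10))
print(normalized_leniency(ai, human, max_score=10))
```

Reporting the signed and unsigned values together separates overall disagreement from systematic leniency. In the made-up data above, the AI disagrees with the human grader on three of four responses, yet the over- and under-scoring cancel out.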
Data Availability Statement
The datasets generated and analyzed during the current study are not publicly available due to student privacy and institutional restrictions but are available from the corresponding author on reasonable request.
License
Access Agreement Journal on Excellence in College Teaching
Before proceeding you must agree to the terms and conditions of usage as outlined below by clicking on the Accept button and/or by both parties’ signatures below. You will have to do this only once. After agreement, you will be redirected back to the main Journal page. A pdf copy of the terms is available for download.
This Access Agreement (the "Agreement") is effective upon processing of payment ("Effective Date") and is entered into by and between the Journal on Excellence in College Teaching ("JECT") and the Customer ("Customer").
This Agreement constitutes the entire agreement and supersedes and voids all prior communications, understandings, and agreements relating to the Product(s), including any terms of use displayed to Authorized Users via the online site of the Product(s). Alterations to the Agreement and to any Addendum to the Agreement are only valid and binding if they are recorded in writing and signed by both parties.
I. Definitions
"Authorized Users" shall mean individuals who are authorized by the Customer (which shall include those individuals authorized by the Institutions hereunder) to access the Customer's information services whether on-site or off-site via secure authentication and who are affiliated with the Customer as a student (undergraduates and postgraduates), employee (whether on a permanent or temporary basis), or a contractor of the Customer. Individuals who are not a current student, employee, or a contractor of the Customer, but who are permitted to access the Customer's information services from computer terminals within the physical premises of the Customer ("Walk-In Users"), are also deemed to be Authorized Users, but only for the time they are within the physical premises of the Customer. Walk-In Users may not be given means to access the Product(s) when they are not within the physical premises of the Customer.
"Commercial Use" shall mean use for the purpose of monetary reward (whether by or for the Customer or an Authorized User) by means of the sale, resale, loan, transfer, hire, or other form of exploitation of the Product(s). For the avoidance of doubt, neither recovery of direct cost by the Customer from Authorized Users, nor use by the Customer or Authorized Users of the Product(s) in the course of research funded by a commercial organization, shall be deemed to constitute Commercial Use.
"Educational Purposes" shall mean for the purpose of education, teaching, distance learning, private study and/or research as described in Section II below.
"Institutions" shall mean the Customer's participating institutions, if applicable.
"License" shall mean the non-exclusive, non-transferable right to access and use the Product(s) pursuant to the specific terms and conditions set forth in this Agreement.
"Product(s)" shall mean the products, materials and/or information contained therein that are subject to this Agreement. Product(s) include the Journal on Excellence in College Teaching and the archive of the Learning Communities Journal.
"Reasonable Amount" shall be determined based on guidelines set forth by 17 U.S. Code § 107 (Limitations on exclusive rights, Fair use).
"Secure Authentication" shall mean access to the Product(s) by Internet Protocol ("IP") ranges or by another means of authentication agreed between the Publisher and Customer or Institutions (if applicable) from time to time.
II. Authorized Use of Product(s)
Customer, the Institutions (if applicable), and Authorized Users may use the Product(s) for Educational Purposes as follows:
Analysis. Authorized Users shall be permitted to extract or use information contained in the Product(s) for Educational Purposes, including, but not limited to, text and data mining, extraction and manipulation of information for the purposes of illustration, explanation, example, comment, criticism, teaching, research, or analysis.
Course Packs. Customer, the Institutions, and Authorized Users may use a Reasonable Amount of the Product(s) in the preparation of course packs or other educational materials.
Digital Copy. Customer, the Institutions, and Authorized Users may download and digitally copy a Reasonable Amount of the Product(s).
Display. Customer, the Institutions, and Authorized Users shall have the right to electronically display the Product(s) to the extent necessary to further the intent and purpose of this Agreement.
Electronic Reserve. Customer, the Institutions, and Authorized Users may use a Reasonable Amount of each of the Product(s) in connection with specific courses of instruction offered by Customer.
Interlibrary Loan. The Customer and the Institutions shall be permitted to use Reasonable Amounts of the Content to fulfill occasional requests from other, non-participating institutions, a practice commonly called Interlibrary Loan ("ILL"). Customer and the Institutions shall fulfill such requests in compliance with Section 108 of the United States Copyright Law (17 U.S.C. § 108, "Limitations on exclusive rights: Reproduction by libraries and archives") and the Guidelines for the Proviso of Subsection 108(g)(2) prepared by the National Commission on New Technological Uses of Copyrighted Works (CONTU).
The electronic form of the Product(s) may be used as a source for ILL. Customer and the Institutions shall include copyright notices on all ILL transmissions. Notwithstanding anything herein to the contrary, in no event shall any non-secure electronic transmission of files be permitted.
Print Copy. Customer, the Institutions, and Authorized Users may print a Reasonable Amount of the Product(s).
Recover Copying Costs. Customer and the Institutions may charge a reasonable fee to cover costs of copying or printing portions of Product(s) for Authorized Users.
Scholarly Sharing. Authorized Users may transmit to a third party colleague in hard copy or electronically, Reasonable Amounts of the Product(s) for personal use, professional use, or Educational Purposes but in no event for Commercial Use. In addition, Authorized Users have the right to use, with appropriate credit, figures, tables, and brief excerpts from the Product(s) in the Authorized User's own scientific, scholarly, and educational works.
Text Mining. Authorized Users may use the licensed material to perform and engage in text mining/data mining activities for legitimate academic research and other Educational Purposes. Those uses beyond educational use shall require permission from the Publisher.
III. Restrictions
Except as provided herein, the institution shall make reasonable efforts to inform its authorized users not to use, alter, decompile, modify, display, or distribute the Product(s) as follows:
Alter Identification. Remove, obscure, or modify copyright notices, text acknowledging, attributions, or other means of identification or disclaimers as they appear.
Alter Product(s). Alter, decompile, adapt, or modify the Product(s), except to the extent necessary to make it perceptible on a computer screen, or as otherwise permitted in this Agreement. Alteration of words or their order is strictly prohibited.
Commercial Use. No Commercial Use of the Product(s) shall be permitted unless the Customer or an Authorized User has been granted prior written consent by an authorized representative of the Product(s). Use of all or any part of the Product(s) for any Commercial Use or for any purpose other than Educational Purposes.
Distribution. Display or distribute any part of the Product(s) on any electronic network, including without limitation, the Internet, and any other distribution medium now in existence or hereinafter created, other than by a Secure Authentication; print and distribute any portion(s) of the Product(s) to persons or entities other than the Customer or Authorized Users, except as provided in Section II.
JECT acknowledges that the Customer cannot police or control the actions of its students, faculty, and other Authorized Users with respect to their use of the Product(s). In the event of abuse, the institution shall make prompt and reasonable efforts to heal the breach and notify the publisher.
IV. Term and Termination
This Agreement shall commence on the Effective Date and shall remain in effect unless and until terminated as permitted herein (the "Term"). There is no perpetual electronic access to content made available during the Term of the Agreement.
JECT may terminate this Agreement if Customer violates any of the terms and conditions set forth herein. In the event of any termination of access, JECT will promptly notify the Customer of the basis for termination.
The Customer may terminate this Agreement if sufficient funds are not provided or allotted in future government-approved budgets of the Customer (or reasonably available or expected to become available from other sources at the time the Customer’s payment obligation attaches) to permit the Customer, in the exercise of its reasonable administrative discretion, to continue this Agreement.
In the event of any unauthorized use of the Product(s) by an Authorized User, Customer shall cooperate with JECT in the investigation of any unauthorized use of the Product(s) of which it is made aware and shall use reasonable efforts to remedy such unauthorized use and prevent its recurrence. JECT may terminate such Authorized User's access to the Product(s) after first providing reasonable notice to the Customer (in no event less than two (2) weeks) and cooperating with the Customer to avoid recurrence of any unauthorized use. In the event of any termination of access, JECT will promptly notify the Customer.
V. Refunds
In the event that a subscription is canceled by the Customer prior to the subscription end date, the following will be used as guidelines for refunds.
Electronic subscriptions. The Customer shall be entitled to a full refund within 14 days of the start of the most recent subscription term. Refunds requested after 14 days but no later than 60 days from the start of the most recent subscription term will be allowed, minus a 30% processing fee. Refunds will not be granted if requested more than 60 days after the start of the most recent subscription term.