Authors: Poon, Yin; Wang, Qiong; Lee, John S. Y.; Lam, Yu Yan; Kai Wah Chu, Samuel; Muñoz Sánchez, Ricardo; Alfter, David; Volodina, Elena; Kallas, Jelena
Date accessioned: 2025-02-17
Date available: 2025-02-17
Date issued: 2025-03
URI: https://hdl.handle.net/10062/107171
Abstract: According to the internationally recognized PIRLS (Progress in International Reading Literacy Study) assessment standards, reading comprehension questions should encompass all four comprehension processes: retrieval, inferencing, integration, and evaluation. This paper investigates whether Large Language Models can produce high-quality questions for each of these categories. Human assessment on a Chinese dataset shows that GPT-4o can generate usable and category-specific questions, with accuracy ranging from 74% to 90% depending on the category.
Language: en
License: Attribution-NonCommercial-NoDerivatives 4.0 International
License URI: https://creativecommons.org/licenses/by/4.0/
Title: PIRLS Category-specific Question Generation for Reading Comprehension
Type: Article