Validating a forced‑choice method for eliciting quality‑of‑reasoning judgments

Journal article


Marcoci, Alexandru, Stelmach, Margaret E., Rowe, Luke, Barnett, Ashley, Primoratz, Tamar, Kruger, Ariel, Karvetski, Christopher W., Stone, Benjamin, Diamond, Michael L., Saletta, Morgan, van Gelder, Tim, Tetlock, Philip E. and Dennis, Simon. (2023). Validating a forced‑choice method for eliciting quality‑of‑reasoning judgments. Behavior Research Methods. pp. 1-16. https://doi.org/10.3758/s13428-023-02234-x
Authors: Marcoci, Alexandru, Stelmach, Margaret E., Rowe, Luke, Barnett, Ashley, Primoratz, Tamar, Kruger, Ariel, Karvetski, Christopher W., Stone, Benjamin, Diamond, Michael L., Saletta, Morgan, van Gelder, Tim, Tetlock, Philip E. and Dennis, Simon
Abstract

In this paper we investigate the criterion validity of forced-choice comparisons of the quality of written arguments with normative solutions. Across two studies, novices and experts assessing quality of reasoning through a forced-choice design were both able to choose arguments supporting more accurate solutions—62.2% (SE = 1%) of the time for novices and 74.4% (SE = 1%) for experts—and arguments produced by larger teams—up to 82% of the time for novices and 85% for experts—with high inter-rater reliability, namely 70.58% (95% CI = 1.18) agreement for novices and 80.98% (95% CI = 2.26) for experts. We also explored two methods for increasing efficiency. We found that the number of comparative judgments needed could be substantially reduced with little accuracy loss by leveraging transitivity and producing quality-of-reasoning assessments using an AVL tree method. Moreover, a regression model trained to predict scores based on automatically derived linguistic features of participants’ judgments achieved a high correlation with the objective accuracy scores of the arguments in our dataset. Despite the inherent subjectivity involved in evaluating differing quality of reasoning, the forced-choice paradigm allows even novice raters to perform beyond chance and can provide a valid, reliable, and efficient method for producing quality-of-reasoning assessments at scale.
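As a rough illustration of how transitivity can cut the number of forced-choice judgments, the sketch below ranks arguments by binary insertion into a sorted list. This is not the AVL tree procedure the abstract refers to; it is a simpler stand-in that relies on the same assumption, namely that if A beats B and B beats C, then A need not be compared with C. The function name `rank_by_forced_choice`, the `prefer` callback, and the scores in the demo are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def rank_by_forced_choice(
    items: Sequence[str],
    prefer: Callable[[str, str], bool],
) -> Tuple[List[str], int]:
    """Rank items best-first using binary insertion.

    `prefer(a, b)` stands in for one forced-choice judgment and returns
    True when the rater picks `a` over `b`. Transitivity is assumed, so
    each new item is compared only with items along a binary search path.
    Returns the ranking and the number of judgments requested.
    """
    ranking: List[str] = []
    comparisons = 0
    for item in items:
        lo, hi = 0, len(ranking)
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if prefer(item, ranking[mid]):
                hi = mid  # item is better: it belongs before position mid
            else:
                lo = mid + 1  # item is worse: it belongs after position mid
        ranking.insert(lo, item)
    return ranking, comparisons


# Hypothetical quality scores, used only to simulate a consistent rater.
scores = {"arg_a": 0.31, "arg_b": 0.87, "arg_c": 0.55, "arg_d": 0.12,
          "arg_e": 0.74, "arg_f": 0.49, "arg_g": 0.93, "arg_h": 0.26}

ranking, used = rank_by_forced_choice(
    list(scores), lambda a, b: scores[a] > scores[b]
)
all_pairs = len(scores) * (len(scores) - 1) // 2
print(ranking)
print(f"{used} judgments instead of {all_pairs} for all pairs")
```

With a perfectly consistent rater the ranking matches the underlying scores while requesting fewer judgments than a full round-robin. Real raters are noisy, so an early mistaken judgment can misplace an item; how much accuracy is lost in practice is an empirical question that the paper addresses for its own method.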

Keywords: reasoning; quality of reasoning; comparative judgment; forced choice; automatic reasoning assessment
Year: 2023
Journal: Behavior Research Methods
Journal citation: pp. 1-16
Publisher: Springer
ISSN: 0743-3808
Digital Object Identifier (DOI): https://doi.org/10.3758/s13428-023-02234-x
PubMed ID: 37833511
Scopus EID: 2-s2.0-85174061724
Open access: Published as ‘gold’ (paid) open access
Page range: 1-16
Funder: Office of the Director of National Intelligence (ODNI), United States of America
Intelligence Advanced Research Projects Activity (IARPA), United States of America
The British Academy
The Leverhulme Trust
Output status: In press
Publication dates
Online: 13 Oct 2023
Publication process dates
Accepted: 02 Sep 2023
Deposited: 27 Nov 2023
Grant ID: 16122000002
SRG2223\231699
Permalink: https://acuresearchbank.acu.edu.au/item/8zzq0/validating-a-forced-choice-method-for-eliciting-quality-of-reasoning-judgments

Download files


Publisher's version
OA_Marcoci_2023_Validating_a_forced_choice_method_for.pdf
License: CC BY 4.0
File access level: Open


Related outputs

Research Through the Eyes of Teachers
Rowe, Luke and Hattie, John. (2023). Research Through the Eyes of Teachers. In In Their Own Words: What Scholars and Teachers Want You to Know About Why and How to Apply the Science of Learning in Your Academic Setting pp. 44-60 Society for the Teaching of Psychology.
What do secondary teachers think about digital games for learning : Stupid fixation or the future of education?
Gutierrez, Amanda, Mills, Kathy, Scholes, Laura, Rowe, Luke and Pink, Elizabeth. (2023). What do secondary teachers think about digital games for learning : Stupid fixation or the future of education? Teaching and Teacher Education. 133, p. Article 104278. https://doi.org/10.1016/j.tate.2023.104278
Spiritual and Pedagogical Accompaniment (SPA) program 2022
Gutierrez, Amanda and Rowe, Luke. (2023). Spiritual and Pedagogical Accompaniment (SPA) program 2022 Brisbane, Queensland: Australian Catholic University.
Spiritual and Pedagogical Accompaniment (SPA) program (2019-2021)
Gutierrez, Amanda and Rowe, Luke. (2022). Spiritual and Pedagogical Accompaniment (SPA) program (2019-2021) Brisbane, Queensland: Australian Catholic University.
Video gaming and digital competence among elementary school students
Scholes, Laura, Rowe, Luke, Mills, Kathy A., Gutierrez, Amanda and Pink, Elizabeth. (2022). Video gaming and digital competence among elementary school students. Learning, Media and Technology. pp. 1-16. https://doi.org/10.1080/17439884.2022.2156537
autopsych : An R Shiny tool for the reproducible Rasch analysis, differential item functioning, equating, and examination of group effects
Courtney, Matthew G.R., Chang, Kevin C.T., Mei, Bing, Meissel, Kane, Rowe, Luke and Issayeva, Laila B.. (2021). autopsych : An R Shiny tool for the reproducible Rasch analysis, differential item functioning, equating, and examination of group effects. PLoS ONE. 16(10), p. e0257682. https://doi.org/10.1371/journal.pone.0257682
g versus c : comparing individual and collective intelligence across two meta-analyses
Rowe, Luke I., Hattie, John and Hester, Robert. (2021). g versus c : comparing individual and collective intelligence across two meta-analyses. Cognitive Research: Principles and Implications. 6(1), p. Article 26. https://doi.org/10.1186/s41235-021-00285-2
Metacognition and self‑regulated learning
Rowe, Luke and Kang, Sean. (2019). Metacognition and self‑regulated learning Australia: Evidence for Learning.
Open dialogue peer review : A response to Claxton & Lucas
Hattie, John, Clinton, Janet and Rowe, Luke. (2016). Open dialogue peer review : A response to Claxton & Lucas. Psychology of Education Review. 40(1), pp. 30-37. https://doi.org/10.53841/bpsper.2016.40.1.30