A Specification Curve Evaluation of NLP Classifiers for Mechanisms Influencing Decisions under Risk and Uncertainty
Summary
This project investigates whether state-specific psychological mechanisms influencing decision-making can be identified through natural language processing (NLP) of think-aloud protocols (TAPs). Participants were exposed to experimentally manipulated decision contexts, and their verbal reports were analyzed with a range of classification strategies: sentence embeddings (SBERT) followed by random forest classification, semantic similarity, prompt-based decoder models (Deepseek R1), and hybrid approaches. Using specification curve analysis, the study systematically varies data processing, model choice, and comparison type, and evaluates the influence of each configuration on performance using AUC-ROC and standardized entropy (h-score) within a Bayesian framework. While classification performance varied substantially across configurations, models based on Deepseek R1 and random forests consistently outperformed similarity-based approaches, particularly when participants reported perceiving the experimental manipulation. The findings highlight the potential of NLP for scalable, interpretable assessment of decision-making mechanisms from natural language data, and underscore the importance of model selection, data quality, and practical considerations when implementing NLP classification.
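To make one point on the specification curve concrete, the following is a minimal sketch of the SBERT-plus-random-forest branch, assuming the public sentence-transformers and scikit-learn APIs. The model name `all-MiniLM-L6-v2`, the function name `evaluate_specification`, and the binary 0/1 labels are illustrative assumptions, not the project's actual configuration.

```python
# Sketch of one specification: SBERT sentence embeddings followed by
# a random forest classifier, scored with AUC-ROC.
# Assumes `texts` holds the TAP transcripts and `labels` the binary
# (0/1) experimental-manipulation label (hypothetical data layout).
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

def evaluate_specification(texts, labels):
    # Encode each think-aloud protocol as a dense sentence embedding.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    X = encoder.encode(texts)

    # Out-of-fold predicted probabilities keep the AUC estimate honest.
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    proba = cross_val_predict(clf, X, labels, cv=5, method="predict_proba")

    # AUC-ROC for the positive class of the experimental manipulation.
    return roc_auc_score(labels, proba[:, 1])
```

Each alternative specification on the curve swaps out one of these choices (preprocessing, embedding model, classifier, or comparison type) while the scoring step stays the same.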
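The standardized entropy (h-score) can be read as an uncertainty measure on a classifier's predicted class probabilities. The sketch below assumes the common definition of Shannon entropy normalized by its maximum, log K, so that 0 indicates a fully confident prediction and 1 maximal uncertainty; whether the project uses exactly this formulation is not stated here.

```python
import numpy as np

def h_score(probabilities):
    # Standardized Shannon entropy of one predicted probability vector
    # over K >= 2 classes: H(p) / log(K), bounded in [0, 1].
    p = np.asarray(probabilities, dtype=float)
    k = p.size
    p = p[p > 0]  # drop zero entries to avoid log(0)
    return float(-(p * np.log(p)).sum() / np.log(k))
```

For a binary classifier, `h_score([0.5, 0.5])` is 1.0, while `h_score([0.99, 0.01])` is close to 0.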
Code
If you are interested in the systematic analysis and evaluation of the results, the code is available here: Code
Dissemination
Explore the 3,900 specifications of this project in this Shiny app!