A Specification Curve Evaluation of NLP Classifiers for Mechanisms Influencing Decisions under Risk and Uncertainty
Summary
This project investigates whether state-specific psychological mechanisms influencing decision-making can be identified through natural language processing (NLP) of think-aloud protocols (TAPs). Participants were exposed to experimentally manipulated decision contexts, and their verbal reports were analyzed with a range of classification strategies: sentence embeddings (SBERT) followed by random forest classification, semantic similarity, prompt-based decoder models (Deepseek R1), and hybrid approaches. Using specification curve analysis, the study systematically varies data processing, model choice, and comparison type, and evaluates the influence of each configuration on performance using AUC-ROC and standardized entropy (h-score) within a Bayesian framework. While classification performance varied substantially across configurations, models based on Deepseek R1 and random forests consistently outperformed similarity-based approaches, particularly when participants reported perceiving the experimental manipulation. The findings highlight the potential of NLP for scalable, interpretable assessment of decision-making mechanisms from natural language data, and underscore the importance of model selection, data quality, and practical considerations when implementing NLP classification.
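To make one point on the specification curve concrete, the following is a minimal sketch of the SBERT-plus-random-forest branch, assuming the public sentence-transformers and scikit-learn APIs. The model name `all-MiniLM-L6-v2`, the function name `evaluate_specification`, and the binary 0/1 labels are illustrative assumptions, not the project's actual configuration.

```python
# Sketch of one specification: SBERT sentence embeddings followed by
# a random forest classifier, scored with AUC-ROC.
# Assumes `texts` holds the TAP transcripts and `labels` the binary
# (0/1) experimental-manipulation label (hypothetical data layout).
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

def evaluate_specification(texts, labels):
    # Encode each think-aloud protocol as a dense sentence embedding.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    X = encoder.encode(texts)

    # Out-of-fold predicted probabilities keep the AUC estimate honest.
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    proba = cross_val_predict(clf, X, labels, cv=5, method="predict_proba")

    # AUC-ROC for the positive class of the experimental manipulation.
    return roc_auc_score(labels, proba[:, 1])
```

Each alternative specification on the curve swaps out one of these choices (preprocessing, embedding model, classifier, or comparison type) while the scoring step stays the same.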
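The standardized entropy (h-score) can be read as an uncertainty measure on a classifier's predicted class probabilities. The sketch below assumes the common definition of Shannon entropy normalized by its maximum, log K, so that 0 indicates a fully confident prediction and 1 maximal uncertainty; whether the project uses exactly this formulation is not stated here.

```python
import numpy as np

def h_score(probabilities):
    # Standardized Shannon entropy of one predicted probability vector
    # over K >= 2 classes: H(p) / log(K), bounded in [0, 1].
    p = np.asarray(probabilities, dtype=float)
    k = p.size
    p = p[p > 0]  # drop zero entries to avoid log(0)
    return float(-(p * np.log(p)).sum() / np.log(k))
```

For a binary classifier, `h_score([0.5, 0.5])` is 1.0, while `h_score([0.99, 0.01])` is close to 0.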
Code
If you are interested in the systematic analysis and evaluation of the results, the code is available here: Code
Dissemination
Explore the 3,900 specifications of this project in this Shiny app!