Findings

Who Knows

Kevin Lewis

February 06, 2024

A framework for quantifying individual and collective common sense
Mark Whiting & Duncan Watts
Proceedings of the National Academy of Sciences, 23 January 2024

Abstract:
The notion of common sense is invoked so frequently in contexts as diverse as everyday conversation, political debates, and evaluations of artificial intelligence that its meaning might be surmised to be unproblematic. Surprisingly, however, neither the intrinsic properties of common sense knowledge (what makes a claim commonsensical) nor the degree to which it is shared by people (its “commonness”) have been characterized empirically. In this paper, we introduce an analytical framework for quantifying both these elements of common sense. First, we define the commonsensicality of individual claims and people in terms of the latter’s propensity to agree on the former and their awareness of one another’s agreement. Second, we formalize the commonness of common sense as a clique detection problem on a bipartite belief graph of people and claims, defining PQ common sense as the fraction Q of claims shared by a fraction P of people. Evaluating our framework on a dataset of 2,046 raters evaluating 4,407 diverse claims, we find that commonsensicality aligns most closely with plainly worded, fact-like statements about everyday physical reality. Psychometric attributes such as social perceptiveness influence individual common sense, but surprisingly demographic factors such as age or gender do not. Finally, we find that collective common sense is rare: At most, a small fraction P of people agree on more than a small fraction Q of claims. Together, these results undercut universalistic beliefs about common sense and raise questions about its variability that are relevant both to human and artificial intelligence.
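The "pq common sense" definition in this abstract — the fraction q of claims shared by a fraction p of people, found as a biclique in the people-claims belief graph — can be sketched with a brute-force search on a toy agreement matrix. The data and function name below are illustrative assumptions, not taken from the paper, and real datasets would need an approximate biclique algorithm rather than exhaustive search:

```python
# Toy sketch of the pq common sense idea from Whiting & Watts (2024):
# for groups of ceil(p * N) people, find the largest fraction q of claims
# that every member of the group endorses (a biclique in the bipartite
# people-claims belief graph). Brute force works only at toy scale.
from itertools import combinations
from math import ceil

# rows = people, columns = claims; 1 = person agrees with the claim
beliefs = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 0, 0],
]

def max_q(beliefs, p):
    """Largest fraction q of claims unanimously endorsed by some
    group containing at least a fraction p of all people."""
    n_people, n_claims = len(beliefs), len(beliefs[0])
    group_size = ceil(p * n_people)
    best = 0
    for group in combinations(range(n_people), group_size):
        # count claims every person in this group agrees with
        shared = sum(
            all(beliefs[i][j] for i in group) for j in range(n_claims)
        )
        best = max(best, shared)
    return best / n_claims

print(max_q(beliefs, 0.5))  # q for groups covering half the raters
print(max_q(beliefs, 1.0))  # q when everyone must agree
```

On this toy matrix, half the raters can be found who share 60% of the claims, but unanimity holds for only 20% of them — a miniature version of the paper's finding that collective common sense is rare.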


Probabilistic Outcomes Are Valued Less in Expectation, Even Conditional on Their Realization
Gabriele Paolacci & Quentin André
Management Science, forthcoming

Abstract:
Most theories of decision making under risk assume that payoffs and probabilities are separable. In the context of a lottery, the subjective value of a prospective outcome (the payoff) is assumed to be independent of the likelihood that the outcome will occur (the probability). In violation of this assumption, we present eight experiments showing that people anticipate less utility from uncertain outcomes than from certain outcomes, even conditional on their realization. The devaluation of uncertain outcomes is observed across different measures of utility (willingness to spend money or time; choice between different options), different populations (student and online samples), and different manipulations of uncertainty. We show that this result does not simply reflect a misunderstanding of the instructions or people’s aversion toward a “weird” transaction with unexplained features. We highlight the implications of this phenomenon for empirical investigations of risk preferences and conclude with a discussion of the psychological mechanisms that might drive the devaluation of probabilistic outcomes.


Greater variability in judgements of the value of novel ideas
Wayne Johnson & Devon Proudfoot
Nature Human Behaviour, forthcoming

Abstract:
Understanding the factors that hinder support for creative ideas is important because creative ideas fuel innovation -- a goal prioritized across the arts, sciences and business. Here we document one obstacle faced by creative ideas: as ideas become more novel -- that is, they depart more from existing norms and standards -- disagreement grows about their potential value. Specifically, across multiple contexts, using both experimental methods (four studies, total n = 1,801) and analyses of archival data, we find that there is more variability in judgements of the value of more novel (versus less novel) ideas. We also find that people interpret greater variability in others’ judgements about an idea’s value as a signal of risk, reducing their willingness to invest in the idea. Our findings show that consensus about an idea’s worth diminishes the newer it is, highlighting one reason creative ideas may fail to gain traction in the social world.


Updating Beliefs Based on Observed Performance: Evidence From NFL Head Coaches
Michael Roach & Mark Owens
Journal of Sports Economics, forthcoming

Abstract:
We utilize play-by-play data from the National Football League to examine coaching decisions on fourth down and how sensitive they are to information on situational success and their competitive environment. Prior fourth down successes and failures within a game influence coaches in a way consistent with the notion that recent information is more salient when making decisions and with a belief in in-game momentum. Coaches are more sensitive to fourth down failures than successes, and our findings suggest this sensitivity to prior failures leads to suboptimal fourth down decisions later in the game. This finding is generally driven by the behavior of coaches with a background in coaching offense, suggesting the availability heuristic is particularly potent for managers who are more involved and, perhaps, more accountable for the details of fourth down plays. We suspect these patterns are prevalent in a wide range of managerial contexts.


Umpire Home Bias in Major League Baseball
Mike Hsu
Journal of Sports Economics, forthcoming

Abstract:
This paper studies whether Major League Baseball umpires displayed home bias in their pitch calls, using data on pitch call accuracy from the 2010–2019 seasons to isolate evaluator bias from player performances. The main findings are consistent with umpire home bias, as home batters on average received more called balls on actual ball pitches and fewer called strikes on actual strike pitches, which work in their favor. The bias is not entirely explained by umpire, player, or stadium characteristics, nor is it attributable to umpiring inconsistencies.


Crowd prediction systems: Markets, polls, and elite forecasters
Pavel Atanasov et al.
International Journal of Forecasting, forthcoming

Abstract:
What systems should we use to elicit and aggregate judgmental forecasts? Who should be asked to make such forecasts? We address these questions by assessing two widely used crowd prediction systems: prediction markets and prediction polls. Our main test compares a prediction market against team-based prediction polls, using data from a large, multi-year forecasting competition. Each of these two systems uses inputs from either a large, sub-elite or a small, elite crowd. We find that small, elite crowds outperform larger ones, whereas the two systems are statistically tied. In addition to this main research question, we examine two complementary questions. First, we compare two market structures -- continuous double auction (CDA) markets and logarithmic market scoring rule (LMSR) markets -- and find that the LMSR market produces more accurate forecasts than the CDA market, especially on low-activity questions. Second, given the importance of elite forecasters, we compare the talent-spotting properties of the two systems and find that markets and polls are equally effective at identifying elite forecasters. Overall, the performance benefits of “superforecasting” hold across systems. Managers should move towards identifying and deploying small, select crowds to maximize forecasting performance.


Can’t wait to pay: The desire for goal closure increases impatience for costs
Annabelle Roberts, Alex Imas & Ayelet Fishbach
Journal of Personality and Social Psychology, forthcoming

Abstract:
We explore whether the desire to achieve psychological closure on a goal creates impatience. If so, people should choose an earlier (vs. later) option, even when it does not deliver a reward. For example, they may prefer to pay money or complete work earlier rather than later. A choice to incur earlier costs seems to violate the preference for positive discounting (indeed, it may appear like negative time discounting), unless people value earlier goal closure. Across seven studies, we consistently find that people preferred to pay more money sooner over less money later (Study 1) and complete more work sooner over less work later (Studies 2–5) more when they had a stronger desire for goal closure, such as when the sooner option allowed them to achieve goal closure and when the goal would otherwise linger on their minds (compared to when it would not). The implications of goal closure extend to impatience for gains (Studies 6–7), as people preferred less money sooner (vs. more later) when it allowed them to achieve goal closure. These findings suggest that the desire to achieve goal closure is an important aspect of time preferences. Taking this desire into account can explain marketplace anomalies and inform interventions to reduce impatience.


Impatience Over Time
Annabelle Roberts & Ayelet Fishbach
Social Psychological and Personality Science, forthcoming

Abstract:
Waiting is ubiquitous yet painful. We find that the discomfort of waiting intensifies as the wait draws closer to its end. Using longitudinal studies that measured impatience for real-world events, we documented greater impatience closer to learning the results of the 2020 U.S. presidential election (Study 1), receiving the first COVID-19 vaccine (Study 2), and boarding a bus (Study 3). Follow-up experiments found that a desire for closure underlies this effect, and that impatience increases at the end of the wait controlling for how long people have already been waiting (Supplemental Studies 1–4). These findings suggest that the distress of waiting escalates when the wait is almost over.


Solving olympiad geometry without human demonstrations
Trieu Trinh et al.
Nature, 18 January 2024, Pages 476-482

Abstract:
Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning, owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges, resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004.


Don't waste your time measuring intelligence: Further evidence for the validity of a three-minute speeded reasoning test
Anna-Lena Schubert et al.
Intelligence, January-February 2024

Abstract:
The rise of large-scale collaborative panel studies has generated a need for fast, reliable, and valid assessments of cognitive abilities. In these studies, a detailed characterization of participants' cognitive abilities is often unnecessary, leading to the selection of tests based on convenience, duration, and feasibility. This often results in the use of abbreviated measures or proxies, potentially compromising their reliability and validity. Here we evaluate the mini-q (Baudson & Preckel, 2016), a three-minute speeded reasoning test, as a brief assessment of general cognitive abilities. The mini-q exhibited excellent reliability (0.96–0.99) and a substantial correlation with general cognitive abilities measured with a comprehensive test battery (r = 0.57; age-corrected r = 0.50), supporting its potential as a brief screening of cognitive abilities. Working memory capacity accounted for the majority (54%) of the association between test performance and general cognitive abilities, whereas individual differences in processing speed did not contribute to this relationship. Our results support the notion that the mini-q can be used as a brief, reliable, and valid assessment of general cognitive abilities. We therefore developed a computer-based version, ensuring its adaptability for large-scale panel studies. The paper- and computer-based versions demonstrated scalar measurement invariance and can therefore be used interchangeably. We provide norm data for young (18 to 30 years) and middle-aged (31 to 60 years) adults and provide recommendations for incorporating the mini-q in panel studies. Additionally, we address potential challenges stemming from language diversity, wide age ranges, and online testing in such studies.

