Finance


A Test of Lookahead Bias in LLM Forecasts

arXiv     SSRN     Slides

Authors (in alphabetical order): Zhenyu Gao, Wenxi Jiang, Yutong Yan

Keywords: Large Language Models · Lookahead Bias · Financial Forecasting · Asset Pricing

Abstract We develop a statistical test to detect lookahead bias in economic forecasts generated by large language models (LLMs). Leveraging state-of-the-art pretraining data detection techniques, we estimate the likelihood that a given prompt appeared in an LLM’s training corpus—a statistic we term Lookahead Propensity (LAP). We formally show that a positive correlation between LAP and forecast accuracy indicates both the presence and magnitude of lookahead bias. We apply the test to two forecasting settings: news headlines predicting stock returns and earnings call transcripts predicting capital expenditures. Our approach provides a cost-efficient diagnostic tool for assessing the validity and reliability of LLM-generated forecasts.
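For intuition, here is a minimal sketch of the test. It assumes a Min-K%-style pretraining-data detection score as the LAP statistic and a small open model as a stand-in for the audited LLM; the model name, variable names, and example data are hypothetical illustrations, not the paper's released code.

```python
import torch
from scipy.stats import pearsonr
from transformers import AutoModelForCausalLM, AutoTokenizer

def lap_score(text, model, tokenizer, k=0.2):
    """Min-K%-style detection score: mean log-prob of the k% least likely
    tokens. Higher values suggest the text appeared in the training corpus."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)   # next-token distributions
    token_lp = logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
    n = max(1, int(k * token_lp.numel()))
    return token_lp.topk(n, largest=False).values.mean().item()

model_name = "gpt2"  # stand-in; in practice, audit the LLM that produced the forecasts
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Hypothetical data: forecast prompts and the realized accuracy of each forecast.
prompts = [
    "Headline (2021-03-02): Acme Corp beats earnings estimates. Next-day return?",
    "Headline (2021-06-15): Beta Inc announces CEO departure. Next-day return?",
]
accuracy = [1.0, 0.0]  # e.g., 1 if the forecast matched the realized return sign

laps = [lap_score(p, lm, tok) for p in prompts]
r, pval = pearsonr(laps, accuracy)
print(f"corr(LAP, accuracy) = {r:.3f} (p = {pval:.3g})")  # positive r flags lookahead bias
```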

DatedGPT: Preventing Lookahead Bias in Large Language Models with Time-Aware Pretraining

Poster     Draft Coming Soon     Chat now

Authors: Yutong Yan, Raphael Tang

Keywords: Large Language Models · Lookahead Bias · Time-Aware Pretraining

Presentations: AFA Poster Session (2026)

Abstract Large language models (LLMs) exhibit significant lookahead bias when training data from the future contaminates historical predictions, violating temporal causality. We present DatedGPT, the largest time-aware model family to date (1.3B parameters across 12 models), designed to eliminate this bias through strict temporal pretraining. Our framework trains separate GPT-3-XL–scale models from scratch on annually segmented Common Crawl data (2013–2024), enforcing causal data boundaries. Benchmark evaluations show progressive performance improvements over time, while ablation studies confirm temporal integrity: pre-2020 models assign near-zero probability to future events such as "COVID-19" or "Joe Biden presidency" in contextually appropriate prompts; these events surface only in model vintages trained on data from the relevant years. By preventing future information leakage, DatedGPT enables historically faithful predictions for applications in finance, economics, and longitudinal research. A public demo of DatedGPT is available at https://datedgpt.com.
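To illustrate the temporal-integrity check, the sketch below probes the probability a given model vintage assigns to a future phrase. The checkpoint identifier `datedgpt-2019` is a placeholder, not a released model name, and the tokenization-boundary handling is deliberately simplified.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def phrase_logprob(prompt, phrase, model, tokenizer):
    """Total log-probability the model assigns to `phrase` as a continuation
    of `prompt`, summed over the phrase's tokens."""
    full = tokenizer(prompt + phrase, return_tensors="pt").input_ids
    n_prompt = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(full).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full[0, 1:]
    token_lp = logprobs[torch.arange(targets.numel()), targets]
    return token_lp[n_prompt - 1:].sum().item()  # keep only the phrase's tokens

# Placeholder checkpoint name for a pre-2020 vintage model.
tok = AutoTokenizer.from_pretrained("datedgpt-2019")
lm = AutoModelForCausalLM.from_pretrained("datedgpt-2019").eval()

lp = phrase_logprob("The president of the United States is", " Joe Biden", lm, tok)
print(f"log P(' Joe Biden' | prompt) = {lp:.2f}")  # very negative expected for a 2019 vintage
```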

Deciphering Green Preferences and Climate Risk Perceptions: An NLP Approach

Draft available upon request    

Authors (in alphabetical order): Darwin Choi, Zhenyu Gao, Wenxi Jiang, Yutong Yan, and Hulai Zhang

Keywords: Climate Finance · Institutional Investors · ESG · Textual Analysis · Natural Language Processing

Presentations: European Sustainable Finance PhD Workshop (2025)

Abstract We employ Natural Language Processing (NLP) to scrutinize regulatory filings, identifying institutional investors’ climate change preferences and risk perceptions. These preferences and risk perceptions grow over time and are stronger after a fund has signed the Principles for Responsible Investment (PRI) or is located in regions with stronger global warming beliefs. Investors preferring green assets tend to decrease their portfolio weights in environmentally unfriendly stocks, reflecting a desire to align investments with their values. However, the relationship between climate risk perceptions and portfolio weights of brown stocks varies due to heterogeneous investment strategies. Investors with higher climate risk perceptions are more likely to support environmental shareholder proposals, whereas investors with green preferences are not. These findings provide new insights into sustainable investing behavior under differing investor motivations.

Machine Learning


Bandit Algorithms for Factorial Experiments

PDF     Poster     Slides

Authors: Yutong Yan, Audrey Durand, Joelle Pineau

Keywords: Machine Learning · Optimization

Presentations: WiML Workshop, Conference on Neural Information Processing Systems (2019)

Abstract A multi-armed bandit algorithm is developed for factorial experiments. I begin by analyzing UCB1 for non-stationary bandit problems. Using tools from advanced probability theory, I then prove that the UCT algorithm with the Laplace bound has lower computational complexity than the naïve UCT algorithm and achieves a better lower bound. I also demonstrate that the probability of making suboptimal choices converges to zero as the failure probability vanishes. In deep learning settings, experimental results are consistent with the theoretical regret bound.
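As a rough illustration of the idea, the sketch below runs UCB with one standard anytime "Laplace" (method-of-mixtures) confidence width on a toy factorial bandit, where each treatment combination is an arm. The exact bound and algorithmic details in the paper may differ.

```python
import math
import random

def laplace_bonus(n, delta=0.05):
    """One standard anytime ('Laplace' / method-of-mixtures) confidence width
    for the mean of n observations in [0, 1], valid uniformly over all n."""
    return math.sqrt((1.0 + 1.0 / n) * math.log(math.sqrt(n + 1) / delta) / (2.0 * n))

def ucb_laplace(arms, horizon, delta=0.05):
    """UCB with the Laplace bonus on a finite arm set; each arm is a callable
    returning a reward in [0, 1]."""
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(horizon):
        if t < len(arms):                      # pull each arm once to initialize
            a = t
        else:
            a = max(range(len(arms)),
                    key=lambda i: sums[i] / counts[i] + laplace_bonus(counts[i], delta))
        counts[a] += 1
        sums[a] += arms[a]()
    return counts

# Toy factorial experiment: 2 binary factors -> 4 treatment combinations (arms).
means = {(0, 0): 0.3, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.7}
arms = [lambda m=m: float(random.random() < m) for m in means.values()]
print(ucb_laplace(arms, horizon=2000))         # most pulls should go to arm (1, 1)
```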

A Theoretical Analysis of Upper Confidence Bound applied to Trees

PDF     Slides

Authors: Yutong Yan, Audrey Durand, Joelle Pineau

Keywords: Machine Learning · Optimization

Abstract Using factorial experiments, I explore multi-armed bandit problems in which a player selects actions (here, a sequence of choices) episodically and observes the outcomes. I consider Upper Confidence Bound applied to Trees (UCT), a popular algorithm for tree search, in order to identify the sequence of choices that maximizes some objective function. Using synthetic experiments, I demonstrate that applying tighter concentration bounds to linear bandits can significantly improve the performance of UCT for tree search. I also compare the performance of algorithms under three formulations of the factorial experiment: 1) standard bandits; 2) linear bandits; and 3) bandits for tree search. I observe that capturing the underlying tree structure is essential for robustness, whether the outcome function is linear or not. Furthermore, algorithms employed under the tree-search formulation of factorial experimental designs appear more robust to noise variance than the other approaches. The next step is to investigate additional factorial experimental design configurations. The three formulations are made concrete in the sketch below.
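The sketch shows how a single factorial treatment can be represented under each of the three formulations; the factor counts and level counts are hypothetical, chosen only for illustration.

```python
from itertools import product

factors = [2, 3, 2]  # hypothetical design: number of levels per factor

# 1) Standard bandits: every treatment combination is an independent arm.
arms = list(product(*[range(k) for k in factors]))      # 2 * 3 * 2 = 12 flat arms

# 2) Linear bandits: encode a combination as concatenated one-hot factor
#    features, so the reward is modeled as linear in per-level effects.
def features(combo):
    x = []
    for level, k in zip(combo, factors):
        x += [1.0 if j == level else 0.0 for j in range(k)]
    return x

# 3) Bandits for tree search (UCT): choose one factor per depth, so a
#    combination is a root-to-leaf path and statistics are shared
#    across combinations with a common prefix.
def paths(prefix=()):
    d = len(prefix)
    if d == len(factors):
        yield prefix
        return
    for level in range(factors[d]):
        yield from paths(prefix + (level,))

print(len(arms), features(arms[0]), next(paths()))
```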