
Makeroom


A small ragtag assortment of makers, engineers and designers sharing mentoring, support and projects to work on at any stage of their careers.

Join our Discord server!


Welcome to the Makeroom installation of Storyden!

This acts as a live demo of Storyden's forum and library software. On this site you'll find a curated collection of web and design resources as well as anything our members share.

Feel free to participate; this may be a demo, but it's never wiped. That being said, Storyden is in active development, so we encourage you to experiment respectfully and to report any security issues you find to @Southclaws or by opening an issue.

Have an amazing day!



LLMs are Bayesian, in Expectation, not in Realization

Large language models (LLMs) exhibit a striking ability to learn and adapt to new tasks on the fly—commonly referred to as in-context learning. Without modifying their underlying parameters, these models can generalize from just a few examples, a trait that has drawn comparisons to implicit Bayesian updating. However, emerging research highlights a key theoretical contradiction: transformers, the foundational architecture behind most LLMs, consistently violate the martingale property—a central requirement for Bayesian reasoning when dealing with exchangeable data. This insight invites a deeper exploration into the mathematical assumptions underlying modern AI systems, particularly in scenarios where quantifiable uncertainty is essential.
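
For reference, the property at stake can be stated concisely (the notation here is ours, not the paper's): if the in-context examples $x_1, x_2, \dots$ are exchangeable and $p_n(x) = P(x_{n+1} = x \mid x_1, \dots, x_n)$ denotes the model's predictive probability after $n$ examples, a Bayesian predictor must satisfy the martingale identity $\mathbb{E}[\,p_{n+1}(x) \mid x_1, \dots, x_n\,] = p_n(x)$, meaning that, on average, seeing one more exchangeable example should not move the prediction. It is this identity that transformers are observed to break, with the deviation vanishing only at the rate quantified below.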

A recent theoretical analysis offers a more nuanced view of the inner workings of transformers and how they handle uncertainty. Among the findings, positional encodings, which give transformers their sequence awareness, are shown to be a primary source of deviation from Bayesian updating, inducing martingale violations of order $\Theta(\log n / n)$ in the number of in-context examples $n$. Despite these violations, the models still achieve information-theoretic optimality, with excess risk $O(n^{-1/2})$ in expectation over example orderings. This suggests that transformers are not failing to reason probabilistically, but rather are operating via a fundamentally different, yet still highly efficient, mechanism.
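
To see the "in expectation, not in realization" distinction concretely, one simple probe is to shuffle the in-context examples and measure how much the model's prediction moves. The sketch below is illustrative only, assuming a placeholder `predict_prob` callback for whatever model API is available; it is not the paper's experimental protocol.

```python
import random
from statistics import mean, pstdev

def order_sensitivity(examples, query, candidate, predict_prob, n_orders=20):
    """Probe the exchangeability assumption behind Bayesian in-context learning.

    predict_prob(context_examples, query, candidate) should return the
    probability the model assigns to `candidate` as the answer to `query`
    given the listed in-context examples (placeholder callback, not a real API).
    """
    probs = []
    for _ in range(n_orders):
        perm = list(examples)
        random.shuffle(perm)  # permute demonstrations; exchangeability says order shouldn't matter
        probs.append(predict_prob(perm, query, candidate))
    # An exact Bayesian predictor on exchangeable data is order-invariant, so the
    # spread would be zero; the mean over orderings is the "in expectation" estimate.
    return mean(probs), pstdev(probs)
```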

One particularly notable contribution is the derivation of the optimal chain-of-thought length, $k^* = \Theta(\sqrt{n}\log(1/\varepsilon))$, a concrete formula that balances computational cost against inference quality. Empirical tests on GPT-3 align closely with the theoretical predictions, with the model reaching 99% of the theoretical entropy limit within just 20 demonstration examples. These insights not only strengthen our understanding of how LLMs process new information, but also offer practical tools for improving performance and reliability in real-world deployments. For developers and researchers alike, these findings provide a compelling framework for rethinking model interpretability, inference costs, and uncertainty quantification.
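
As a rough illustration of how the $k^* = \Theta(\sqrt{n}\log(1/\varepsilon))$ rule might be used in practice, the sketch below turns it into a budget for reasoning steps. The leading constant `c` is a stand-in for the paper's explicit constants, which are not reproduced here.

```python
import math

def optimal_cot_length(n_examples: int, epsilon: float, c: float = 1.0) -> int:
    """Illustrative chain-of-thought budget following k* = Theta(sqrt(n) * log(1/epsilon)).

    n_examples: number of in-context demonstrations (n).
    epsilon:    target excess-risk tolerance, 0 < epsilon < 1.
    c:          unknown leading constant (a tunable stand-in, not the paper's value).
    """
    if not (0 < epsilon < 1):
        raise ValueError("epsilon must be in (0, 1)")
    return max(1, math.ceil(c * math.sqrt(n_examples) * math.log(1 / epsilon)))

# Example: 20 demonstrations and a 1% tolerance give roughly
# ceil(sqrt(20) * ln(100)) = 21 reasoning steps for c = 1.
print(optimal_cot_length(20, 0.01))
```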

LLMs are Bayesian, in Expectation, not in Realization

Large language models demonstrate remarkable in-context learning capabilities, adapting to new tasks without parameter updates. While this phenomenon has been successfully modeled as implicit Bayesian inference, recent empirical findings reveal a fundamental contradiction: transformers systematically violate the martingale property, a cornerstone requirement of Bayesian updating on exchangeable data. This violation challenges the theoretical foundations underlying uncertainty quantification in critical applications. Our theoretical analysis establishes four key results: (1) positional encodings induce martingale violations of order $\Theta(\log n / n)$; (2) transformers achieve information-theoretic optimality with excess risk $O(n^{-1/2})$ in expectation over orderings; (3) the implicit posterior representation converges to the true Bayesian posterior in the space of sufficient statistics; and (4) we derive the optimal chain-of-thought length as $k^* = \Theta(\sqrt{n}\log(1/\varepsilon))$ with explicit constants, providing a principled approach to reduce inference costs while maintaining performance. Empirical validation on GPT-3 confirms predictions (1)-(3), with transformers reaching 99% of theoretical entropy limits within 20 examples. Our framework provides practical methods for extracting calibrated uncertainty estimates from position-aware architectures and optimizing computational efficiency in deployment.

arxiv.org