Language Generation in the Limit: Noise, Loss, and Feedback
Kleinberg and Mullainathan (2024) recently proposed a formal framework called language generation in the limit and showed that, given a sequence of example strings from an unknown target language drawn from any countable collection, an algorithm can correctly generate unseen strings from the target language within finite time. Li, Raman, and Tewari (2025) further refined this notion by defining progressively stricter categories within generation in the limit, called non-uniform generation and uniform generation. They showed that a finite union of uniformly generatable collections is generatable in the limit, and asked whether the same holds for non-uniform generation and for generation in the limit. Our starting point in this paper is to resolve the question of Li, Raman, and Tewari in the negative: we give a uniformly generatable collection and a non-uniformly generatable collection whose union is not generatable in the limit. We then use facets of this construction to further our understanding of several variants of language generation. The first two, language generation with noise and language generation without samples, were introduced by Raman and Raman (2025) and Li, Raman, and Tewari (2025), respectively. We show that these two models are equivalent, for both uniform and non-uniform generation. We also provide a complete characterization of non-uniform noisy generation, complementing the corresponding result of Raman and Raman (2025) for uniform noisy generation. Raman and Raman (2025) also asked whether there is any separation between noisy and non-noisy generation in the limit; we show that such a separation exists even with a single noisy string. Finally, we study the framework of generation with feedback, introduced by Charikar and Pabbaraju (2025), in which the algorithm is strengthened by being allowed to ask membership queries.
We draw a sharp distinction between finite and infinite queries: we show that finite queries give no extra power, whereas generation with infinite queries is closed under countable union. Since generation in the limit without feedback is not even closed under finite union, this makes generation with infinite queries a strictly more powerful model than language generation without feedback. In summary, the results in this paper resolve the union-closedness of language generation in the limit, and leverage those techniques (and others) to give precise characterizations for natural variants that incorporate noise, loss, and feedback in language generation.
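To make the basic generation game concrete, the following is a minimal toy sketch, not the construction or algorithm of this paper or of Kleinberg and Mullainathan (2024). It assumes a hypothetical countable collection in which strings are encoded as positive integers and the k-th language is L_k = {k, 2k, 3k, ...}. The adversary enumerates the unknown target L_k, and after each sample the generator must output a string of the target that has not yet been seen. The helper name `generate_unseen` is hypothetical.

```python
from functools import reduce
from math import gcd


def generate_unseen(samples):
    """Toy generator for the hypothetical collection L_k = {k, 2k, 3k, ...}, k >= 1.

    Every sample is a positive multiple of the unknown k, so g = gcd(samples)
    is itself a multiple of k. Hence every positive multiple of g lies in the
    target language, and we return the smallest one not yet seen.
    """
    if not samples:
        raise ValueError("need at least one sample")
    g = reduce(gcd, samples)  # common divisor shared by all samples seen so far
    seen = set(samples)
    candidate = g
    while candidate in seen:  # terminates: only finitely many samples are seen
        candidate += g
    return candidate


# Simulate the game: the adversary enumerates the (hypothetical) target L_4.
stream = [8, 4, 12, 16]
outputs = [generate_unseen(stream[: t + 1]) for t in range(len(stream))]
print(outputs)
```

For this collection the generator is correct from the very first sample, because every multiple of the gcd is guaranteed to lie in the target. The collections studied in the paper are far less benign: there, an algorithm may only be able to commit to correct outputs after finitely many samples, and it is exactly this behavior that fails to be preserved under unions.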