# Quenched large deviation principle for words in a letter sequence

Matthias Birkner, Andreas Greven, Frank den Hollander
arXiv ID: 0807.2611
Last updated: 10/5/2022

When we cut an i.i.d. sequence of letters into words according to an independent renewal process, we obtain an i.i.d. sequence of words. In the *annealed* large deviation principle (LDP) for the empirical process of words, the rate function is the specific relative entropy of the observed law of words with respect to the reference law of words. In the present paper we consider the *quenched* LDP, i.e., we condition on a typical letter sequence. We focus on the case where the renewal process has an *algebraic* tail. The rate function turns out to be a sum of two terms: the first is the annealed rate function, and the second is proportional to the specific relative entropy of the observed law of letters with respect to the reference law of letters, where the observed law of letters is obtained by concatenating the words and randomising the location of the origin. The proportionality constant equals the tail exponent of the renewal process. Earlier work by Birkner considered the case where the renewal process has an exponential tail, in which case the rate function equals the first term on the set where the second term vanishes and is infinite elsewhere.

The previous version (arXiv:0807.2611v2) appeared in Probab. Theory Relat. Fields 148, no. 3/4 (2010), 403–456. It has since turned out that the original proof of the representation of the rate function is flawed when the mean word length is infinite. We add an erratum in which we fix the flaw in the proof. Along the way we derive new representations of the rate function that are interesting in their own right. A key ingredient in the proof is the observation that if the rate function in the annealed large deviation principle is finite at a stationary word process, then the letters in the tail of the long words in this process are typical.
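The word-cutting construction described at the start of the abstract can be illustrated with a small simulation. The following is a minimal sketch, not code from the paper: the function names, the two-letter alphabet, the exponent `alpha=2.5`, and the truncation of the word-length law at `max_len` are all assumptions made for illustration (the truncation only keeps the sampler finite; the paper's renewal process has an untruncated algebraic tail).

```python
import random


def algebraic_lengths(alpha, count, rng, max_len=1000):
    """Draw `count` i.i.d. word lengths with P(n) proportional to n**(-alpha).

    Assumption: the law is truncated at `max_len` so the weights are finite.
    """
    support = list(range(1, max_len + 1))
    weights = [n ** (-alpha) for n in support]
    return rng.choices(support, weights=weights, k=count)


def cut_into_words(letters, lengths):
    """Cut `letters` into consecutive words with the given lengths.

    Stops before the first word that would run past the end of `letters`.
    """
    words, start = [], 0
    for n in lengths:
        if start + n > len(letters):
            break
        words.append(letters[start:start + n])
        start += n
    return words


rng = random.Random(0)

# One fixed i.i.d. letter sequence (hypothetical alphabet {a, b}).
letters = "".join(rng.choice("ab") for _ in range(200))

# Independent renewal increments with an algebraic tail.
lengths = algebraic_lengths(alpha=2.5, count=200, rng=rng)

words = cut_into_words(letters, lengths)
```

In the quenched setting, `letters` is held fixed and only `lengths` is resampled; in the annealed setting, both are resampled together.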