Asymptotic Normality of Higher Order Turing Formulae
1 online resource (133 pages) : PDF
University of North Carolina at Charlotte
Higher order Turing formulae, denoted T_r for r ∈ Z+, are a powerful tool for estimating the total probability associated with the words from a random piece of writing that have been observed exactly r times in a random sample. In particular, T_0 estimates the probability of seeing words that do not appear in the sample. To perform statistical inference, e.g., constructing asymptotic confidence intervals, the asymptotic properties of the higher order Turing formulae need to be studied. In this thesis we extend the validity of the asymptotic normality beyond the previously proven cases by establishing a necessary and sufficient condition for the asymptotic normality of higher order Turing formulae for both fixed and changing underlying distributions. We then conduct simulation studies with the complete works of William Shakespeare and with data generated from different underlying distributions to check the finite sample performance of the derived asymptotic confidence interval. Based on our theoretical results, we also develop two methodologies for authorship detection and apply them to real Twitter data.
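As an illustration of the quantity the abstract describes, the sketch below computes the classical Good-Turing form T_r = (r + 1) N_{r+1} / n, where n is the sample size and N_{r+1} is the number of distinct words observed exactly r + 1 times. This normalization is an assumption taken from the standard literature; the thesis may define T_r with a slightly different constant. The function name `turing_formula` and the toy sample are hypothetical.

```python
from collections import Counter


def turing_formula(tokens, r=0):
    """Classical Good-Turing estimate of the total probability of words seen exactly r times.

    Assumed form: T_r = (r + 1) * N_{r+1} / n. The thesis may normalize differently.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("sample must contain at least one token")
    if r < 0:
        raise ValueError("r must be a non-negative integer")
    word_counts = Counter(tokens)                 # word -> number of occurrences
    freq_of_freq = Counter(word_counts.values())  # k -> number of words seen exactly k times
    return (r + 1) * freq_of_freq.get(r + 1, 0) / n


# Toy sample (hypothetical): 10 tokens, 6 words seen once, 2 words seen twice.
sample = "to be or not to be that is the question".split()
t0 = turing_formula(sample, r=0)  # estimated probability of unseen words
t1 = turing_formula(sample, r=1)  # estimated total probability of words seen once
print(t0, t1)
```

With r = 0 this reduces to N_1 / n, the proportion of the sample made up of words seen only once, which serves as the estimate of the missing mass.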
Asymptotic Normality; Lindeberg-Feller Central Limit Theorem; Missing Mass; Occupancy Probabilities; Turing Formula
Jiang, Jiancheng; Christou, Eliana; Jacobs, Donald
Thesis (Ph.D.)--University of North Carolina at Charlotte, 2022.
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). For additional information, see http://rightsstatements.org/page/InC/1.0/.
Copyright is held by the author unless otherwise indicated.