Abstract
A method of univariate probability density function (pdf) estimation is developed for big data applications. The method employs a non-parametric maximum entropy estimator (NMEM) within a data-driven, multithreaded probability density estimation algorithm, termed the stitching estimator (SE). The NMEM has previously been shown to be a robust pdf estimator for high-throughput applications, making it the ideal choice for the underlying estimator in the SE's algorithm. This work divides the estimation problem into many smaller estimation problems, termed blocks. The sample is partitioned into blocks by an optimized branching tree algorithm developed to maximize the uniformity of the data density within each block. The algorithm finds a pdf estimate for each block using the NMEM, and the per-block estimates are then combined through a stitching procedure that uses a weighted average based on the cumulative distribution functions (cdf) of each pair of adjacent blocks. Further improvements are obtained by implementing a sub-sampling approach that generates sub-samples from the original sample without replacement; the pdfs from the sub-samples are then averaged to give a final estimate. The SE has been extensively benchmarked against a large set of diverse distributions, with sample sizes ranging from $2^9$ up to $2^{20}$ and 1000 trials per sample size. The quality of the estimates is quantified using scaled quantile residual (SQR) plots, a sample-size-invariant metric that is consistent with the Anderson-Darling test. The test distributions range from easy single-mode distributions to extremely difficult exotic distributions. In all cases tested, the SE yields excellent estimates with no need for a priori knowledge of the structure of the data.
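To illustrate the cdf-weighted stitching step described above, the sketch below blends the pdf estimates of two adjacent blocks over their overlap region. This is a minimal sketch, not the paper's implementation: the Gaussian `pdf_left`/`pdf_right` pairs are hypothetical stand-ins for NMEM output on two adjacent blocks, and the specific weight $w(x) = (1 - F_L(x)) / \big((1 - F_L(x)) + F_R(x)\big)$ is one plausible cdf-based weighting; the paper's exact scheme may differ.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical pdf/cdf estimates for two adjacent blocks
# (stand-ins for the NMEM output on each block).
pdf_left,  cdf_left  = norm(0.0, 1.0).pdf, norm(0.0, 1.0).cdf
pdf_right, cdf_right = norm(0.5, 1.2).pdf, norm(0.5, 1.2).cdf

def stitch_pair(x, eps=1e-12):
    """cdf-weighted average of two adjacent block pdf estimates.

    Early in the overlap the left block's cdf is well below 1 and the
    right block's cdf is near 0, so the weight favors the left estimate;
    the balance shifts smoothly toward the right estimate as x increases.
    Illustrative weighting only; assumed, not taken from the paper.
    """
    wl = 1.0 - cdf_left(x) + eps   # large where the left block still dominates
    wr = cdf_right(x) + eps        # large where the right block takes over
    w = wl / (wl + wr)
    return w * pdf_left(x) + (1.0 - w) * pdf_right(x)

# Evaluate the stitched estimate across the overlap region.
x = np.linspace(-1.0, 1.5, 6)
print(stitch_pair(x))
```

A weight built from the two cdfs, rather than a fixed crossfade, adapts to where each block's probability mass actually lies, which keeps the stitched estimate continuous across block boundaries.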