Data ‘samples’ are, by definition, representatives of some larger and usually unknown ‘population’ (e.g. the respondents to a survey). The goal of density estimation is to model the distribution of data in the population, based upon the distribution of the samples.

Large sample sets generally reflect the population more accurately than small ones. Accurate estimation of population distributions from samples of limited size is therefore an important but challenging problem.
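The effect of sample size can be illustrated with a small experiment. The sketch below is only an assumed setup, not part of the original material: the population is taken to be a standard normal distribution, the estimator is a plain fixed-width histogram, and the names `histogram_density` and `mean_abs_error`, the bin count and the sample sizes are all hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """True population density (assumed here to be a normal distribution)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def histogram_density(samples, lo, hi, n_bins):
    """Per-bin density estimate: count / (n * bin width)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if lo <= x < hi:
            # min() guards against floating-point rounding at the upper edge
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
    n = len(samples)
    return [c / (n * width) for c in counts]


def mean_abs_error(n_samples, rng, lo=-4.0, hi=4.0, n_bins=16):
    """Mean absolute gap between the histogram and the true density at bin centres."""
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    est = histogram_density(samples, lo, hi, n_bins)
    width = (hi - lo) / n_bins
    centres = [lo + (i + 0.5) * width for i in range(n_bins)]
    return sum(abs(e - normal_pdf(c)) for e, c in zip(est, centres)) / n_bins


rng = random.Random(0)  # fixed seed so the run is repeatable
err_small = mean_abs_error(50, rng)
err_large = mean_abs_error(50_000, rng)
print(f"n=50: error {err_small:.4f}   n=50000: error {err_large:.4f}")
```

With these assumed settings the error for the larger sample set should come out well below that of the smaller one, although the exact figures depend on the seed and on the bin width chosen.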

Consider the sample data set on the left. We would like to estimate the population distribution from which these samples are derived. Most density estimators are based upon one or more of the following techniques:

(a) Spatial binning methods.

(b) Nearest-neighbour methods.

(c) (Gaussian) mixture models.

(d) Kernel-based methods.
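As a rough preview of the four families listed above, the following one-dimensional sketch evaluates each kind of estimate at a single query point. It is a minimal illustration under stated assumptions rather than a definitive implementation: the function names, the two-component test population, the bin width, the neighbour count `k`, the bandwidth `h` and the EM starting values are all hypothetical choices, and the mixture fit uses a basic expectation-maximisation loop with no convergence check.

```python
import math
import random


def gauss_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# (a) Spatial binning: count the samples in the fixed-width bin containing x.
def binning_density(x, samples, lo, width):
    b = math.floor((x - lo) / width)
    count = sum(1 for s in samples if math.floor((s - lo) / width) == b)
    return count / (len(samples) * width)


# (b) Nearest neighbour: widen a window around x until it holds k samples.
def knn_density(x, samples, k):
    r = sorted(abs(s - x) for s in samples)[k - 1]  # distance to k-th neighbour
    return k / (len(samples) * 2.0 * r)


# (c) Gaussian mixture: fit weights, means and widths with a few EM steps.
def fit_mixture(samples, init_mus, init_sigma=1.0, n_iter=50):
    k = len(init_mus)
    mus, sigmas, weights = list(init_mus), [init_sigma] * k, [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        resp = []
        for s in samples:
            p = [w * gauss_pdf(s, m, sd) for w, m, sd in zip(weights, mus, sigmas)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its weighted samples
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * s for r, s in zip(resp, samples)) / nj
            var = sum(r[j] * (s - mus[j]) ** 2 for r, s in zip(resp, samples)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-6))  # floor avoids a collapsed component
            weights[j] = nj / len(samples)
    return weights, mus, sigmas


def mixture_density(x, params):
    weights, mus, sigmas = params
    return sum(w * gauss_pdf(x, m, sd) for w, m, sd in zip(weights, mus, sigmas))


# (d) Kernel: centre one Gaussian bump of width h on every sample.
def kernel_density(x, samples, h):
    return sum(gauss_pdf(x, s, h) for s in samples) / len(samples)


# Assumed test population: 40% N(-2, 0.5) plus 60% N(2, 1).
rng = random.Random(1)
samples = [rng.gauss(-2.0, 0.5) if rng.random() < 0.4 else rng.gauss(2.0, 1.0)
           for _ in range(2000)]

x = 2.0
true_density = 0.4 * gauss_pdf(x, -2.0, 0.5) + 0.6 * gauss_pdf(x, 2.0, 1.0)
estimates = {
    "binning": binning_density(x, samples, lo=-6.0, width=0.5),
    "nearest neighbour": knn_density(x, samples, k=100),
    "mixture": mixture_density(x, fit_mixture(samples, init_mus=[-1.0, 1.0])),
    "kernel": kernel_density(x, samples, h=0.3),
}
print(f"true density at x={x}: {true_density:.3f}")
for name, value in estimates.items():
    print(f"{name:>18}: {value:.3f}")
```

Each estimator has one free choice that controls its smoothness (bin width, `k`, number of mixture components, or `h`); the values used here were picked by hand for this example only.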

Each of these techniques, together with its general properties, is outlined over the following pages.