Poisson binomial distribution
| Poisson binomial | |
|---|---|
| Parameters | $\mathbf{p}=(p_1,p_2,\dots,p_n)\in[0,1]^n$, success probabilities for each of the $n$ trials |
| Support | $k\in\{0,1,\dots,n\}$ |
| PMF | $\sum_{A\in F_k}\prod_{i\in A}p_i\prod_{j\in A^c}(1-p_j)$ |
| CDF | $\sum_{l=0}^{k}\sum_{A\in F_l}\prod_{i\in A}p_i\prod_{j\in A^c}(1-p_j)$ |
| Mean | $\mu=\sum_{i=1}^n p_i$ |
| Variance | $\sigma^2=\sum_{i=1}^n p_i(1-p_i)$ |
| Skewness | $\dfrac{1}{\sigma^3}\sum_{i=1}^n p_i(1-p_i)(1-2p_i)$ |
| Excess kurtosis | $\dfrac{1}{\sigma^4}\sum_{i=1}^n p_i(1-p_i)\left(1-6p_i(1-p_i)\right)$ |
| MGF | $\prod_{j=1}^n\left(1-p_j+p_j e^t\right)$ |
| CF | $\prod_{j=1}^n\left(1-p_j+p_j e^{it}\right)$ |
| PGF | $\prod_{j=1}^n\left(1-p_j+p_j z\right)$ |
In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed. The concept is named after Siméon Denis Poisson.
In other words, it is the probability distribution of the number of successes in a collection of $n$ independent yes/no experiments with success probabilities $p_1, p_2, \dots, p_n$. The ordinary binomial distribution is a special case of the Poisson binomial distribution, when all success probabilities are the same, that is $p_1 = p_2 = \cdots = p_n$.
Definitions
Probability mass function

The probability of having $k$ successful trials out of a total of $n$ can be written as the sum[1]

$$\Pr(K=k)=\sum_{A\in F_k}\prod_{i\in A}p_i\prod_{j\in A^c}(1-p_j),$$

where $F_k$ is the set of all subsets of $k$ integers that can be selected from $\{1,2,3,\dots,n\}$. For example, if $n=3$, then $F_2=\{\{1,2\},\{1,3\},\{2,3\}\}$. $A^c$ is the complement of $A$, i.e. $A^c=\{1,2,3,\dots,n\}\setminus A$.
$F_k$ will contain $n!/\left((n-k)!\,k!\right)$ elements, the sum over which is infeasible to compute in practice unless the number of trials $n$ is small (e.g. if $n=30$, $F_{15}$ contains over $10^8$ elements). However, there are other, more efficient ways to calculate $\Pr(K=k)$.
As long as none of the success probabilities are equal to one, one can calculate the probability of $k$ successes using the recursive formula[2][3]

$$\Pr(K=k)=\begin{cases}\prod\limits_{i=1}^{n}(1-p_i) & k=0,\\[1ex] \dfrac{1}{k}\sum\limits_{i=1}^{k}(-1)^{i-1}\Pr(K=k-i)\,T(i) & k>0,\end{cases}$$

where

$$T(i)=\sum_{j=1}^{n}\left(\frac{p_j}{1-p_j}\right)^{i}.$$

The recursive formula is not numerically stable, and should be avoided if $n$ is greater than approximately 20.
An alternative is to use a divide-and-conquer algorithm: if we assume $n$ is a power of two, denoting by $f(p_{i:j})$ the Poisson binomial of $p_i,\dots,p_j$ and $*$ the convolution operator, we have $f(p_{1:n}) = f(p_{1:n/2}) * f(p_{n/2+1:n})$.
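As an illustrative sketch (not from the source; the function name and the iterative pairing strategy are my own), the convolution identity can be implemented with NumPy, pairing up per-trial PMFs $[1-p_i,\ p_i]$ until one vector remains:

```python
import numpy as np

def pmf_by_convolution(p):
    """Poisson binomial PMF by repeatedly convolving the per-trial
    PMFs [1 - p_i, p_i], pairing them up as in divide and conquer."""
    parts = [np.array([1.0 - pi, pi]) for pi in p]
    while len(parts) > 1:
        merged = [np.convolve(parts[i], parts[i + 1])
                  for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:          # odd count: carry the last part forward
            merged.append(parts[-1])
        parts = merged
    return parts[0]
```

For example, `pmf_by_convolution([0.1, 0.5, 0.9])` returns the four probabilities $\Pr(K=0),\dots,\Pr(K=3)$, which sum to 1.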
More generally, the probability mass function of a Poisson binomial can be expressed as the convolution of the vectors $x_1,\dots,x_n$ where $x_i=[1-p_i,\ p_i]$. This observation leads to the direct convolution (DC) algorithm for computing $\Pr(K=0)$ through $\Pr(K=n)$:
// PMF and nextPMF begin at index 0
function DC(p_1, ..., p_n) is
    declare new PMF array of size 1
    PMF[0] = 1
    for i = 1 to n do
        declare new nextPMF array of size i + 1
        nextPMF[0] = (1 - p_i) * PMF[0]
        nextPMF[i] = p_i * PMF[i - 1]
        for k = 1 to i - 1 do
            nextPMF[k] = p_i * PMF[k - 1] + (1 - p_i) * PMF[k]
        repeat
        PMF = nextPMF
    repeat
    return PMF
end function
$\Pr(K=k)$ will be found in PMF[k]. DC is numerically stable, exact, and, when implemented as a software routine, exceptionally fast for moderate $n$. It can also be quite fast for larger $n$, depending on the distribution of the $p_i$.[4]
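A direct transcription of the DC pseudocode into Python might look like the following (illustrative; the function and variable names are my own):

```python
def dc(p):
    """Direct convolution: returns pmf with pmf[k] = Pr(K = k), k = 0..n."""
    pmf = [1.0]
    for i, pi in enumerate(p, start=1):
        next_pmf = [0.0] * (i + 1)
        next_pmf[0] = (1.0 - pi) * pmf[0]   # trial i fails, 0 successes so far
        next_pmf[i] = pi * pmf[i - 1]       # trial i succeeds, all i succeed
        for k in range(1, i):
            next_pmf[k] = pi * pmf[k - 1] + (1.0 - pi) * pmf[k]
        pmf = next_pmf
    return pmf
```

With `p = [0.1, 0.5, 0.9]`, `dc(p)` yields the full PMF over $k=0,\dots,3$.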
Another possibility is using the discrete Fourier transform:[5]

$$\Pr(K=k)=\frac{1}{n+1}\sum_{l=0}^{n}C^{-lk}\prod_{m=1}^{n}\left(1+\left(C^{l}-1\right)p_m\right),$$

where $C=\exp\left(\frac{2i\pi}{n+1}\right)$ and $i=\sqrt{-1}$.
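The DFT formula can be evaluated with NumPy's FFT, since $\sum_l C^{-lk}x_l$ matches NumPy's forward-DFT sign convention. This is an illustrative sketch (names are mine):

```python
import numpy as np

def dft_pmf(p):
    """Poisson binomial PMF via the DFT formula."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    C = np.exp(2j * np.pi / (n + 1))
    ell = np.arange(n + 1)[:, None]              # frequencies 0..n, as a column
    x = np.prod(1 + (C ** ell - 1) * p, axis=1)  # one product over trials per l
    # sum_l C^(-l*k) x_l is exactly NumPy's forward DFT of x
    return np.fft.fft(x).real / (n + 1)
```

The intermediate values $x_l$ are the characteristic function evaluated at the $(n+1)$-th roots of unity, so the final step is a finite Fourier inversion.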
Still other methods are described in "Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions" by Chen and Liu[6] and in "A simple and fast method for computing the Poisson binomial distribution function" by Biscarri et al.[4]
Cumulative distribution function
The cumulative distribution function (CDF) can be expressed as

$$\Pr(K\le k)=\sum_{l=0}^{k}\ \sum_{A\in F_l}\prod_{i\in A}p_i\prod_{j\in A^c}(1-p_j),$$

where $F_l$ is the set of all subsets of size $l$ that can be selected from $\{1,2,3,\dots,n\}$.
It can be computed by invoking the DC function above, and then adding elements $0$ through $k$ of the returned PMF array.
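For small $n$, the subset-sum form of the CDF can also be checked directly by brute force (an illustrative sketch, not from the source; feasible only for small $n$ because the number of subsets grows combinatorially):

```python
from itertools import combinations

def cdf_brute_force(p, k):
    """Pr(K <= k) via the subset-sum formula, summing over all
    subsets A of size 0..k of the trial indices."""
    n = len(p)
    total = 0.0
    for size in range(k + 1):
        for A in combinations(range(n), size):
            chosen = set(A)
            prob = 1.0
            for i in range(n):
                prob *= p[i] if i in chosen else 1.0 - p[i]
            total += prob
    return total
```

For instance, with $p=(0.1,0.5,0.9)$ the value $\Pr(K\le 1)$ equals $0.045+0.455=0.5$.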
Properties
Mean and variance
Since a Poisson binomial distributed variable is a sum of $n$ independent Bernoulli distributed variables, its mean and variance will simply be sums of the means and variances of the $n$ Bernoulli distributions:

$$\mu=\sum_{i=1}^{n}p_i,\qquad \sigma^{2}=\sum_{i=1}^{n}p_i(1-p_i).$$
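As a small illustrative helper (names and example values are mine), both moments follow directly from the trial probabilities:

```python
def mean_and_variance(p):
    """Mean and variance of a Poisson binomial from its trial probabilities."""
    mu = sum(p)                                  # sum of Bernoulli means
    var = sum(pi * (1.0 - pi) for pi in p)       # sum of Bernoulli variances
    return mu, var
```

For $p=(0.1,0.5,0.9)$ this gives $\mu=1.5$ and $\sigma^2=0.43$.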
Entropy
There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy is bounded above by the entropy of a binomial distribution with the same number parameter $n$ and the same mean. Therefore, the entropy is also bounded above by the entropy of a Poisson distribution with the same mean.[7]
The Shepp–Olkin concavity conjecture, due to Lawrence Shepp and Ingram Olkin in 1981, states that the entropy of a Poisson binomial distribution is a concave function of the success probabilities $p_1, p_2, \dots, p_n$.[8] This conjecture was proved by Erwan Hillion and Oliver Johnson in 2015.[9] The Shepp–Olkin monotonicity conjecture, also from the same 1981 paper, is that the entropy is monotone increasing in $p_i$, if all $p_i\le 1/2$. This conjecture was also proved by Hillion and Johnson, in 2019.[10]
Chernoff bound
The probability that a Poisson binomial distribution gets large can be bounded using its moment generating function as follows (valid when $s\ge\mu$ and for any $t>0$):

$$\Pr[K\ge s]\le \exp(-st)\,\operatorname{E}\!\left[e^{tK}\right]=\exp(-st)\prod_{i=1}^{n}\left(1-p_i+p_i e^{t}\right)\le \exp\!\left(-st+\sum_{i=1}^{n}p_i\left(e^{t}-1\right)\right)=\exp\!\left(s-\mu-s\ln\frac{s}{\mu}\right),$$

where we took $t=\ln\left(\frac{s}{\mu}\right)$ and used $1+x\le e^{x}$ in the middle step. This is similar to the tail bounds of a binomial distribution.
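For illustration (the function name and example values are mine), the closed-form bound $\exp\left(s-\mu-s\ln\frac{s}{\mu}\right)$ can be evaluated directly:

```python
import math

def chernoff_tail_bound(p, s):
    """Upper bound on Pr(K >= s) via the MGF argument, taking t = ln(s/mu).

    Requires s >= mu = sum(p) > 0."""
    mu = sum(p)
    return math.exp(s - mu - s * math.log(s / mu))
```

With $p=(0.1,0.5,0.9)$ and $s=3$ the bound is about $0.56$, comfortably above the exact tail probability $\Pr(K\ge 3)=0.045$.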
Related distributions

Approximation by binomial distribution

A Poisson binomial distribution $K$ can be approximated by a binomial distribution $B(n,\bar p)$ where $\bar p=\frac{1}{n}\sum_{i=1}^{n}p_i$, the mean of the $p_i$, is the success probability of $B$. The variances of $K$ and $B$ are related by the formula

$$\operatorname{Var}(K)=\operatorname{Var}(B)-\sum_{i=1}^{n}\left(p_i-\bar p\right)^{2}.$$

As can be seen, the closer the $p_i$ are to $\bar p$, that is, the more the $p_i$ tend to homogeneity, the larger $K$'s variance. When all the $p_i$ are equal to $\bar p$, $K$ becomes $B(n,\bar p)$, $\operatorname{Var}(K)=\operatorname{Var}(B)$, and the variance is at its maximum.[1]
Ehm has determined bounds for the total variation distance of $K$ and $B$, in effect providing bounds on the error introduced when approximating $K$ with $B$. Let $\bar q=1-\bar p$ and let $d(K,B)$ be the total variation distance of $K$ and $B$. Then

$$d(K,B)\ \le\ \left(1-\bar p^{\,n+1}-\bar q^{\,n+1}\right)\frac{\sum_{i=1}^{n}\left(p_i-\bar p\right)^{2}}{(n+1)\,\bar p\,\bar q},$$

where $\bar p=\frac{1}{n}\sum_{i=1}^{n}p_i$.

$d(K,B)$ tends to 0 if and only if $\operatorname{Var}(K)/\operatorname{Var}(B)$ tends to 1.[11]
Approximation by Poisson distribution
A Poisson binomial distribution can also be approximated by a Poisson distribution with mean $\lambda=\sum_{i=1}^{n}p_i$. Barbour and Hall have shown that

$$\frac{1}{32}\min\left(\frac{1}{\lambda},1\right)\sum_{i=1}^{n}p_i^{2}\ \le\ d(K,\mathrm{Po})\ \le\ \frac{1-e^{-\lambda}}{\lambda}\sum_{i=1}^{n}p_i^{2},$$

where $d(K,\mathrm{Po})$ is the total variation distance of $K$ and the Poisson distribution $\mathrm{Po}(\lambda)$.[12] It can be seen that the smaller the $p_i$, the better $\mathrm{Po}$ approximates $K$.

As $\operatorname{Var}(\mathrm{Po})=\lambda=\sum_{i=1}^{n}p_i$ and $\operatorname{Var}(K)=\sum_{i=1}^{n}p_i-\sum_{i=1}^{n}p_i^{2}$, we have $\operatorname{Var}(\mathrm{Po})-\operatorname{Var}(K)=\sum_{i=1}^{n}p_i^{2}$; so a Poisson binomial distribution's variance is bounded above by that of a Poisson distribution with $\lambda=\sum_{i=1}^{n}p_i$, and the smaller the $p_i$ are, the closer $\operatorname{Var}(\mathrm{Po})$ will be to $\operatorname{Var}(K)$.
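To see the two approximations side by side, here is an illustrative numerical comparison (all names and the example probabilities are my own; the total variation sums are truncated at $n$, ignoring the small Poisson mass above $n$, so they slightly understate the true distances):

```python
import math

def exact_pmf(p):
    """Exact Poisson binomial PMF by direct convolution."""
    pmf = [1.0]
    for i, pi in enumerate(p, start=1):
        nxt = [0.0] * (i + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += (1.0 - pi) * mass   # trial i fails
            nxt[k + 1] += pi * mass       # trial i succeeds
        pmf = nxt
    return pmf

def tv_distance(f, g):
    """Total variation distance between two PMFs on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(f, g))

p = [0.1, 0.2, 0.1, 0.15]
n, lam, pbar = len(p), sum(p), sum(p) / len(p)
exact = exact_pmf(p)
binom = [math.comb(n, k) * pbar**k * (1 - pbar) ** (n - k) for k in range(n + 1)]
poiss = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(n + 1)]
tv_binom = tv_distance(exact, binom)   # binomial approximation error
tv_poiss = tv_distance(exact, poiss)   # Poisson approximation error
```

For these moderately heterogeneous, small $p_i$, both distances are small, and the Poisson distance stays within the Barbour–Hall upper bound $\frac{1-e^{-\lambda}}{\lambda}\sum p_i^2$.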
Computational methods
The reference [13] discusses techniques for evaluating the probability mass function of the Poisson binomial distribution. The following software implementations are based on it:
- An R package, poibin, was provided along with the paper;[13] it computes the CDF, PMF, quantile function, and random numbers for the Poisson binomial distribution. For the exact PMF, either a DFT algorithm or a recursive algorithm can be selected, and approximation methods based on the normal and Poisson distributions are also available.
- poibin (Python implementation): can compute the PMF and CDF, using the DFT method described in the paper.
References
- ^ a b Wang, Y. H. (1993). "On the number of successes in independent trials". Statistica Sinica. 3 (2): 295–312.
- ^ Shah, B. K. (1973). "On the distribution of the sum of independent integer valued random variables". The American Statistician. 27 (3): 123–124.
- ^ Chen, X. H.; Dempster, A. P.; Liu, J. S. (1994). "Weighted finite population sampling to maximize entropy". Biometrika. 81 (3): 457–469.
- ^ a b Biscarri, W.; Zhao, S. D.; Brunner, R. J. (2018). "A simple and fast method for computing the Poisson binomial distribution function". Computational Statistics & Data Analysis. 122: 92–100.
- ^ Fernández, M.; Williams, S. (2010). "Closed-form expression for the Poisson-binomial probability density function". IEEE Transactions on Aerospace and Electronic Systems. 46 (2): 803–817.
- ^ Chen, S. X.; Liu, J. S. (1997). "Statistical applications of the Poisson-binomial and conditional Bernoulli distributions". Statistica Sinica. 7 (4): 875–892.
- ^ Harremoës, P. (2001). "Binomial and Poisson distributions as maximum entropy distributions". IEEE Transactions on Information Theory. 47 (5): 2039–2041.
- ^ Shepp, L.; Olkin, I. (1981). "Entropy of the sum of independent Bernoulli random variables and of the multinomial distribution". In Gani, J.; Rohatgi, V. K. (eds.). Contributions to Probability. New York: Academic Press. pp. 201–206.
- ^ Hillion, E.; Johnson, O. (2017). "A proof of the Shepp–Olkin entropy concavity conjecture". Bernoulli. 23 (4B): 3638–3649.
- ^ Hillion, E.; Johnson, O. (2019). "A proof of the Shepp–Olkin entropy monotonicity conjecture". Electronic Journal of Probability. 24: 1–14.
- ^ Ehm, W. (1991). "Binomial approximation to the Poisson binomial distribution". Statistics & Probability Letters. 11 (1): 7–16.
- ^ Barbour, A. D.; Hall, P. (1984). "On the rate of Poisson convergence". Mathematical Proceedings of the Cambridge Philosophical Society. 95 (3): 473–480.
- ^ a b Hong, Y. (2013). "On computing the distribution function for the Poisson binomial distribution". Computational Statistics & Data Analysis. 59: 41–51.