TFQA: Tools for Quantitative Archaeology

kintigh@tfqa.com +1 (505) 395-7979

BINOMIAL: Compute Binomial Probabilities

Binomial calculates binomial probabilities and finds an interval of population parameters around the sample proportion. All input is from the keyboard and output is to the screen.

PROGRAM EXAMPLE

File or Device for Output (<Enter> for Screen) {CON} ?

Pressing <Enter> sends the output to the screen, which is the typical use. Otherwise you can enter a file name.

[B]inomial Probabilities, [E]stimate, [Q]uit {B} ?

The program has two sections. In the first, straightforward [B]inomial probabilities are computed. While these probabilities have wide application in archaeology, they are sometimes difficult to compute and are thus not used when they should be. In the second, a population parameter interval is [E]stimated around a sample proportion. Results of the first type of computation should require little explanation.

Binomial Probability:  P=C(n,k)*p^k*(1-p)^(n-k)
  Probability P of exactly k successes in n independent trials
  where the probability of success in each trial is p

Number of trials, n {0} ? 25
Number of successes, k {0} ? 4
Probability of success, p {0.0000} ? .1

Binomial Probability (k=4; n=25; p=0.1000)
     k |     P(k)      P(<=k)    P(>=k)
---------------------------------------
     0 |   0.071790  0.071790  1.000000
     1 |   0.199416  0.271206  0.928210
     2 |   0.265888  0.537094  0.728794
     3 |   0.226497  0.763591  0.462906
>    4 |   0.138415  0.902006  0.236409
     5 |   0.064594  0.966600  0.097994
...
    24 |   0.000000  1.000000  0.000000
    25 |   0.000000  1.000000  0.000000

Binomial Summary (k=4; n=25; p=0.1000)
  P(k)=0.138415   P(1-tailed)=0.236409   P(2-tailed)=0.308198
  P(<=k)=P(0)+P(1)+...+P(k) = 0.902006
  P(>=k)=P(k)+P(k+1)+...+P(n) = 0.236409

That is, the probability of getting exactly 4 successes out of 25 given a population proportion of .1 is .14. The probability of getting 4 or fewer in a sample of 25 is .90; the probability of getting 4 or more is .24. (The latter two do not sum to 1 because P(4) is included in each sum.)
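The figures above can be checked independently with a few lines of Python. This is a minimal sketch, not the program's own code: the function names are illustrative, and the two-tailed rule (summing every outcome no more probable than the observed one) is an assumption that happens to reproduce the value shown in the summary.

```python
from math import comb

def binom_pmf(j, n, p):
    """P(exactly j successes in n trials) = C(n,j) * p^j * (1-p)^(n-j)."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

def binom_summary(k, n, p):
    """Return P(k), P(<=k), P(>=k) and a two-tailed probability."""
    pmf = [binom_pmf(j, n, p) for j in range(n + 1)]
    p_k = pmf[k]
    p_le = sum(pmf[:k + 1])   # P(0) + P(1) + ... + P(k)
    p_ge = sum(pmf[k:])       # P(k) + P(k+1) + ... + P(n)
    # Assumption: two-tailed = total probability of all outcomes that are
    # no more likely than the observed k (small tolerance guards ties).
    two_tailed = sum(q for q in pmf if q <= p_k * (1 + 1e-9))
    return p_k, p_le, p_ge, two_tailed

if __name__ == "__main__":
    p_k, p_le, p_ge, two = binom_summary(k=4, n=25, p=0.1)
    print(f"P(k)={p_k:.6f}  P(<=k)={p_le:.6f}  P(>=k)={p_ge:.6f}  P(2-tailed)={two:.6f}")
```

For k=4, n=25, p=0.1 the first three values agree with the table and summary above to six decimal places.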

Compute Another Probability {Y} ?

[B]inomial Probabilities, [E]stimate, [Q]uit {B} ? E


Sample Size, n {0} ? 100
Number Observed In Sample, k {0} ? 4
Number of Population Proportions to Examine {10000} ? 
Display Tail Probs for Each Proportion Calculated {N} ? 

Best Estimate of Population Proportion, p-hat = 4/100 = 0.0400

         Prob in    Min p s.t.  Prob in   Max p s.t.  Prob in
  C.I.  Each Tail  RTail<=Prob   RTail   LTail<=Prob   LTail
  99.9%  0.0005     0.0035     (0.000)    0.1482     (0.000)
  99.8%  0.0010     0.0043     (0.000)    0.1402     (0.000)
  99.0%  0.0050     0.0068     (0.005)    0.1207     (0.005)
  98.0%  0.0100     0.0083     (0.010)    0.1118     (0.010)
  97.5%  0.0125     0.0088     (0.012)    0.1088     (0.012)
  95.0%  0.0250     0.0110     (0.025)    0.0993     (0.025)
  90.0%  0.0500     0.0137     (0.049)    0.0892     (0.050)
  80.0%  0.1000     0.0175     (0.099)    0.0784     (0.100)
  75.0%  0.1250     0.0190     (0.123)    0.0747     (0.124)
  50.0%  0.2500     0.0254     (0.250)    0.0621     (0.249)
   0.0%  0.5000     0.0365     (0.498)    0.0466     (0.499)
         p=p-hat => 0.0400     (0.571)    0.0400     (0.629)

Perform Another Estimate {Y} ?

Here the results are a bit less obvious. The problem is this: given an observed sample proportion of 4 out of 100, or 4%, what can we say about the population from which it was drawn? This question is examined by calculating a confidence interval. (However, see the discussion below for what is generally a better approach.)

One way to do this is to calculate binomial tail probabilities for different population proportions at set intervals. For example, if 100 population proportions are examined, then population proportions of 0.0, .01, .02, ... .99, 1.0 are examined.

When tail probabilities are displayed for each proportion, the first part of the output lists the probability, for each proportion considered, of getting k or fewer successes (left tail) and k or more successes (right tail) out of n trials. Let us first consider a confidence interval at the .98 level. Looking through the list, you can see that an actual proportion of .0083 will produce 4 or more observed successes (the right tail) about 1% of the time. (Any higher proportion will produce 4 or more successes more than 1% of the time.) Looking further to the right on the 98% confidence interval line, we can see that a proportion of .1118 will produce 4 or fewer successes about 1% of the time. Thus, the range from about .008 to .112 seems a reasonable candidate for a 98% confidence interval (with 1% in each tail) around the observed p-hat of 0.04.

The remainder of the output tabulates information so that you can find a number of reasonable confidence intervals around p-hat with approximately equal probabilities in each tail (assuming 1<k<n). Thus, if we look for .05 in each tail (a 90% interval), the probability that a population with a parameter of .0137 would yield a sample proportion as high as or higher than the observed .04 is .049, the value in the right tail. Similarly, the probability that a population with a parameter of .0892 would yield a sample proportion as low as or lower than .04 is .050, the value in the left tail. Thus, only about 5% (4.9%) of the samples drawn from a population with a value of .014 have sample values as high as .04; about 5% (5.0%) of those drawn from a population with a value of .09 have sample values as low as .04. The difference between the probability level, i.e., 0.05, and the tail probabilities arises because only a restricted set of population proportions is tested. If greater precision is required, increase the number of population proportions examined to 1000 or more.
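The grid search described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program's actual code: the function names are hypothetical, and the bounds are taken to be the largest grid proportion whose right tail does not exceed the requested tail probability and the smallest grid proportion whose left tail does not exceed it, which is how the accompanying text reads the table.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def grid_interval(k, n, tail_prob, steps=10000):
    """Scan p = 0, 1/steps, ..., 1 and return (lower, upper) bounds.

    lower: largest grid p with  P(X >= k | p) <= tail_prob
    upper: smallest grid p with P(X <= k | p) <= tail_prob
    """
    lower, upper = None, None
    for i in range(steps + 1):
        p = i / steps
        right = 1.0 - binom_cdf(k - 1, n, p)   # k or more successes
        left = binom_cdf(k, n, p)              # k or fewer successes
        if right <= tail_prob:
            lower = p      # right tail rises with p, so the last hit is the largest
        if upper is None and left <= tail_prob:
            upper = p      # left tail falls with p, so the first hit is the smallest
    return lower, upper

if __name__ == "__main__":
    for tail in (0.005, 0.025, 0.05):
        lo, hi = grid_interval(k=4, n=100, tail_prob=tail)
        print(f"{100 * (1 - 2 * tail):5.1f}%  {lo:.4f}  {hi:.4f}")
```

Because each bound is snapped to the grid, the achieved tail probability is at or slightly below the requested one, which is the discrepancy noted above; a finer grid narrows it.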

Perform Another Estimate {Y} ?

The analysis can be repeated for other parameters; otherwise the program ends.

However, this is an indirect and generally not the best way to compute a confidence interval for a binomial proportion. The BAYES program, using a flat prior, will provide the appropriate confidence intervals for proportions. For the same problem (k=4, n=100, resolution=.002), the 90% confidence interval is .014-.084.
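For readers curious about what a flat-prior calculation involves, here is a rough Python sketch. It is not the BAYES program's algorithm, which is not described on this page. It assumes a grid of proportions at the stated resolution, a posterior proportional to the binomial likelihood, and an interval built from the most probable grid points; the function name and that interval rule are illustrative assumptions.

```python
def flat_prior_interval(k, n, level=0.90, step=0.002):
    """Grid posterior under a flat prior; returns (lower, upper, mass covered)."""
    steps = round(1 / step)
    grid = [i / steps for i in range(steps + 1)]
    # Flat prior: the posterior weight is proportional to the likelihood alone.
    weights = [p**k * (1 - p)**(n - k) for p in grid]
    total = sum(weights)
    post = [w / total for w in weights]
    # Assumption: add grid points from most to least probable until the
    # requested level is reached (a highest-density style region).
    order = sorted(range(len(grid)), key=lambda i: post[i], reverse=True)
    mass, chosen = 0.0, []
    for i in order:
        chosen.append(i)
        mass += post[i]
        if mass >= level:
            break
    return grid[min(chosen)], grid[max(chosen)], mass

if __name__ == "__main__":
    lo, hi, mass = flat_prior_interval(k=4, n=100)
    print(f"{lo:.3f}-{hi:.3f} (posterior mass {mass:.3f})")
```

Other interval rules, such as cutting equal probability from each tail of the posterior, would give somewhat different endpoints.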

Page Last Updated: 20 June 2022