Tools for Quantitative Archaeology

BOONE: Boone's Measure of Assemblage Heterogeneity (Homogeneity)

Computes Boone's (1987) measure of assemblage heterogeneity and some related heterogeneity measures. Necessary formulae are presented in the program output; the measures are described below. The basic idea is to compare the individual provenience units with the aggregated, or marginal, distribution for all units combined. (In my view it is clearer to discuss class concentration or dominance, or the deviation or difference between a unit and the site total (equivalently, between the row and the margin), than to talk about heterogeneity or homogeneity.) For all but one (BR) of the measures discussed below, larger values of the index indicate greater distance between the unit and the aggregate distribution. (As proposed, Boone's index is thus a measure of homogeneity, not its opposite, heterogeneity.)

The program first lists the original data and various transformations used in calculating the heterogeneity/homogeneity measures. Row percents for the original data and the marginal row percents (based on the original marginal counts) are printed. This table is used to calculate RMS(%), the square Root of the Mean (over the variables) of the Squared distances from the row percents to the marginal row percents (closely related to the Euclidean distance from each row to the marginal percents), yielding an average over the variables of the percent deviation (in percent units) of the row from the marginal percents.
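The RMS(%) calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the program's own code; the function names are made up for the example. The counts are unit H13:GK1 and the marginal counts from the example listing further down.

```python
from math import sqrt

def percents(counts):
    """Convert a list of counts to percentages of their total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]

def rms_percent(row_counts, marginal_counts):
    """Root of the mean (over variables) of the squared differences
    between the row percents and the marginal row percents."""
    row = percents(row_counts)
    margin = percents(marginal_counts)
    squared = [(r - m) ** 2 for r, m in zip(row, margin)]
    return sqrt(sum(squared) / len(squared))

# Unit H13:GK1 and the marginal counts from the example listing.
unit = [4, 2, 6, 1, 0, 3, 0, 0]
margin = [4193, 2384, 4503, 1037, 1974, 1412, 3077, 167]

# The example listing reports RMS(%) = 9.33 for this unit.
print(round(rms_percent(unit, margin), 2))
```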

After that, column-wise Z scores of the row percents (the previous table) are listed along with the Z scores of the marginal row percents. Contrary to what one might think, the latter are not 0, because the marginal percents, calculated from the marginal counts, are not the same as the mean percents. These are used to calculate RMS(Z), which is analogous to RMS(%) except that it is calculated using Z scores. The effect is to weight the variables equally; RMS(%), by contrast, is weighted by the variation (roughly the standard deviation) of each variable. Finally, the program lists the adjusted row percents used to calculate Boone's measure per se. While Boone does not point this out, these are the row percents calculated for a matrix of column percents of the original data. They are the same as Boone's p's multiplied by 100. (It actually doesn't matter which variable one chooses as a standard in Boone's calculation.)
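A small sketch may help show why the Z scores of the marginal percents are not 0. The counts below are hypothetical (three units, two classes), and the use of the population standard deviation is an assumption; the documentation does not say which convention the program uses, and this sketch is not meant to reproduce the values in the example listing.

```python
from math import sqrt
from statistics import mean, pstdev

def to_percents(counts):
    """Convert a list of counts to percentages of their total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]

def column_z_scores(row_percents, marginal_percents):
    """Z-score every column of the row-percent table, then score the
    marginal percents with the same column means and deviations."""
    columns = list(zip(*row_percents))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # assumption: population SD
    z_rows = [[(v - m) / s for v, m, s in zip(row, means, sds)]
              for row in row_percents]
    z_margin = [(v - m) / s
                for v, m, s in zip(marginal_percents, means, sds)]
    return z_rows, z_margin

def rms_z(z_row, z_margin):
    """RMS of the differences between a row's Z scores and the margin's."""
    return sqrt(mean([(a - b) ** 2 for a, b in zip(z_row, z_margin)]))

# Hypothetical counts: one large unit and two small ones.
counts = [[90, 10], [5, 5], [2, 8]]
row_percents = [to_percents(row) for row in counts]
marginal_counts = [sum(col) for col in zip(*counts)]
marginal_percents = to_percents(marginal_counts)

z_rows, z_margin = column_z_scores(row_percents, marginal_percents)
# The large unit dominates the marginal counts, so the marginal percent
# sits well above the unweighted mean percent and its Z score is not 0.
print(z_margin)
print([rms_z(z, z_margin) for z in z_rows])
```

A column that is constant across all units would have a standard deviation of 0 and would need special handling, which this sketch omits.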

The final table reports case-by-case measures. Boone's H is printed first. The H values are 1/10000th of the sum of the squared differences between the observed adjusted row percents and the marginal adjusted percents (each of which is 100/nvar; the factor of 10,000 is introduced by doing the calculations in percentages rather than in the proportions Boone uses). H is an assemblage heterogeneity measure expressed in units of proportions squared. H values are relatively low (approaching 0) if classes are relatively evenly represented in the deposit, i.e., heterogeneous or similar to the marginal adjusted percentages. They are high where deposits greatly over-represent some classes and under-represent others on a proportional basis, i.e., those deposits will generally be comparatively more homogeneous (dominated by one or a few classes). My own preference is to refer to H (and its relatives) as a measure of divergence from an empirical standard or a measure of concentration (bigger numbers mean an assemblage is more divergent, or has a more concentrated representation of artifact classes; see also the Koetje program), as I think the terminology is clearer.
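As a rough illustration of the H calculation, the sketch below computes the adjusted row percents (row percents of the column-percent matrix) and then H for unit H13:GK1 from the example listing. The function names are invented for the example and the code is a sketch of the arithmetic described above, not the program's source.

```python
def adjusted_row_percents(row_counts, marginal_counts):
    """Row percents of the column-percent matrix for one unit.

    Each count is first expressed as a fraction of its column total;
    those fractions are then rescaled to sum to 100 across the row.
    """
    col_fracs = [c / m for c, m in zip(row_counts, marginal_counts)]
    total = sum(col_fracs)
    return [100.0 * f / total for f in col_fracs]

def boone_h(row_counts, marginal_counts):
    """1/10000 of the summed squared deviations from 100/nvar."""
    adjusted = adjusted_row_percents(row_counts, marginal_counts)
    expected = 100.0 / len(adjusted)
    return sum((a - expected) ** 2 for a in adjusted) / 10000.0

# Unit H13:GK1 and the marginal counts from the example listing.
unit = [4, 2, 6, 1, 0, 3, 0, 0]
margin = [4193, 2384, 4503, 1037, 1974, 1412, 3077, 167]

# The listing shows 15.4 13.5 21.4 15.5 0.0 34.2 0.0 0.0 and H = 0.104.
print([round(a, 1) for a in adjusted_row_percents(unit, margin)])
print(round(boone_h(unit, margin), 3))
```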

I find Hs, a transformation of H, more useful than H because it scales H to percentage units. Hs is the square root of the mean of the squared differences between the observed adjusted percents and the marginal adjusted percents. While Hs always falls within 0-100, its upper limit is 100*√(nvar-1)/nvar, reached when a single class accounts for all of a unit's adjusted percents (dividing Hs by this factor will scale it 0-1).
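The behavior of Hs at its extremes can be checked with a short sketch. Taking the definition above at face value (root of the mean squared difference from 100/nvar), a perfectly even row gives 0 and a row concentrated in a single class gives 100*√(nvar-1)/nvar, which is the largest value this definition can produce. The function name and the two extreme rows are made up for the illustration; the third row is the (rounded) adjusted percents printed for H13:GK1 in the example listing.

```python
from math import sqrt

def hs_from_adjusted(adjusted_percents):
    """Root of the mean squared difference from 100/nvar."""
    nvar = len(adjusted_percents)
    expected = 100.0 / nvar
    squared = [(a - expected) ** 2 for a in adjusted_percents]
    return sqrt(sum(squared) / nvar)

nvar = 8
even_row = [100.0 / nvar] * nvar              # every class equally represented
single_class = [100.0] + [0.0] * (nvar - 1)   # one class holds everything

hs_min = hs_from_adjusted(even_row)
hs_max = hs_from_adjusted(single_class)
print(hs_min, hs_max, 100.0 * sqrt(nvar - 1) / nvar)

# Rounded adjusted percents for H13:GK1; the listing reports Hs = 11.39.
gk1 = [15.4, 13.5, 21.4, 15.5, 0.0, 34.2, 0.0, 0.0]
print(round(hs_from_adjusted(gk1), 2))
print(round(hs_from_adjusted(gk1) / hs_max, 3))  # rescaled to 0-1
```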

Next the RMS(%), described above, is printed. Measures closely related to RMS(%) are BR and BRds. BR is the Brainerd-Robinson coefficient of similarity calculated between the row percents and the marginal row percents. Unlike the other measures, larger values of BR (scaled 0-200) indicate greater similarity between the individual case and the marginal distribution (e.g., the combined site). BRds is the Brainerd-Robinson coefficient converted to a distance and scaled from 0 (same as the marginal row percents) to 1 (maximally different from the marginal row percents). Next the RMS is recalculated using arcsine transformations (arcsin√p) applied to the actual and marginal row percents (expressed as proportions). This transformation has the effect of enhancing variation near 0 and 100%. Finally, the RMS based on Z scores, also described above, is reported.
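The following sketch works through BR, BRds, and the arcsine-transformed RMS for unit H13:GK1 of the example listing. It assumes the arcsine is taken in radians and that the differences are squared before averaging; with those assumptions the results agree with the values in the listing. Function names are illustrative only.

```python
from math import asin, sqrt

def percents(counts):
    """Convert a list of counts to percentages of their total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]

def brainerd_robinson(row_counts, marginal_counts):
    """BR similarity (0-200) between row and marginal row percents."""
    row, margin = percents(row_counts), percents(marginal_counts)
    return 200.0 - sum(abs(r - m) for r, m in zip(row, margin))

def br_distance(row_counts, marginal_counts):
    """BR converted to a distance scaled 0-1."""
    return 1.0 - brainerd_robinson(row_counts, marginal_counts) / 200.0

def rms_arcsin(row_counts, marginal_counts):
    """RMS of differences after an arcsin(sqrt(p)) transformation.

    Assumptions: proportions are used, angles are in radians, and the
    differences are squared before the mean is taken.
    """
    row = [p / 100.0 for p in percents(row_counts)]
    margin = [p / 100.0 for p in percents(marginal_counts)]
    squared = [(asin(sqrt(p)) - asin(sqrt(q))) ** 2
               for p, q in zip(row, margin)]
    return 100.0 * sqrt(sum(squared) / len(squared))

# Unit H13:GK1 and the marginal counts from the example listing.
unit = [4, 2, 6, 1, 0, 3, 0, 0]
margin = [4193, 2384, 4503, 1037, 1974, 1412, 3077, 167]

# The listing reports 143.9, 0.281, and 20.73 for this unit.
print(round(brainerd_robinson(unit, margin), 1))
print(round(br_distance(unit, margin), 3))
print(round(rms_arcsin(unit, margin), 2))
```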

In addition, the program supplies a goodness-of-fit test, Williams' corrected G (like Chi²), between the observed counts and the expected counts calculated from the marginal row percents. The result of the goodness-of-fit test is the probability of drawing by chance a sample whose counts on the variables differ from those expected from the marginal proportions as much as or more than do the observed counts, as measured by the G statistic. That is, the marginal proportions are taken to define a population against which the case is examined. While this is not a direct evaluation of the RMS(%) or BR/BRds values, it is closely related to them. However, it should be noted that the relative values are of interest as well as the significance of any individual value.
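A goodness-of-fit calculation of this kind might look like the sketch below. It assumes the usual textbook form of the Williams correction for a single-classification test, q = 1 + (a + 1)/(6n) with a classes and n items, and a - 1 degrees of freedom; the documentation does not spell out the program's exact formula, so treat both as assumptions. Under them, the sketch gives a probability that rounds to the 0.145 listed for unit H13:GK1. The chi-square tail is computed with the standard closed-form finite series so that only the standard library is needed.

```python
from math import erfc, exp, log, pi, sqrt

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k          # (x/2)^k / k!
            total += term
        return exp(-x / 2.0) * total
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term                      # x^(r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return erfc(sqrt(x / 2.0)) + sqrt(2.0 * x / pi) * exp(-x / 2.0) * total

def williams_g_test(row_counts, marginal_counts):
    """Return (corrected G, probability) for a unit against the margin."""
    n = sum(row_counts)
    a = len(row_counts)
    grand = sum(marginal_counts)
    g = 0.0
    for obs, m in zip(row_counts, marginal_counts):
        if obs > 0:                        # zero counts contribute nothing
            g += obs * log(obs / (n * m / grand))
    g *= 2.0
    q = 1.0 + (a + 1) / (6.0 * n)          # assumed Williams correction
    g_adj = g / q
    return g_adj, chi2_sf(g_adj, a - 1)

# Unit H13:GK1 and the marginal counts from the example listing.
unit = [4, 2, 6, 1, 0, 3, 0, 0]
margin = [4193, 2384, 4503, 1037, 1974, 1412, 3077, 167]
g_adj, prob = williams_g_test(unit, margin)
print(round(g_adj, 2), round(prob, 3))
```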

The Brainerd-Robinson values can be evaluated directly with the BRSample program. This program provides the probability of obtaining a given BR coefficient or a lower one when a sample of a given size is drawn with replacement from a population. For these purposes, when running the BRSample program, population-sample comparisons should be made. The relevant population is the marginal row counts (last line of the printed Input Table matrix). The relevant sample size is printed in the input table as the row sum and appears in the same line as the BR value in the final output table. After running BRSample and getting the table of probabilities, one finds the observed BR value (or the closest whole BR value) in the table. The second number following the BR coefficient printed is the probability of getting that or a lower (less similar) value by chance. For example, for the second case of the example printed here, that probability is .316. This says that we would get a BR value of 144 or less by chance about 32% of the time when choosing a random sample of 16 from the marginal distribution. In this case, we might decide that we are not confident that this unit differs substantially from the site-wide total.
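The idea behind this kind of evaluation can be illustrated with a simple Monte Carlo simulation. The sketch below is not the BRSample program and makes no claim about how BRSample is implemented; it simply draws samples with replacement from the marginal distribution and counts how often the simulated BR is at or below an observed value. The function names, the number of trials, and the seed are all arbitrary choices for the example, and the estimate should only be expected to land in the general neighborhood of a BRSample result.

```python
import random

def br_vs_population(sample_counts, pop_percents):
    """BR similarity between a sample's percents and population percents."""
    n = sum(sample_counts)
    return 200.0 - sum(abs(100.0 * c / n - p)
                       for c, p in zip(sample_counts, pop_percents))

def prob_br_at_most(observed_br, sample_size, pop_counts,
                    trials=5000, seed=1):
    """Estimate P(BR <= observed_br) for samples drawn with replacement."""
    rng = random.Random(seed)
    total = sum(pop_counts)
    pop_percents = [100.0 * c / total for c in pop_counts]
    classes = list(range(len(pop_counts)))
    hits = 0
    for _ in range(trials):
        draws = rng.choices(classes, weights=pop_counts, k=sample_size)
        counts = [0] * len(pop_counts)
        for d in draws:
            counts[d] += 1
        if br_vs_population(counts, pop_percents) <= observed_br:
            hits += 1
    return hits / trials

# Marginal counts from the example listing; sample of 16, observed BR 144.
margin = [4193, 2384, 4503, 1037, 1974, 1412, 3077, 167]
estimate = prob_br_at_most(144.0, 16, margin)
print(estimate)
```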

H, Hs, and RMS(Z) are measures that are standardized to weight variables equally. RMS(%), BR, BRds, and Arcsin√p are not standardized. In the limited experiments I have done, RMS(%) and BRds (and BR) are well correlated, and Arcsin√p is correlated, but less well, with the other two. RMS(Z) and Hs show a closer relationship with each other than with RMS(%) but are not highly correlated. H is expressed in proportion² units; Hs and RMS(%) are expressed in percent/variable, BR in percent, BRds on a 0-1 scale, and RMS(Z) in Z-score units/variable. Where standardization is not required, RMS(%) and BRds are easy-to-explain measures of deviation from the margin (expected). If you need to standardize, I think Hs or RMS(Z) is likely to be easier to deal with than H as it was originally formulated by Boone.

SEQUENCE OF PROGRAM PROMPTS

Input File Name (CON for Keyboard) {.ADF} ?

Reading xx Units and xx Variables

The program reads an input file of counts in Antana format. Rows represent cases; columns are mutually exclusive categories. Program limits are 200 rows and 25 columns.

Read Row Label File {N} ?

Row Label File Name {BOONE.ARL} ?

Read Column Label File {N} ?

Column Label File Name {BOONE.ACL} ?

Program Output (CON for Screen, PRN, or <filename>) {BOONE.LST} ?

The program prompts for labeling information in row and column label files.

Create Output File For Analysis {N} ?

It may be useful to plot the various measures (perhaps logged) against sample size as Boone suggests (the SCAT program can be used for this).

[A]nalyze Another Table or [Q]uit {Q} ?

Repeat the analysis?

INPUT DATA FILE

```
17 8
#Counts from Kintigh's Hinkson Site Excavations#
#red white  gray brown sflak lflak sbone lbone#
106   103   198    24    73    16   129     0
4     2     6     1     0     3     0     0
...
122   119    61    12    19    38    58     0
```
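For readers who want to load a file like this into their own scripts, a minimal reader is sketched below. It is based only on what the example above appears to show and rests on several assumptions: the first data line gives the number of rows and columns, lines beginning with `#` are comments, and the counts are whitespace-separated integers. It is not a full implementation of the Antana format, and the function name and sample text are hypothetical.

```python
def parse_count_table(text):
    """Parse a small count table laid out like the example input file.

    Assumptions: the first non-comment line holds "rows columns",
    comment lines start with '#', and counts are whitespace-separated.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    data_lines = [ln for ln in lines if not ln.startswith("#")]
    n_rows, n_cols = (int(tok) for tok in data_lines[0].split())
    values = [int(tok) for ln in data_lines[1:] for tok in ln.split()]
    if len(values) != n_rows * n_cols:
        raise ValueError("count of values does not match the header")
    return [values[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]

# Hypothetical two-unit, three-class table in the same layout.
sample = """2 3
#Hypothetical counts#
#a b c#
1 2 3
4 5 6
"""
table = parse_count_table(sample)
marginal_counts = [sum(col) for col in zip(*table)]
print(table, marginal_counts)
```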

OUTPUT FILE

```
Input Table
COLUMN...
ROW     Red   White    Gray   Brown  Sflake  LFlake   Sbone   Lbone     ===
H12:M01     106     103     198      24      73      16     129       0     649
H13:GK1       4       2       6       1       0       3       0       0      16
...
H17:R01     122     119      61      12      19      38      58       0     429
==>    4193    2384    4503    1037    1974    1412    3077     167   18747

Row Percents
COLUMN...
ROW     Red   White    Gray   Brown  Sflake  Lflake   Sbone   Lbone     ===
H12:M01    16.3    15.9    30.5     3.7    11.2     2.5    19.9     0.0   100.0
H13:GK1    25.0    12.5    37.5     6.3     0.0    18.8     0.0     0.0   100.0
...
H17:R01    28.4    27.7    14.2     2.8     4.4     8.9    13.5     0.0   100.0
==>    22.4    12.7    24.0     5.5    10.5     7.5    16.4     0.9

Z Scores
COLUMN...
ROW     Red   White    Gray   Brown  Sflake  Lflake   Sbone   Lbone     ===
H12:M01    -2.0     2.6     0.9    -1.1     0.6    -4.9     1.3    -2.6     0.0
H13:GK1     2.0    -0.1     2.9     1.4    -5.8     7.8    -4.1    -2.6     0.0
...
H17:R01     3.5    12.0    -3.6    -2.0    -3.3     0.1    -0.5    -2.6     0.0
==>     0.8     0.1    -0.9     0.7     0.2    -1.0     0.3     0.6

COLUMN...
ROW     Red   White    Gray   Brown  Sflake  Lflake   Sbone   Lbone     ===
H12:M01    11.2    19.1    19.5    10.2    16.4     5.0    18.6     0.0   100.0
H13:GK1    15.4    13.5    21.4    15.5     0.0    34.2     0.0     0.0   100.0
...
H17:R01    18.2    31.3     8.5     7.3     6.0    16.9    11.8     0.0   100.0
==>    12.5    12.5    12.5    12.5    12.5    12.5    12.5    12.5

Boone's H: (1/10000) Σ[r(ij)-(100/nvar)]² for row i
where r(ij) is the row% of the matrix of col%s
Hs: 100√(H/nvar)
RMS(%): square Root of the Mean of the sum of Squared
deviations from expected %: √[(1/nvar)Σ(rij-Ri)²] for row i
where rij is the row% of input counts, Ri is marginal row%
BR: Brainerd Robinson coefficient using row% and marginal row%
BR=200-Σ|rij-Rj| (similarity)
BRds: Brainerd Robinson Distance, Scaled 0-1
BRds=Σ(|rij-Rj|/200)=1-BR(rij,Ri)/200 (scaled 0-1)
note: use BRSample for confidence interval
Prob G>obs: Probability of G>observed for deviation between
row observed and row expected based on marginal row%
(using Williams correction for G)
RMS(Arcsin√p): 100√(Σ(arcsin(√pij)-arcsin(√Pj))/nvar) for row i
where pij and PJ are row%/100 and marginal row%/100 of input counts
RMS(Z): RMS based on deviation of column Z-Score of
row percents from Z-score-transformed marginal percent.

Unit        Boone's  Scaled                      Prob      RMS
Lab. Sample       H      Hs  RMS(%)     BR   BRds  G>obs Arcsin√p  RMS(Z)
H12:M01    649   0.036    6.74    4.04  172.4  0.138  0.000     7.13   0.590
H13:GK1     16   0.104   11.39    9.33  143.9  0.281  0.145    20.73   1.086
...
H17:R01    429   0.065    9.00    7.20  155.2  0.224  0.000    10.41   1.175
```
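For reference, the formulas in the legend of the output can be restated in conventional notation. This is a restatement under a few assumptions rather than something taken from the program: n stands for nvar, a_ij is the adjusted row percent (row percent of the column-percent matrix), r_ij is the row percent of the input counts, R_j is the marginal row percent for column j (the legend writes Ri in places), and p_ij = r_ij/100 and P_j = R_j/100. The legend prints the arcsine formula without an exponent; the squared form is assumed here because, evaluated in radians, it gives the 20.73 listed for H13:GK1.

```latex
\begin{align*}
H_i &= \frac{1}{10000}\sum_{j=1}^{n}\Bigl(a_{ij}-\frac{100}{n}\Bigr)^{2}, &
Hs_i &= 100\sqrt{H_i/n} \\
\mathrm{RMS}(\%)_i &= \sqrt{\frac{1}{n}\sum_{j=1}^{n}\bigl(r_{ij}-R_j\bigr)^{2}} \\
BR_i &= 200-\sum_{j=1}^{n}\bigl\lvert r_{ij}-R_j\bigr\rvert, &
BRds_i &= \sum_{j=1}^{n}\frac{\lvert r_{ij}-R_j\rvert}{200} = 1-\frac{BR_i}{200} \\
\mathrm{RMS}(\arcsin\sqrt{p})_i &= 100\sqrt{\frac{1}{n}\sum_{j=1}^{n}
  \bigl(\arcsin\sqrt{p_{ij}}-\arcsin\sqrt{P_j}\bigr)^{2}}
\end{align*}
```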


Page Last Updated - 19-Jul-2007