TFQA: Tools for Quantitative Archaeology - Statistical Analysis Software for Archaeology

TFQA: Tools for Quantitative Archaeology
kintigh@tfqa.com +1 (505) 395-7979

LDEN: Local Density Analysis and Neighborhood Composition

LDEN performs Ian Johnson's Local Density Analysis (Johnson 1984) and additionally creates a file of type proportions within a circular neighborhood of point-provenienced artifacts for a variant of Whallon's Unconstrained Clustering spatial analysis (Whallon 1984). It determines, for each point in a data set, the number of points in each class that lie within a fixed distance of the subject point. It then uses that information to calculate the local density coefficients between each pair of types. It also writes either counts or proportions derived from those counts to a file that can be read directly by the k-means program (available from the author) to perform unconstrained clustering.

The local density coefficient between types A and B is the average number of type B points within circular neighborhoods of type A points divided by the expected number of type B points in an area the size of the circular neighborhood. The expected number of points is simply the overall density of type B (the number of type B points divided by the total area) times the area of the neighborhood. Although it may not be intuitively obvious, this measure is symmetric: the coefficient between A and B points is the same as the one between B and A points. The coefficient of a type with itself is a measure of the degree of clustering of that type.
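The definition above can be sketched in a few lines of Python. This is an illustrative sketch, not LDEN's own code: the function name `local_density`, the exclusion of a point from its own neighborhood, and the inclusive distance test are all assumptions. The points are the sample input file shown later in this documentation; the total area of 160 is likewise an assumption, chosen because, with a radius of 1.5, it yields values that agree with the coefficients printed in the sample listing.

```python
import math
from collections import defaultdict

# Sample points (x, y, type) from the example input file.
SAMPLE = [
    (1, 1, 0), (1, 2, 0), (2, 1, 0), (3, 3, 0),
    (8, 2, 2), (9, 1, 0), (9, 3, 2), (10, 2, 0),
    (6, 10, 10), (6, 11, 11), (5, 12, 12), (7, 12, 13),
    (12, 8, 4), (13, 7, 2), (14, 9, 2), (15, 7, 4),
]

def local_density(points, radius, total_area):
    """Return {(a, b): coefficient} for every pair of types (hypothetical helper)."""
    types = sorted({t for _, _, t in points})
    n = {t: sum(1 for p in points if p[2] == t) for t in types}
    # pair_counts[(a, b)]: type b points found within `radius` of type a
    # points, summed over all type a points (a point never counts itself).
    pair_counts = defaultdict(int)
    for i, (xi, yi, ti) in enumerate(points):
        for j, (xj, yj, tj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                pair_counts[(ti, tj)] += 1
    circle_area = math.pi * radius ** 2
    coeff = {}
    for a in types:
        for b in types:
            observed = pair_counts[(a, b)] / n[a]          # average B count around A
            expected = n[b] / total_area * circle_area     # density of B times circle area
            coeff[(a, b)] = observed / expected
    return coeff

coeff = local_density(SAMPLE, radius=1.5, total_area=160.0)
print(round(coeff[(0, 0)], 2), round(coeff[(0, 2)], 2), round(coeff[(10, 11)], 2))
```

Because each qualifying pair of points is counted once from each side, the observed count is the same in both directions, and dividing by both type counts makes the coefficient symmetric.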

LDEN is currently limited to classes (types) numbered between 0 and 255. The number of points that the program will analyze depends on the computer's memory but is effectively unlimited for most computers; an ordinary laptop will handle several thousand points. Execution time tends to increase with the square of the number of points (since all interpoint distances must be computed), but this should be manageable on ordinary computers for practical problems.

PROGRAM OPERATION

To start the program, simply type: LDEN<Enter>. The program prompts for all the information that it needs to run. In general, the prompts are self-explanatory, and after you have read this documentation once you will probably have little need to refer back to it. General information on the operation of my programs and the data file formats used is provided in the section, "Program Conventions." At any rate, once you have started the program, you will see:

File Containing Coordinate and Type Data {.ADF} ?

Enter the name of the file that contains the information about the points to be analyzed (a drive and full path name can be given here). This file is expected to be in Antana format (described more fully in the "Program Conventions" section). This is a flexible format that is easy to integrate with other data analysis packages. There are three essential things that must be kept in mind about this format: (1) the first two numbers in the file must indicate the number of rows (points) and columns (variables) in the data set; (2) all values must be separated by one or more blanks, commas, or tabs and can appear on any number of lines; and (3) non-numeric characters are not allowed in the file except as comments, which are defined as all characters between # characters on the same line or all characters from an unpaired # to the end of a line (see the example below).

The data file can have any number of rows (points) and must have 2, 3, or 4 columns. The first two columns represent the two-dimensional coordinates that specify the location (the order is unimportant, first east then north or vice versa, as long as it is consistent across all points). If provided, the third column of a data set must be a point class (e.g. tool type or site type), represented by a number between 0 and 255. If none is provided, all points are assumed to be of the same class. If a 4th column is included, it is interpreted as a weight variable. In this case, the analysis works as if, for each line, there were N points of identical location and class entered, where N is the value of the 4th column on that line. If you need to rearrange or add columns, use ADFUTIL.
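For readers preparing or checking input files with other tools, the three format rules above can be sketched as a small Python reader. This is an illustration of the rules as described here, not the parser used by LDEN; the function name `read_adf` and its error handling are assumptions.

```python
import re

def read_adf(text):
    """Parse Antana-style text into a list of rows of floats (hypothetical helper)."""
    tokens = []
    for line in text.splitlines():
        line = re.sub(r"#[^#]*#", " ", line)   # drop comments between paired # characters
        line = line.split("#", 1)[0]            # drop an unpaired # through end of line
        # Values may be separated by blanks, commas, or tabs.
        tokens.extend(t for t in re.split(r"[ ,\t]+", line) if t)
    # The first two numbers give the number of rows and columns.
    n_rows, n_cols = int(tokens[0]), int(tokens[1])
    values = [float(t) for t in tokens[2:]]
    if len(values) != n_rows * n_cols:
        raise ValueError("number of values does not match the rows x columns header")
    # Values may be spread over any number of lines, so rows are rebuilt by position.
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]

example = "3 #points# 3 #variables: x y type#\n 1 1 0\n 1,2,0 # a trailing comment\n 2\t1 0\n"
print(read_adf(example))
```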

To use local density analysis on grid count data, the input data set should include, for each non-empty grid cell, the coordinates of the cell, the item class, and the count of that class in the grid cell. Empty cells and zero-count classes must be omitted. This can potentially allow local density analysis on data sets for which point provenience is not available, or where the number of points is too large to run the analysis. However, when using grid count data, results for any radius smaller than the grid interval would be meaningless.
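As a sketch of how grid counts might be turned into a 4-column input file, the Python below writes one line per non-empty cell and class, with the count in the weight column. The function name `grid_to_adf`, the dictionary layout of the input, and the use of cell centers as the cell coordinates are assumptions made for illustration only.

```python
def grid_to_adf(grid, cell_size):
    """Build 4-column Antana-style text from grid counts (hypothetical helper).

    grid maps (column_index, row_index) -> {class_number: count}.
    """
    rows = []
    for (col, row), counts in sorted(grid.items()):
        # Assumption: each cell is represented by the coordinates of its center.
        x = (col + 0.5) * cell_size
        y = (row + 0.5) * cell_size
        for cls, count in sorted(counts.items()):
            if count > 0:                      # zero-count classes are omitted
                rows.append((x, y, cls, count))
    lines = [f"{len(rows)} #rows# 4 #columns: x y class count#"]
    lines += [f"{x:g} {y:g} {cls} {count}" for x, y, cls, count in rows]
    return "\n".join(lines)

# Cell (1, 0) is empty and class 2 has a zero count in cell (0, 0); both are skipped.
grid = {(0, 0): {1: 3, 2: 0}, (1, 0): {}, (2, 1): {2: 5}}
print(grid_to_adf(grid, cell_size=2.0))
```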

Listing File {LDEN.LST} ?

File in which the printed local density coefficients should be saved. To send this directly to your printer make sure that it is turned on and answer PRN. To have the output on the screen answer CON. Otherwise give a file name.

Perform Local Density Computations {Y} ?

This prompt asks whether you want a local density analysis or just the neighborhood composition data set.

Total Area ?

If you ask for the local density analysis, the program needs to know the total area that has been sampled. The area should be expressed in the square of the units in which the coordinates are given, e.g. square meters if the coordinates are in meters.

Enter Neighborhood Radius or List of Radii in Increasing Order

Neighborhood Radius {0.0} ?

Enter the radius of the circle around each point within which point type counts should be accumulated. The radius must be entered in the same units in which the area was measured. While only one radius may be entered on a single line, the program allows you to enter any number of radii. Separate local density computations are done for each radius entered.

Local Density Output File: [N]one, [P]lot, [S]quare, [L]ower, [U]pper {P} ?

If a local density analysis is requested, the program will always produce a printed listing. If additional analysis of these results is desired (e.g. by a multidimensional scaling), you should output the matrix. Note that most MDS routines request a triangular matrix, but some may accept a square matrix. Antana expects an upper triangular matrix while other MDS routines may want a lower triangular matrix.
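The difference between the square, lower, and upper layouts can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the omission of the diagonal and the six-character, two-decimal field width are assumptions modeled on the sample .LDA file shown later, not a statement of how LDEN formats every file.

```python
def matrix_rows(matrix, layout="L"):
    """Return the rows written for a [S]quare, [L]ower, or [U]pper layout."""
    n = len(matrix)
    if layout == "S":
        return [list(row) for row in matrix]
    if layout == "L":
        # Row i holds the entries to the left of the diagonal.
        return [list(matrix[i][:i]) for i in range(1, n)]
    if layout == "U":
        # Row i holds the entries to the right of the diagonal.
        return [list(matrix[i][i + 1:]) for i in range(n - 1)]
    raise ValueError("layout must be 'S', 'L', or 'U'")

def format_rows(rows):
    """Format each value in a field six characters wide with two decimals."""
    return "\n".join("".join(f"{v:6.2f}" for v in row) for row in rows)

# Coefficients for types 0, 2, and 4 taken from the sample listing.
m = [[5.03, 1.89, 0.00],
     [1.89, 2.83, 2.83],
     [0.00, 2.83, 0.00]]
print(format_rows(matrix_rows(m, "L")))
print(format_rows(matrix_rows(m, "U")))
```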

Perform Log(x+1) Transform on Output File {Y} ?

As Johnson notes, the measure is strongly skewed, and this can be corrected by the application of a log transformation. The printed listing will always provide the raw local density coefficient, but the output file may be transformed by adding one and taking the base 10 log of the result. This gives the measure a minimum value of 0, and the original value of 1 (no association) is transformed to about 0.30. This option is recommended if the output file is to be subjected to further analysis, such as by MDS.
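The transformation is simple enough to check by hand; a minimal Python sketch (the function name `log_transform` is hypothetical) is:

```python
import math

def log_transform(coefficient):
    """Add one and take the base 10 log, as described for the output file."""
    return math.log10(coefficient + 1.0)

print(log_transform(0.0))            # a coefficient of 0 stays at 0
print(round(log_transform(1.0), 3))  # a coefficient of 1 (no association)
```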

Write [S]imilarity or [D]istance Matrix {S} ?

By default the program will write out a matrix of the coefficients as they are printed. By its definition the local density coefficient is a similarity (proximity) measure, i.e. high values indicate strong associations. As some MDS routines, including Antana, expect a distance measure, the local density coefficient can be transformed to a distance measure by D = 1 - C/Cmax, where C is a coefficient and Cmax is the largest coefficient in the data set. The distance transformation can be applied to either a raw or log-transformed file.
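A minimal Python sketch of the similarity-to-distance conversion follows. The function name `to_distance` is hypothetical, and taking the maximum over every value passed in (including any diagonal entries) is an assumption; the text does not say which entries LDEN considers when finding the largest coefficient.

```python
def to_distance(matrix):
    """Convert similarities to distances as 1 - C / Cmax (hypothetical helper)."""
    c_max = max(max(row) for row in matrix)
    if c_max == 0:
        raise ValueError("all coefficients are zero; the distance is undefined")
    return [[1.0 - c / c_max for c in row] for row in matrix]

# The largest similarity becomes distance 0; a similarity of 0 becomes distance 1.
print(to_distance([[4.0, 2.0], [2.0, 0.0]]))
```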

Output File for Local Density Matrix {LDEN.LDA} ?

Enter the name of the file in which the local density matrix is to be written.

Point Neighborhood Output: [C]ounts, [P]ercents, or [N]either {N} ?

Indicate whether the counts or the percentages of the types are to be output or whether no neighborhood composition file is to be written.

Output Total in Addition to Counts or Percents {N} ?

Then, indicate whether a total count of points per neighborhood should be output along with the counts or percentages. If the output file from this program is to be used by a program in which the total should not be analyzed, answer N to this option. (Enter N for unconstrained clustering using k-means, or eliminate the total column before analysis.)

Minimum Count to Output Neighborhood {1} ?

Finally, enter the minimum count within the circle (neighborhood) that there must be in order to output a density for the point. All points include themselves within the circles, so the minimum possible is 1. Any larger number may eliminate cases from the data set. Particularly if percentages are calculated, a higher minimum (e.g. 5) may be desirable.
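The three neighborhood options (counts or percents, optional total, minimum count) can be sketched together in Python. This is an illustrative sketch rather than LDEN's code: the function name `neighborhood_rows` and its parameters are hypothetical, and computing percents as 100 times the count divided by the neighborhood total is an assumption. The four points used are those of types 10 through 13 from the sample input file.

```python
import math

def neighborhood_rows(points, radius, mode="C", with_total=False, min_count=1):
    """Return (types, rows); each row is (x, y, values) for one subject point."""
    types = sorted({t for _, _, t in points})
    rows = []
    for x, y, _ in points:
        counts = {t: 0 for t in types}
        for xj, yj, tj in points:              # the subject point counts itself
            if math.hypot(x - xj, y - yj) <= radius:
                counts[tj] += 1
        total = sum(counts.values())
        if total < min_count:                  # neighborhoods that are too sparse are skipped
            continue
        values = [counts[t] for t in types]
        if mode == "P":
            values = [100.0 * v / total for v in values]
        if with_total:
            values.append(total)
        rows.append((x, y, values))
    return types, rows

cluster = [(6, 10, 10), (6, 11, 11), (5, 12, 12), (7, 12, 13)]
types, rows = neighborhood_rows(cluster, radius=1.5, with_total=True)
for row in rows:
    print(row)
```

With a minimum count of 3, only the point at (6, 11) would be written, since it is the only one of the four with at least three points in its neighborhood.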

Scanning Data
Reading Data File
Data Read for xxxx Points
Input Phase Time xxxx Minutes
Computing Interpoint Distances
xxx

Distance Computation Time xxxx Minutes

The program indicates that it is reading the data file and reports the number of points read and the time required. As each point is processed to find the points in its neighborhood, its sequential number (1, 2, 3...) is listed. Don't be discouraged if, at first, it looks like it is going to be slow; the points that are processed first take much longer than those processed later.

Output File for Point Neighborhood Composition {LDEN.OUT} ?

Name of the output data set. This data set is ready to be read into k-means (as long as no totals are listed). Each line of the output data set first has a comment (enclosed in #'s) with the sequential input number of a point and its type number. Following this are the x and y coordinates of the point, and the count or percent of each non-zero type in the data set. (The header lists the type numbers.) Finally, the optional total will be listed.

Writing Point Neighborhood Composition Array
Neighborhood Output Time 0.01 Minutes

These lines simply communicate program progress.

Performing Local Density Computations
Local Density Analysis Time xxxx Minutes
Program End

These messages are issued to keep you informed of the program's progress.

INPUT FILE: LDEN.ADF

16 #points# 3 #variables: x y type# 
 1   1   0  
 1   2   0  
 2   1   0  
 3   3   0  
 8   2   2  
 9   1   0  
 9   3   2  
10   2   0  
 6  10  10  
 6  11  11  
 5  12  12  
 7  12  13  
12   8   4  
13   7   2  
14   9   2  
15   7   4

LISTING FILE: .LST

Local Density Coefficient
  Points 1422    Area  412.00    Radius 2.00  

 Type |     0     2     4    10    11    12    13 
    0 |  5.03  1.89  0.00  0.00  0.00  0.00  0.00 
    2 |  1.89  2.83  2.83  0.00  0.00  0.00  0.00 
    4 |  0.00  2.83  0.00  0.00  0.00  0.00  0.00 
   10 |  0.00  0.00  0.00  0.00 22.64  0.00  0.00 
   11 |  0.00  0.00  0.00 22.64  0.00 22.64 22.64 
   12 |  0.00  0.00  0.00  0.00 22.64  0.00  0.00 
   13 |  0.00  0.00  0.00  0.00 22.64  0.00  0.00 
 Type |     0     2     4    10    11    12    13

LOCAL DENSITY OUTPUT FILE: .LDA

  1.89 
  0.00  2.83 
  0.00  0.00  0.00 
  0.00  0.00  0.00 22.64 
  0.00  0.00  0.00  0.00 22.64 
  0.00  0.00  0.00  0.00 22.64  0.00

NEIGHBORHOOD OUTPUT FILE: LDEN.OUT

#Rows# 16 #Cols# 10 #From LDEN.ADF #  # Neighborhood Radius   1.50 # 
#  Types: 0 2 4 10 11 12 13 # 
#    1  0#     1.00    1.00 3 0 0 0 0 0 0 3 
#    2  0#     1.00    2.00 3 0 0 0 0 0 0 3 
#    3  0#     2.00    1.00 3 0 0 0 0 0 0 3 
#    4  0#     3.00    3.00 1 0 0 0 0 0 0 1 
#    5  2#     8.00    2.00 1 2 0 0 0 0 0 3 
#    6  0#     9.00    1.00 2 1 0 0 0 0 0 3 
#    7  2#     9.00    3.00 1 2 0 0 0 0 0 3 
#    8  0#    10.00    2.00 2 1 0 0 0 0 0 3 
#    9 10#     6.00   10.00 0 0 0 1 1 0 0 2 
#   10 11#     6.00   11.00 0 0 0 1 1 1 1 4 
#   11 12#     5.00   12.00 0 0 0 0 1 1 0 2 
#   12 13#     7.00   12.00 0 0 0 0 1 0 1 2 
#   13  4#    12.00    8.00 0 1 1 0 0 0 0 2 
#   14  2#    13.00    7.00 0 1 1 0 0 0 0 2 
#   15  2#    14.00    9.00 0 1 0 0 0 0 0 1 
#   16  4#    15.00    7.00 0 0 1 0 0 0 0 1

Page Last Updated: 17 August 2020