TFQA: Tools for Quantitative Archaeology
NEIG: Nearest Neighbor & Gravity Analysis

NEIG performs nearest neighbor and gravity model analysis. For each point, the program lists the nearest 5 (or fewer) neighboring points with their distances (attractions). If desired, this same information will be written to an output file. The program allows each point to be identified by class (e.g. tool type or site type), and if desired these classes can be grouped. Nearest neighbors can then be found either within or between groups. For each separate analysis, coefficients are calculated and a count is provided of the number of times a point of each class is an nth nearest neighbor of each other class. Finally, with the addition of another variable (e.g. period or level), the data may be further stratified within a single run.

Basically, nearest neighbor analysis involves locating the closest neighboring point to each point in a data set; the distance to that point is the nearest-neighbor distance. Based on the size of the area and the number of points, it is possible to calculate the average nearest neighbor distance that would be expected if the points were randomly located. By dividing the actual average nearest neighbor distance by the expected average distance based on a random model, one can see whether the distribution is more clustered than would be expected at random (a ratio below 1), is similar to a random distribution (a ratio near 1), or is more uniformly distributed than a random distribution (a ratio above 1). The nearest neighbor calculations performed are described by Whallon in his 1974 American Antiquity article. The program does not perform any of the corrections that have been suggested in the literature; however, sufficient raw and aggregated data are provided that these can be easily calculated.

Gravity analysis is often used to model interaction, such as exchange or migration.
The premise of gravity analysis is that the degree of interaction between any two points, like physical gravity, increases with the product of the sizes (masses) of the two points and decreases as a function of the distance between them (often distance squared). The first gravity neighbor of any point is the point that has the strongest attraction to that point. The output of the gravity analysis is simply a list, for each point, of the nearest gravity neighbors and their attractions to the point. Often, these data can then be used with additional variables to assess a degree of fit or may be used to make predictions. There are no summary statistics comparable to the nearest neighbor coefficient.

NEIG is currently limited to classes numbered between 0 and 30 (but not necessarily all numbers in between), to analysis of the 5 nearest neighbors, and to strata numbered between 1 and 99. The first 12 characters of any point label can be printed. The number of points that the program will analyze depends on the computer memory and on the number of neighbors that one wants analyzed. Similarly, the time required for the analysis depends mainly on the square of the number of points in each stratum and on the number of neighbors analyzed. The latter limits are described in more detail in another section; however, a standard PC will handle 3 nearest neighbors for approximately 1400 points and a single nearest neighbor for nearly 2000 points.

PROGRAM OPERATION

To start the program, simply type: NEIG<Enter>. The program will prompt you for all the information that it needs to run. In general, the prompts are self-explanatory, and after you have read this documentation once you will probably have little need to refer back to it. General information on the operation of my programs and the data file formats used is provided in the section "Program Conventions." At any rate, once you have typed NEIG, you will see:

File Containing Data {.ADF} ?
Enter the name of the file that contains the information about the points to be analyzed (a drive or full path name can be given here). This file is expected to be in Antana format (described in the "Program Conventions" section). The data file can have any number of rows (points) and 2, 3, or 4 columns.

Let's take the simplest case, with two columns, first. In this case, the two columns represent the two-dimensional coordinates that specify the location (the order is unimportant, first east then north or vice versa, as long as it is consistent across all points). For each point, the closest neighboring points within the data set are identified, and the nearest neighbor coefficient for the entire group of points is calculated. (A gravity analysis of a two column data set cannot be performed because a point size is needed in addition to the coordinates.)

In a nearest neighbor analysis, the third column of a data set with three columns must be a point class, represented by a number between 0 and 30, and the first and second columns must be the coordinates. Use of the point class, such as tool type or site type, is discussed in more detail below.

In a gravity analysis, the third column of a three column data set must be the point size. The gravity neighbors are independent of the units used to measure the point size; while the unit used will affect the absolute magnitudes, it does not affect the relative order of the attractions. Thus, you can have coordinates specified in kilometers and sizes specified in population or acres.

For both nearest neighbor and gravity analysis, four column data sets can be used. In the four column case, the three columns described in the preceding paragraphs can be followed by a stratum number between 1 and 99. This stratum number works like a SAS or SYSTAT BY variable; the requested analysis is run independently for each subset of the data that shares a single stratum number.
Typically the stratum number might be a period designation in a settlement study, or might identify a stratigraphic level from a living floor. This stratification does not add to the analytical capability of the program; however, it allows one to perform multiple analyses (e.g. for sites of each period) with a single run of the program. Stratification has no time penalty; in fact, since it saves you running the program several times, it may be somewhat faster. However, if you run out of memory, or the number of points you wish to analyze in one stratum approaches the limit, you should do separate runs for each subset.

Read a Point Label File {N} ?
If you are interested in the identities of the nearest neighbors or gravity neighbors of each specific point, you will probably want to identify each point with a label, such as a site number or catalog number. If this is the case, answer Y to the first prompt and you will receive the second prompt. To the second prompt, reply with the name of the file that contains labels for each point. This row label file should consist of as many lines as there are points. The nth line is read as a label for the nth point in the input file. The program uses only the first twelve characters of each line of the label file as a point label, although any character, including blanks and special characters, can be used. If you do not ask to have the label file read, the program generates sequential numeric labels for the points (e.g. 1, 2, 3...).

[N]earest Neighbor Analysis or [G]ravity Model {N} ?
Typing N or <Enter> directs the program to do a nearest-neighbor analysis; G indicates that a gravity model is to be used.

Area ?
If you are performing a nearest neighbor analysis, the program needs to know the size of the area (e.g. the survey area or the excavation unit) within which the points you are analyzing were found. This area needs to be expressed in the square units corresponding to the units in which the point coordinates are given in the data.
Thus, if you are analyzing settlement patterns and the site locations are given as UTM coordinates (i.e. meters), then the area must be given in square meters (1,000,000 square meters is 1 square kilometer). Note: If you get a nearest neighbor coefficient that is impossibly large (e.g. 30), you have probably entered the area in the wrong units.

Distance Exponent in Gravity Model {2} ?
If you are performing a gravity analysis instead, with this prompt you specify the power to which the distance is raised in the attraction equation. Because the attraction is the product of the sizes of two points divided by the distance between them raised to this power, entering a larger power has the effect of making attraction fall off more quickly with distance, and conversely. The default power is 2, yielding the square of the distance.

Listing File {filein.LST} ?
Give the name of the file or device where you want the printed form of output. A reply of CON will display the results on the screen (but not save them anywhere); a reply of PRN will send the output to the printer. Normally you will want to specify a file (or path name), keeping in mind that the disk on which the file is to be placed must have sufficient room for the file. The default file extension is .LST.

Data Read for ???? Points
At this point the program reads the data and reports the number of cases read. The time that this requires depends on the number of points. A file with 100 points will be read in just a few seconds; one with 2000 points may take a few minutes. Because the program first reads the Antana header, it knows the number of columns of data and the number of rows to expect. If you have requested a gravity analysis or a nearest neighbor analysis with only two columns in the data set, the program will not issue the next two sets of prompts, so you may skip to the "Listing File" prompt, below.
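The attraction rule given under the "Distance Exponent in Gravity Model" prompt can be illustrated with a short Python sketch. This is not NEIG's own code: the function names and the three sites are hypothetical, points are assumed to be (x, y, size) tuples, and no two points are assumed to share the same coordinates (which would give a zero distance).

```python
import math

def attraction(a, b, exponent=2.0):
    """Attraction between two (x, y, size) points: the product of the
    sizes divided by the distance raised to the chosen exponent."""
    distance = math.hypot(a[0] - b[0], a[1] - b[1])
    return a[2] * b[2] / distance ** exponent

def gravity_neighbors(points, exponent=2.0, k=5):
    """For each point, return up to k (index, attraction) pairs,
    strongest attraction first. Indices are 0-based positions in points."""
    ranked = []
    for i, p in enumerate(points):
        others = [(j, attraction(p, q, exponent))
                  for j, q in enumerate(points) if j != i]
        others.sort(key=lambda pair: pair[1], reverse=True)
        ranked.append(others[:k])
    return ranked

# Hypothetical sites: a small site nearby and a large site farther away.
sites = [(0, 0, 100), (3, 0, 50), (10, 0, 400)]

# First gravity neighbor of site 0 under two different exponents.
first_squared = gravity_neighbors(sites, exponent=2)[0][0][0]
first_linear = gravity_neighbors(sites, exponent=1)[0][0][0]
print(first_squared, first_linear)  # prints "1 2"
```

With the default exponent of 2 the nearby small site is the first gravity neighbor of site 0; with an exponent of 1 distance counts for less and the large, distant site takes its place. Multiplying every size by a constant changes the attraction values but not their order.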
If you have requested a nearest neighbor analysis and have given a point class (3 or 4 column input), the program will continue with the following:

Analysis: [A]ll Points, by Point [T]ype, by [G]roup of Types {A} ?
The nearest-neighbor analysis can be done in several ways. All points input can be analyzed together; this is the classic sort of analysis. Alternately, each point type can be analyzed separately. Finally, types can be grouped into super-types, where for analytical purposes, all members of a super-type or group are considered the same.

Specify Label Symbols for Types
Here point types are assigned symbols for labeling purposes. (This has no analytical significance.) The label character can be any symbol you wish. For example, if one had site types 1-5, small and large hamlets, small and large villages, and centers, one might make the label character for site type 1 a "h", for 2 a "H", for 3 a "v", for 4 a "V", and for 5 a "C". This is only for convenience. The default label character, obtained by pressing <Enter>, is the last digit of the type number or a letter, depending on the number of types.

Specify Grouping of Point Types
If types are to be grouped, through these prompts the groups are assigned and labeled. The label character may be any character that you would like to have printed for that type in the output. Not surprisingly, all types assigned to a single group number will be grouped together. If all types were assigned to group number 1, the effect would be as if an analysis of all points were requested.

For some analytical purposes, it is useful to define groups as consisting of one or more point types. This can achieve two different purposes. First, it can provide a sort of secondary stratification of the data, so the points associated with the types that compose each group are analyzed independently.
Following the example described above, assume that group 1 is all hamlets (h and H), group 2 is all villages (v and V), and group 3 is the centers (C). In the normal mode, a nearest neighbor analysis would be done for all hamlets (ignoring villages and centers), for villages (ignoring hamlets and centers), and so forth, and separate coefficients would be calculated for each group. (Similarly, you might want to separately analyze scrapers, blades, and debitage.) The second purpose to which the grouping can be put is defined with the following prompt.

[W]ithin, [B]etween, or [O]utside Group Neighbors {W} ?
If you press <Enter> or W, the analysis of groups will proceed as just described. However, if you press O or B, then the analysis proceeds quite differently. In the case of outside group or type neighbors, throughout the analysis every nearest neighbor of a point in one group or type is constrained to be a member of a different group or type. Thus, you would be asking: what is the nearest neighbor of each hamlet that is not itself a hamlet; what is the nearest neighbor of a village that is not itself a village? Like the within group neighbors, the analysis (including computation of the coefficient) is done separately for each group.

Number of Neighbors Printed {2} ?
Enter the number of specific neighbors that you want printed for each point. A reply of 0 will suppress the lengthy listing of the neighbors for each point, but will still print the aggregate information. Two neighbors will always fit on a standard 80 column line. If sequential labels or labels 8 characters or shorter are provided, three neighbors will fit on an 80 character line. Up to 5 neighbors can be requested (let me know if you really need more), and printed on a printer with a wide carriage or in compressed mode. (For anyone who cares, the line length with NN nearest neighbors printed is (NN+1)*(labellen+13)-4, where the label length is adjusted to a minimum length of 7.)
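The line-length formula above is easy to check against the statements about 80 column lines. The following Python sketch simply evaluates it; the function name is made up for illustration, and the 12 character maximum comes from the label limit stated earlier.

```python
def listing_line_length(neighbors, label_len):
    """Width of one listing line: (NN+1)*(labellen+13)-4,
    with the label length raised to a minimum of 7."""
    effective_len = max(label_len, 7)
    return (neighbors + 1) * (effective_len + 13) - 4

# Two neighbors with the longest (12 character) labels.
print(listing_line_length(2, 12))  # prints 71
# Three neighbors with 8 character labels.
print(listing_line_length(3, 8))   # prints 80
# Five neighbors with 12 character labels.
print(listing_line_length(5, 12))  # prints 146
```

So two neighbors fit in 80 columns whatever the labels, three neighbors just fit when labels are 8 characters or shorter, and five neighbors with full-length labels need a wide carriage or compressed print.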
Number of Decimals Printed in Distances {0} ?
Indicate the number of digits that should appear to the right of the decimal in the distances (attractions) as they are listed. Every distance printed, including the decimal point and any digits to the right of the decimal, must fit in an 8 character field. Usually, the reply to this prompt will be the number of decimals in the input.

Write Neighbors to Output File {N} ?
If you want to do further analysis of the neighbors, this option allows you to save the results in a form that can be easily subjected to additional analysis. The same number of neighbors are output here as are listed (see above), with the exception that if 0 neighbors are listed, one neighbor will be output. Each line of the output contains the stratum, label, and type for the point, plus the label, type, and distance for each of the neighbors.

Number of Random Runs {0} ?
A Monte Carlo analysis, described in Kintigh 1990, may be performed. Enter the number of random runs. This option is not available for between-group neighbors.

Computing Interpoint Distances
These messages are issued to keep you informed of the program's progress. As each point is processed to find its nearest neighbors, its sequential number (1, 2, 3...) is listed. Don't be discouraged if, at first, it looks like it is going to be slow; the points that are processed first take much longer than those processed later.

PROGRAM TIME ESTIMATES

The program gives a running log of the sequential point number that it is working on (note that the farther along the analysis gets, the less time it takes to process each new point). Basically, the time required by nearest neighbor analysis increases with the square of the number of points involved.

INPUT FILE: .ADF

16 #points# 3 #variables: x y type {area=154} #
 1  1  0
 1  2  0
 2  1  0
 3  3  0
 8  2  2
 9  1  0
 9  3  2
10  2  0
 6 10 10
 6 11 11
 5 12 12
 7 12 13
12  8  4
13  7  2
14  9  2
15  7  4

INTERACTIVE SESSION

C> NEIG
File Containing Data {.ADF} ? neig
Read a Point Label File {N} ? N
[N]earest Neighbor Analysis or [G]ravity Model {N} ? N
Area ? 180
Listing File {NEIG.LST} ?
Reading Data File
Data Read for 16 Points
Analysis: [A]ll Points, by Point Type, by [G]roup of Types {A} ? A
Specify Label Symbols for Types
Type 0 Label Character {0} ?
Type 2 Label Character {2} ?
...
Type 13 Label Character {A} ?
Within-Group Neighbors will be Computed
Number of Neighbors Printed {3} ? 2
Number of Decimals Printed in Distances {0} ? 1
Write Neighbors to Output File {N} ? Y
Output Data File {NEIG.OUT} ?
Computing Interpoint Distances 16
Computing Summary Statistics
Computation Time 0.0 Minutes
Random Run Elapsed Time 0.0 Minutes
Program End

OPTIONAL OUTPUT FILE: .OUT

#Rows# 16 #Cols# 6 #From NEIG.ADF #
1 #  1#  0 #  2#  0 1.0 #  3#  0 1.0
1 #  2#  0 #  1#  0 1.0 #  3#  0 1.4
1 #  3#  0 #  1#  0 1.0 #  2#  0 1.4
1 #  4#  0 #  2#  0 2.2 #  3#  0 2.2
1 #  5#  2 #  6#  0 1.4 #  7#  2 1.4
1 #  6#  0 #  5#  2 1.4 #  8#  0 1.4
1 #  7#  2 #  5#  2 1.4 #  8#  0 1.4
1 #  8#  0 #  6#  0 1.4 #  7#  2 1.4
1 #  9# 10 # 10# 11 1.0 # 11# 12 2.2
1 # 10# 11 #  9# 10 1.0 # 11# 12 1.4
1 # 11# 12 # 10# 11 1.4 # 12# 13 2.0
1 # 12# 13 # 10# 11 1.4 # 11# 12 2.0
1 # 13#  4 # 14#  2 1.4 # 15#  2 2.2
1 # 14#  2 # 13#  4 1.4 # 16#  4 2.0
1 # 15#  2 # 13#  4 2.2 # 14#  2 2.2
1 # 16#  4 # 14#  2 2.0 # 15#  2 2.2

LISTING FILE: .LST

+-----------------------------------------------------+
| Stratum  1 ||    Neighbor 1     |    Neighbor 2     |
| Point Type || Point Type   Dist | Point Type   Dist |
+-----------------------------------------------------+
|     1  @   ||     2  @      1.0 |     3  @      1.0 |
|     2  @   ||     1  @      1.0 |     3  @      1.4 |
|     3  @   ||     1  @      1.0 |     2  @      1.4 |
|     4  @   ||     2  @      2.2 |     3  @      2.2 |
|     5  B   ||     6  @      1.4 |     7  B      1.4 |
|     6  @   ||     5  B      1.4 |     8  @      1.4 |
|     7  B   ||     5  B      1.4 |     8  @      1.4 |
|     8  @   ||     6  @      1.4 |     7  B      1.4 |
|     9  J   ||    10  K      1.0 |    11  L      2.2 |
|    10  K   ||     9  J      1.0 |    11  L      1.4 |
+-----------------------------------------------------+
|    11  L   ||    10  K      1.4 |    12  M      2.0 |
|    12  M   ||    10  K      1.4 |    11  L      2.0 |
|    13  D   ||    14  B      1.4 |    15  B      2.2 |
|    14  B   ||    13  D      1.4 |    16  D      2.0 |
|    15  B   ||    13  D      2.2 |    14  B      2.2 |
|    16  D   ||    14  B      2.0 |    15  B      2.2 |
+-----------------------------------------------------+

NN Computations: Stratum= 1  Within Group 1 <@BDJKLM>

NN Coeff    N   robs   rexp     Area  Density  Std(ro)  StdErr  Test C  Prob
    0.89   16   1.42   1.60   154.00   0.0974     0.40    0.21   -0.85

Point|  Neighbor 1 Type
Type |   @   B   D   J   K   L   M
-----+-----------------------------
  @  |   5   1   0   0   0   0   0
  B  |   1   1   2   0   0   0   0
  D  |   0   2   0   0   0   0   0
  J  |   0   0   0   0   1   0   0
  K  |   0   0   0   1   0   0   0
  L  |   0   0   0   0   1   0   0
  M  |   0   0   0   0   1   0   0

Point|  Neighbor 2 Type
Type |   @   B   D   J   K   L   M
-----+-----------------------------
  @  |   5   1   0   0   0   0   0
  B  |   1   2   1   0   0   0   0
  D  |   0   2   0   0   0   0   0
  J  |   0   0   0   0   0   1   0
  K  |   0   0   0   0   0   1   0
  L  |   0   0   0   0   0   0   1
  M  |   0   0   0   0   0   1   0

Page Last Updated: 1 April 2020
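As a cross-check on the sample listing, the summary line of the NN Computations can be approximately reproduced with a few lines of Python. This is a sketch under stated assumptions, not NEIG's own code. The density is taken as (N-1)/Area because the listed value 0.0974 equals 15/154. The expected distance is the usual random-model value 1/(2*sqrt(density)). The standard error uses the Clark and Evans constant 0.26136, which happens to match the listed StdErr. The function names are hypothetical.

```python
import math

# Coordinates of the 16 sample points from the input file (type column omitted).
POINTS = [(1, 1), (1, 2), (2, 1), (3, 3), (8, 2), (9, 1), (9, 3), (10, 2),
          (6, 10), (6, 11), (5, 12), (7, 12), (12, 8), (13, 7), (14, 9), (15, 7)]
AREA = 154.0  # area shown in the listing file

def nearest_distances(points):
    """Distance from each point to its closest other point (brute force)."""
    return [min(math.hypot(px - qx, py - qy)
                for j, (qx, qy) in enumerate(points) if j != i)
            for i, (px, py) in enumerate(points)]

def nn_summary(points, area):
    """Observed and expected mean nearest neighbor distance and their ratio."""
    n = len(points)
    robs = sum(nearest_distances(points)) / n
    density = (n - 1) / area              # assumption inferred from the listing
    rexp = 1.0 / (2.0 * math.sqrt(density))
    std_err = 0.26136 / math.sqrt(n * density)  # assumed Clark and Evans form
    return {"coeff": robs / rexp, "robs": robs, "rexp": rexp,
            "density": density, "std_err": std_err,
            "test_c": (robs - rexp) / std_err}

summary = nn_summary(POINTS, AREA)
for name, value in summary.items():
    print(f"{name:8s} {value:8.4f}")
```

Rounded to the precision of the listing, these values agree with the NN Coeff (0.89), robs (1.42), rexp (1.60), Density (0.0974), StdErr (0.21), and Test C (-0.85) columns. A coefficient slightly below 1 suggests mild clustering of the sample points.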