TFQA: Tools for Quantitative Archaeology - Statistical Analysis Software for Archaeology TFQA

     kintigh@tfqa.com   +1 (505) 395-7979



PhaseLen: Estimate True Intervals from Dates with Normally Distributed Errors

Provides a Monte Carlo analysis to estimate the length of the true interval that produced an observed set of measured dates with Gaussian errors, such as radiocarbon and obsidian hydration dates. The program has an option for calibration. In test mode, the program can be used to help decide how many dates are likely to be needed to obtain a good estimate. The program comes with current radiocarbon calibration files.

Phaselen is a Windows program written in Delphi, an extension of Pascal as implemented in the Embarcadero RAD Studio XE. The program logic can be seen in the program source file PhaselenXX.dpr (where XX is the current version; about 1400 lines of code not including the supporting procedure code). Supporting procedures not essential to the program logic are in KWKSTD.PAS and are compiled in KWKSTD.DCU.

Running the Program

Download PhaseLen.exe along with the calibration files (in the "TFQA\C14Calibration" folder in the unzipped distribution directory) to a directory on your computer. (The calibration files, if they are to be used, must be in the directory you are using, not in a subfolder of it.) To run the program, navigate (e.g., using Windows File Explorer) to your directory and double-click PhaseLen.exe. In a new Windows “Run” window, you will see the program banner. After that, the program will prompt you for the information it needs to run. Default answers are provided in {curly braces} and can be obtained by just pressing <Enter>. Reply Y or N to yes or no questions. More general information on running my programs may be found at http://tfqa.com; see especially Program Conventions and Running TFQA under Windows. For use with calibrated intervals, the needed calibration file(s) (UWTEN93.14C, INTCAL13.14C, MARINE93.14C) must be copied to the directory from which the program is run.

The program can be run in three modes: Generate, Evaluate, and Test. The sequence of prompts for each mode is described separately. Generate mode is used to generate date samples in which dates are drawn from a specified true interval. Evaluate mode is used to estimate interval lengths for an empirical set of dates. Test mode is used to test the methods: measured samples with known characteristics are generated and the estimation procedures are then applied, so one can see how well they perform for the specified number of dates, true interval length, and date standard error.

Generate Mode

In the generate mode, the program takes a specified true interval and date standard deviation and creates a file of dates from that true interval that include the specified Gaussian (normally distributed) errors. Any number of dates may be requested.

[E]valuate Dates, [G]enerate Dates, [T]est Mode {E} ? G

Type G<enter> to select Generate mode.

Random Generator Seed (0 to set from clock) {0} ? 

Press [Enter] to set the random number generator seed from the clock. Ordinarily setting the random number generator from the clock is fine. Setting a specific number is useful only if you want to reproduce a run precisely.

Generate Dates for a [C]alibrated Radiocarbon or [U]ncalibrated Interval {C} ?

Reply C (or <Enter>) to generate dates in radiocarbon years from a true calendar year interval. Enter U for an uncalibrated interval (e.g., for obsidian hydration dates or experimentation).

Calibration File: [1]IntCal13, [2]IntCal93,[3]UWTen93, [M]arine93 {1} ? 
Calibration File {INTCAL13.14C} ?

For a calibrated interval choose the appropriate calibration file. IntCal13.14c is the default and should be used unless you have a good reason to do otherwise. The second prompt confirms the file name. This file must be in the directory from which the program was run.

True Interval Start Date ?
True Interval End Date ?

Starting and ending dates for the true interval. Entering 1000 and 1200, respectively, would use the 200-year interval between AD 1000 and 1200. If calibration is to be done, the true date interval is specified in calendar years, AD (positive) or BC (negative), not in radiocarbon years.

Model Distribution of True Dates Across the Interval
Model: [R]ectangular or Truncated [N]ormal {R} ? R

The program provides two ways to select true dates for the output sample from the specified true interval (before the Normally distributed standard errors are added). Ordinarily, one would use the Rectangular model in which each year within the true interval is equally likely to be selected. (Note this is distinct from a uniform spread of true dates [which is not implemented] in which the true dates are equally spaced across the true interval.)

However, if date-producing activity is expected to start gradually, peak, and then decline across the interval, the truncated normal distribution would be most appropriate. In this model the true dates are selected from a normal distribution with the ends of the true interval set at the standard deviation cutoff. The distribution is truncated in the sense that any date outside the standard deviation cutoff is ignored.

This is easiest to explain by example. First, assume the Std. Cutoff is 2.0. A random true date is selected by first selecting a normally distributed random number with a mean of 0 and a standard deviation of 1. If the random number is less than -2 or more than 2 (the cutoff), it is ignored and another is selected. If the random number is within the range (-2, 2), then the true date is calculated by assuming that the -2 to +2 range corresponds to the true interval. Thus, if the true interval is 1000-1200 and the normally distributed random number selected is +1.1 (1.1 standard deviations above the mean), then the true date is 1100 + 100*1.1/2.0 = 1155: that is, the midpoint of the true interval in years plus the product of (half the true interval width in years) and (the normally distributed random number), divided by (the std. cutoff).
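
A minimal Python sketch of the truncated normal draw described above, assuming a standard normal generator that can simply be resampled until the deviate falls inside the cutoff. The function names are hypothetical; this illustrates the procedure as described and is not PhaseLen's Delphi code.

```python
import random

def z_to_date(z, start, end, cutoff):
    """Map a standard normal deviate in [-cutoff, cutoff] onto the true interval."""
    midpoint = (start + end) / 2.0
    half_width = (end - start) / 2.0
    return midpoint + half_width * z / cutoff

def truncated_normal_true_date(start, end, cutoff, rng):
    """Draw one true date under the truncated normal model.

    Deviates outside (-cutoff, cutoff) are discarded and redrawn.
    """
    while True:
        z = rng.gauss(0.0, 1.0)
        if -cutoff <= z <= cutoff:
            return z_to_date(z, start, end, cutoff)

# Worked example from the text: interval 1000-1200, cutoff 2.0, deviate +1.1
print(z_to_date(1.1, 1000, 1200, 2.0))

rng = random.Random(1)
draws = [truncated_normal_true_date(1000, 1200, 2.0, rng) for _ in range(5)]
print([round(d) for d in draws])
```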

The larger the standard deviation cutoff, the more the probability is concentrated toward the center of the interval. Thus, with a cutoff of 0.5, the middle 50% of the interval contains 52% of the probability; with a cutoff of 1 it contains 56%; for 2, 72%; for 3, 87%; and for 4, 95%. With large cutoffs (>3), extreme random numbers are relatively unlikely, so dates near the ends of the interval will be rare. It seems unlikely that a cutoff greater than 3 would be useful in the real world.
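
These percentages follow directly from the normal distribution: the middle half of the interval corresponds to deviates within ±cutoff/2, so its share of the probability is P(|Z| < cutoff/2) / P(|Z| < cutoff). A short Python check (function names hypothetical), which rounds to 52%, 56%, 72%, 87%, and 95% for cutoffs of 0.5, 1, 2, 3, and 4:

```python
from math import erf, sqrt

def central_mass(a):
    """P(-a < Z < a) for a standard normal Z."""
    return erf(a / sqrt(2.0))

def middle_half_share(cutoff):
    """Share of probability in the middle 50% of the true interval."""
    # The middle half of the interval corresponds to |z| < cutoff / 2.
    return central_mass(cutoff / 2.0) / central_mass(cutoff)

for c in (0.5, 1, 2, 3, 4):
    print(c, round(100 * middle_half_share(c)))
```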

Number of Dates/Sample ? 

The number of dates that you would like to generate as a simulated sample.

Standard Error ?

The standard error of the dates generated. Each simulated date is generated by picking a true date at random from within the interval, as described above, then picking a random number with a mean of 0 and a standard deviation equal to the standard error specified here, and adding that number (which may be negative) to the simulated true date. (Note: the evaluation procedure uses this same procedure to create the population for the Kolmogorov-Smirnov goodness-of-fit measure. Thus, if you’d like to see the shape of the distribution that forms the KS population, you can generate a large number of dates (e.g., 10,000) using the Generate option.)
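
A minimal Python sketch of how one simulated sample could be produced under the rectangular model. Drawing whole-year true dates and rounding the measured dates to whole years are assumptions, and `simulate_dates` is a hypothetical name; the printed output mimics the two-column date file layout shown under Program Output.

```python
import random

def simulate_dates(start, end, n_dates, std_error, seed=None):
    """Simulate one sample of measured dates from a rectangular true interval.

    Each true date is a whole year drawn with equal probability from
    start..end (inclusive); a normal error with mean 0 and standard deviation
    std_error is then added, and the result is rounded to a whole year.
    """
    rng = random.Random(seed)
    dates = []
    for _ in range(n_dates):
        true_year = rng.randint(start, end)
        measured = true_year + rng.gauss(0.0, std_error)
        dates.append(round(measured))
    return dates

# 50 dates from AD 1000-1200 with a 60-year standard error,
# printed in the two-column layout of a generated date file.
sample = simulate_dates(1000, 1200, 50, 60, seed=42)
print(f"{len(sample)}, 2")
for d in sample:
    print(f"{d:6d}  60")
```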

Produce [S]tatistics for Date Sets or [G]enerate 1 Set of Dates {G} ?

Here you select what you would like the program to produce. Press G or <Enter> to generate one or more samples of dates and write them to separate files. Press S to have the program generate multiple samples of dates without saving the dates themselves; instead, each sample’s statistics (e.g., earliest and latest measured dates, interquartile range, Ihat) are written to a CSV file. A more flexible option is available in Test mode with no Monte Carlo analyses requested.

File for Output Dates {.ADF} ? 

If you selected G at the previous prompt, this prompt requests the name of the output file. In the output, the set of dates is preceded by a single line listing the number of rows (dates) and variables (2: the date and its standard deviation [sigma]). A comment (preceded by a #) gives the program parameters that generated the dates. With the G option, the program will also display (but not save) statistics for the sample generated.


#Dates  Earliest  Latest   Span    IQR   Mean  Median  Ihat    K
    40       793    1125    332     80    935     926   162 3.46 Uncalibrated
    40      1002    1198    196    106   1100    1096            BC/AD

This is a report of the program's progress, showing the number of dates generated and summary statistics for each sample. The first line of data describes the 40 simulated radiocarbon dates drawn from the true interval AD 1000 to 1200. The second line describes the true calendar dates of the simulated sample (with no measurement error). Here, the 40 dates from the true interval AD 1000-1200 happened to have a mean of 1100 (because of the sampling process, the mean usually will be close to, but not exactly at, the middle of the interval) and a span of 196 years. The simulated dates in radiocarbon years, with measurement errors incorporated, have a span of 332 years, from 793 to 1125 BP. Note that the number of dates in the calendar year line will always be the same as in the uncalibrated line (which for radiocarbon dates is in radiocarbon years), because a reverse calibration is done: each calendar year maps to exactly one radiocarbon year (but the reverse is not true), and that mapping is obtained simply by interpolating in the calibration dataset. In this mode, the program runs very quickly, requiring only a few seconds to generate even large samples.
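
A sketch of the calendar-to-radiocarbon direction, assuming the calibration data can be treated as a list of (calendar year, radiocarbon age) points joined by straight lines. The curve values below are invented for illustration and are not taken from IntCal13 or the .14C files, whose format is not described here; `cal_to_c14` is a hypothetical name.

```python
from bisect import bisect_left

# Hypothetical calibration points (calendar year AD, radiocarbon age BP),
# sorted by calendar year. Made-up values for illustration only.
CURVE = [(1000, 1040), (1050, 960), (1100, 940), (1150, 880), (1200, 820)]

def cal_to_c14(cal_year, curve=CURVE):
    """Map a calendar year to a radiocarbon age by linear interpolation.

    This direction is single-valued: each calendar year has exactly one
    radiocarbon age on the curve.
    """
    years = [y for y, _ in curve]
    if not years[0] <= cal_year <= years[-1]:
        raise ValueError("calendar year outside the calibration curve")
    i = bisect_left(years, cal_year)
    if years[i] == cal_year:
        return float(curve[i][1])
    (y0, a0), (y1, a1) = curve[i - 1], curve[i]
    return a0 + (a1 - a0) * (cal_year - y0) / (y1 - y0)

print(cal_to_c14(1025))  # halfway between the 1040 and 960 BP points
```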

Generate Another Set With the Same Parameters {N}?

You can repeat this process, generating another sample with the same parameters and saving it in another file.

Number of Sets of Dates to Generate {1} ? 

If you selected S to produce statistics, you are interested in seeing summary statistics associated with a number of samples of dates. Enter the number of sets to generate.

File for Output Dates {.CSV} ?

If you selected S (statistics) earlier, this prompt requests the name of the output file. In this output, the statistics for each sample generated are written in CSV format.

#Dates  Earliest  Latest   Span    IQR   Mean  Median  Ihat    K
   100       955    1236    281    128   1095    1102   214 3.46 Uncalibrated
...
   100       958    1236    278    110   1097    1100   215 3.46 Uncalibrated
   100       961    1237    276    100   1098    1106   194 3.46 Uncalibrated
...
True Interval Length=200  Ihat mean= 198.72  std= 15.15  Date Sets=100
Date Sets w/ Ihat Defined=100

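The program will display statistics for the first 50 date samples on screen but will write all of them to the CSV file. Î (or Ihat) is an estimate of the true interval length calculated, following Cowgill (1998), as $\hat{I} = K\,(s_o^2 - \sigma_m^2)^{1/2}$, where $s_o$ is the standard deviation of the observed dates, $\sigma_m$ is the measurement error (standard error) of the dates, and K is a constant that depends on the model distribution (3.46 for the rectangular model).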
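
The Î formula can be sketched in Python as below. K = √12 ≈ 3.46 for the rectangular model because a uniform spread of length L has variance L²/12; the 4.55 shown for the truncated normal model with a cutoff of 2 in the output examples plays the same role. How PhaseLen computes the observed variance and pools unequal standard errors is not stated, so the n − 1 divisor and the mean of squared errors used here are assumptions, and `ihat` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, variance

RECTANGULAR_K = sqrt(12.0)  # ~3.46: a uniform spread of length L has variance L**2 / 12

def ihat(dates, sigmas, k=RECTANGULAR_K):
    """Interval-length estimate K * sqrt(s_o**2 - sigma_m**2).

    Assumptions (not confirmed for PhaseLen): s_o**2 is the sample variance of
    the measured dates (n - 1 divisor) and sigma_m**2 is the mean of the squared
    standard errors. Returns None when the measurement-error variance exceeds
    the observed variance, i.e. when the estimate cannot be calculated.
    """
    observed_var = variance(dates)
    error_var = mean([s * s for s in sigmas])
    if observed_var < error_var:
        return None
    return k * sqrt(observed_var - error_var)

print(round(ihat([100, 200, 300], [0, 0, 0]), 2))  # 100 * sqrt(12)
print(ihat([100, 110, 120], [60, 60, 60]))          # error variance dominates
```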

Program End.

Program Output

In generate mode, the program produces no printed output, only a file of dates and standard deviations. If a single generated sample is requested, the output file looks like this:

50, 2 #Generated Dates 1000 to 1200 Model R ( 0.00) Randseed: 271452437
  1152  60
  1067  60
  1191  60
...
  1240  60

In generate mode where date sample statistics are requested, the output file looks like this:

Dates, True_Start, True End, Model, Model_SD, Earliest,Latest, Span, IQR, Mean, Median, Ihat, K
100,1000,1200, "R",0.00,955,1236,281,128,1095,1102, 214, 3.46
100,1000,1200, "R",0.00,970,1214,244,101,1100,1097, 183, 3.46
100,1000,1200, "R",0.00,932,1267,335,118,1103,1110, 209, 3.46
…
100,1000,1200, "R",0.00,968,1215,247,86,1097,1106, 173, 3.46

For calibrated radiocarbon data there are more columns:

Dates, True_Start, True End, Model, Model_SD, Earliest, Latest, Span, IQR, Mean, Median, Ihat, K, Intercepts, Cal_Earliest, Cal_Latest,Cal_Span, cal_IQR, Cal_Mean, Cal_Median
100,1000,1200, "N",2.00,812,1077,265,72,934,933,161,4.55,100,1005,1197,192,62,1095,1094
100,1000,1200, "N",2.00,778,1077,299,62,932,932,171,4.55,100,1006,1198,192,68,1100,1100
...
100,1000,1200, "N", 2.00,817,1063,246,60,934,940,153,4.55,100,1005,1189,184,72,1095,1099
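
If you want summary figures like the Ihat mean and standard deviation that the program prints on screen, the statistics CSV file can be read with Python's standard csv module. A minimal sketch, assuming the header layout of the rectangular-model example (`skipinitialspace=True` handles the spaces after commas and the quoted model code); `ihat_summary` is a hypothetical helper, and in practice you would pass the contents of your own output file.

```python
import csv
import io
from statistics import mean, stdev

def ihat_summary(csv_text):
    """Return (count, mean, SD) of the Ihat column of a statistics CSV file."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    # Rows with an empty Ihat field (estimate undefined) are skipped.
    values = [float(row["Ihat"]) for row in reader if row.get("Ihat")]
    return len(values), mean(values), stdev(values)

SAMPLE = """Dates, True_Start, True End, Model, Model_SD, Earliest,Latest, Span, IQR, Mean, Median, Ihat, K
100,1000,1200, "R",0.00,955,1236,281,128,1095,1102, 214, 3.46
100,1000,1200, "R",0.00,970,1214,244,101,1100,1097, 183, 3.46
100,1000,1200, "R",0.00,932,1267,335,118,1103,1110, 209, 3.46
100,1000,1200, "R",0.00,968,1215,247,86,1097,1106, 173, 3.46
"""
n, m, s = ihat_summary(SAMPLE)
print(n, m, round(s, 2))
```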

Evaluation Mode

In evaluation mode, the program reads a set of empirical dates and their associated standard deviations and evaluates the true intervals that are most likely to have produced this empirical distribution (given the constraints of the evaluation model). The empirical dates may be radiocarbon dates in (positive) years BP, or obsidian hydration dates in years AD/BC with BC dates negative. For prompts that are not described here, see the description of the prompt in the Generate section.

[E]valuate Dates, [G]enerate Dates, [T]est Mode {E} ? E

Press E or <enter> to select evaluation mode.

Random Generator Seed (0 to set from clock) {0} ?

Press [Enter] to set from clock or enter an integer to specify a seed so a previous run can be duplicated.

Derive Monte Carlo-Based Interval Length Estimates {Y} ?

Answer Yes if you want the Monte Carlo-based interval length estimates. If you want only the statistics on the input file (including Ihat), answer No.

File with Empirical Dates (Con for Keyboard){.ADF} ? 

Name of the file containing the set of dates and associated standard deviations to be evaluated. The file should be in ASCII (text) format with two numbers for each date: first the date (mean), then the date’s standard deviation. For radiocarbon dates, these will be radiocarbon years BP. For obsidian hydration or other dates that are not calibrated, they will be years AD (+) or BC (-). (The program ignores the fact that there is no calendar year 0.) These data should be preceded by a line with the number of dates followed by a comma and/or at least one space and a 2 for the number of variables. (The current program limit is 2500 dates.) The input file for 50 dates could look like this (though formatting is very flexible):

50, 2
  1152   60
  1067   60
  1191   60
...
  1240   60
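
Because the format is loose (a count line, then date/sigma pairs separated by commas and/or spaces), a reader can treat the file as one stream of numbers. A minimal Python sketch; `read_adf` is a hypothetical name, and treating text after `#` as a comment is an assumption based on the generated-file header shown under Generate mode.

```python
import re

MAX_DATES = 2500  # program limit stated in the documentation

def read_adf(text):
    """Parse date-file text into a list of (date, sigma) pairs."""
    numbers = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments (assumed syntax)
        numbers.extend(float(tok) for tok in re.split(r"[,\s]+", line) if tok)
    if len(numbers) < 2:
        raise ValueError("missing header line")
    n_dates, n_vars = int(numbers[0]), int(numbers[1])
    if n_vars != 2:
        raise ValueError("expected 2 variables per date (date and sigma)")
    if n_dates > MAX_DATES:
        raise ValueError(f"more than {MAX_DATES} dates")
    values = numbers[2:]
    if len(values) < 2 * n_dates:
        raise ValueError("fewer values than the header line promises")
    return [(values[2 * i], values[2 * i + 1]) for i in range(n_dates)]

example = """3, 2 #Generated Dates 1000 to 1200
  1152  60
  1067  60
  1191  60
"""
print(read_adf(example))
```
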
Estimate [C]alibrated or [U]ncalibrated Intervals {C} ?

Reply C (or <Enter>) to evaluate calendar year intervals for input dates in radiocarbon years. Enter U for an uncalibrated interval (this option is used for obsidian hydration dates or experimentation). When calibrated intervals are selected, the file of empirical dates is assumed to be in (positive) radiocarbon years BP, and the simulated samples are also in radiocarbon years. However, the true interval is specified in calendar dates.

Calibration File: [1]IntCal13, [2]IntCal93, [3]UWTen93, [M]arine93 {1} ?
Calibration File {INTCAL13.14C} ?

For a calibrated interval, choose the appropriate calibration file. IntCal13.14c is the default and should be used unless you have a good reason to do otherwise. The second prompt confirms the file name. This file must be in the default directory (the one from which the program was run).

In dealing with the calibrated situation, PhaseLen first finds all of the intercepts for the empirical (radiocarbon year) dates. In bumpy areas of the calibration curve, the number of intercepts will exceed the original number of dates. It then calculates a median and a weighted mean (weighted so that radiocarbon dates with multiple intercepts count the same as those with single intercepts) of those intercepts to use as the midpoint for the calibrated (calendar year) intervals. The program works from a calendar year interval centered on the mean or median (you pick, below), selects a set of dates at random from that calendar interval (with as many dates as there are intercepts for the empirical set), and maps those calendar year dates to their radiocarbon year equivalents, yielding a set of radiocarbon year dates. (In that direction each calendar year maps to exactly one radiocarbon year, so you don't have the multiple intercept problem.)

If the curve is bumpy, then the distribution of the selected radiocarbon year equivalents will be bumpy (whether starting from a rectangular or a truncated normal calendar year distribution), but that is fine because the generated radiocarbon year set is being tested against the empirical dates, which should be subject to the same kinds of bumpiness. Using dmax from the KS test, you are looking for the cumulative curves to have the same shape; it doesn't matter what that shape is.
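
A sketch of the intercept step and the weighted mean, assuming a piecewise-linear calibration curve. The curve points are invented to produce a wiggle and are not real calibration data; whether the program's median is taken over all intercepts unweighted, as here, is an assumption, and the function names are hypothetical.

```python
from statistics import median

# Hypothetical (calendar year AD, radiocarbon age BP) points with a wiggle,
# made up for illustration; not real IntCal13 data.
CURVE = [(1000, 1000), (1050, 900), (1100, 950), (1150, 850), (1200, 800)]

def intercepts(c14_age, curve=CURVE):
    """All calendar years where the piecewise-linear curve equals c14_age."""
    hits = []
    for (y0, a0), (y1, a1) in zip(curve, curve[1:]):
        if a0 != a1 and min(a0, a1) <= c14_age <= max(a0, a1):
            year = y0 + (c14_age - a0) * (y1 - y0) / (a1 - a0)
            if not hits or abs(hits[-1] - year) > 1e-9:  # skip shared endpoints
                hits.append(year)
    return hits

def intercept_center(c14_ages):
    """Weighted mean, median, and count of the intercepts of a set of dates.

    Each date's intercepts share a total weight of 1, so a date with three
    intercepts counts the same as a date with one.
    """
    all_hits, weighted_sum, n_dates = [], 0.0, 0
    for age in c14_ages:
        hits = intercepts(age)
        if hits:
            all_hits.extend(hits)
            weighted_sum += sum(hits) / len(hits)
            n_dates += 1
    return weighted_sum / n_dates, median(all_hits), len(all_hits)

print(intercepts(920))               # three intercepts on the wiggle
print(intercept_center([920, 825]))  # (weighted mean, median, number of intercepts)
```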

Listing File or Device {.TXT} ?

Name for the output listing file.

Model Distribution of True Dates Across the Interval
  Model: [R]ectangular or Truncated [N]ormal {R} ? 

The program finds the interval that best fits the empirical distribution by generating a large number (specified below) of sets of dates with the same number of dates as the input distribution and with errors selected from the input set of Gaussian errors. For most purposes pick R for a rectangular distribution in which any date in the true interval is equally likely to be selected in forming the trial sets of dates. See the discussion under the Generate mode for more information about the truncated normal option.

Number of Trials to Create KS Population {10000} ?

The program does four kinds of evaluation. In the first, dmax from the Kolmogorov-Smirnov statistic is used to evaluate the degree of fit between the empirical distribution and a pseudo-population of dates that could have come from each interval. dmax is the maximum difference in the cumulative percentage curves of the ordered dates for the empirical and population distributions. For each interval tested, a population of dates with Gaussian errors is simulated using the process described above for the Generate mode. This prompt requests the number of dates that should form this population. The larger the number the more closely it will approximate the real population, but the more time it will take.
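
The dmax statistic is the standard two-sample Kolmogorov-Smirnov distance between two cumulative curves. A minimal Python sketch with a hypothetical function name; in the program the second argument would be the simulated population for one candidate interval, but here both inputs are just lists of numbers. With SciPy installed, `scipy.stats.ks_2samp` computes the same statistic.

```python
from bisect import bisect_right

def ks_dmax(sample, population):
    """Largest absolute gap between the two empirical cumulative curves."""
    a, b = sorted(sample), sorted(population)
    dmax = 0.0
    # Both curves are step functions that only change at observed values,
    # so checking every observed value finds the maximum gap.
    for x in a + b:
        gap = abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        dmax = max(dmax, gap)
    return dmax

print(ks_dmax([1, 2, 3, 4], [3, 4, 5, 6]))
```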

Number of Trials for Sample Comparisons {50000} ?

For each tested interval width, a large number of simulated sets of dates (with the same number of dates as the input set and standard deviations selected from the standard deviations of the input dates) is compared with the empirical dates. The interval for which the simulated sets most closely match the empirical dates is then taken to be the best fit.

Interval Midpoint Set at [1] Mean, [2] Median, [U]ser Selected Midpoint {1} ? 

The intervals tested can be centered on the mean or the median of the empirical dates. Press 1 for the mean, 2 for the median, or U to enter a midpoint of your own. The mean makes use of every date, but in the real world it can be influenced by extreme outliers that represent not sampling error but erroneous dates due to contaminated samples, the old wood problem, or other causes; the median is less sensitive to such outliers.

Use Adjusted [E]uclidean Distance or Mean |[D]eviation| {E} ? 

Select the distance function. Press E for adjusted Euclidean distance between the ordered empirical and ordered generated sample. Adjusted Euclidean distance is the square root of the sum of the squared differences between the ordered sets of dates, divided by the number of dates. Mean deviation is the average of the absolute values of the differences between the ordered sets of dates.
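
Both distance measures, as defined above, in a minimal Python sketch (function names hypothetical). The adjusted Euclidean distance divides the square root by the number of dates, matching the listing's description "Euclidean Distance/Number of Dates", rather than taking the root of the mean squared difference.

```python
from math import sqrt

def adjusted_euclidean(empirical, simulated):
    """sqrt(sum of squared differences between ordered dates) / n."""
    a, b = sorted(empirical), sorted(simulated)
    if len(a) != len(b):
        raise ValueError("samples must have the same number of dates")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) / len(a)

def mean_abs_deviation(empirical, simulated):
    """Average absolute difference between ordered dates."""
    a, b = sorted(empirical), sorted(simulated)
    if len(a) != len(b):
        raise ValueError("samples must have the same number of dates")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

emp, sim = [1100, 1030, 1150], [1090, 1180, 1050]
print(adjusted_euclidean(emp, sim), mean_abs_deviation(emp, sim))
```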

Minimum True Interval Considered {0} ?
Maximum True Interval Considered {300} ? 
Interval Increment (>=2) {25} ?

Specify the range of interval widths (centered on the middle of the empirical distribution) to be considered. The increment is added to the minimum interval at each step as the interval width increases to the maximum. By default the program uses the empirical range (300 years in this example). The program always rounds up the maximum to the next even interval. In this example, pressing <Enter> for each reply will test the intervals 0 (i.e., 1 year), 25, 50, ... 300. Specifying finer intervals will yield a more precise result but take more time.
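
A sketch of the grid of tested intervals. The round-half-up rule for the endpoints is an inference that reproduces the From/To columns in the example output (for instance, 25 years centered on 1100 gives 1088 to 1113); reading "round up the maximum" as "extend the grid to a whole number of increments" is likewise an assumption. `interval_grid` is a hypothetical name.

```python
import math

def interval_grid(min_len, max_len, step, center):
    """Return (length, start, end) for each true interval to be tested."""
    if step < 2:
        raise ValueError("the increment must be at least 2")
    # Assumed: extend the last step so the grid reaches at least max_len.
    n_steps = math.ceil((max_len - min_len) / step)
    grid = []
    for i in range(n_steps + 1):
        length = min_len + i * step
        # Center the interval and round each end half up (inferred rule).
        start = math.floor(center - length / 2 + 0.5)
        end = math.floor(center + length / 2 + 0.5)
        grid.append((length, start, end))
    return grid

for row in interval_grid(0, 300, 25, 1100)[:4]:
    print(row)
```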

#Dates  Earliest  Latest   Span    IQR   Mean  Median  Ihat    K
    25       975    1273    298    104   1100    1093   203 3.46 Uncalibrated

          Phase      dmax      Dist     Span ( 298)  IQR (104)
 Phase -----------  ------ ------------ ----------- -----------
Length  From    To  Value   Mean   Std  Mean  %ile  Mean  %ile

     0  1100  1100  0.150   6.53  1.18   197  99.4    64  99.3
    25  1088  1113  0.149   6.47  1.20   199  99.3    64  99.2
    50  1075  1125  0.143   6.27  1.21   204  98.8    66  98.6
    75  1063  1138  0.128   5.98  1.23   214  97.9    70  97.6
   100  1050  1150  0.120   5.62  1.24   226  95.8    74  94.9
   125  1038  1163  0.101   5.27  1.25   240  91.8    80  89.6
   150  1025  1175  0.085   4.99  1.25   255  84.9    87  81.1
   175  1013  1188  0.074   4.84  1.28   272  74.3    95  68.7
   200  1000  1200  0.084   4.89  1.37   289  60.1   103  53.7
   225   988  1213  0.106   5.17  1.51   307  44.3   112  39.1
   250   975  1225  0.128   5.67  1.70   326  28.9   122  26.7
   275   963  1238  0.151   6.38  1.90   345  16.7   132  17.5
   300   950  1250  0.164   7.23  2.07   364   8.6   143  10.9

Compute Time:    0.14 Minutes
Program End

OK to Close Program Window {Y} ?

The program provides this prompt to allow you to examine the on-screen results. Once you reply with Y or <Enter>, it will close the Windows Run window and you won’t be able to recover it.

The program first provides information about the empirical sample. Then, as the program progresses, it reports the result of each set of evaluations. The dmax value provides the value of the maximum proportional difference between the empirical date cumulative distribution and the ideal population cumulative distribution generated for that interval length with the given model. For example, for the 175 year interval, 0.074 is the maximum difference in cumulative proportions between the empirical distribution and the population distribution.

The Distance mean (and std.) reflects the mean (and standard deviation), over all trials, of the distance between the empirical and simulated samples using the distance measure chosen. Experimentation shows this is not a good measure to use. The Span columns give the mean range of the uncalibrated dates in the simulated samples for each interval, along with a %ile value; the mean can be compared with the empirical range listed next to Span in the heading. Similarly, the IQR columns give the mean Interquartile Range (between the 25th and 75th percentiles) of the uncalibrated dates in the simulated samples, which can be compared with the empirical IQR listed in the heading.
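
The %ile columns are not defined above. In the example output they behave like the percentage of simulated samples whose span (or IQR) is smaller than the empirical value: a zero-length interval, whose simulated spans average 197 years, gives 99.4 against the empirical 298. A minimal sketch under that assumption, counting ties as half (also an assumption); `percentile_below` is a hypothetical name.

```python
def percentile_below(empirical_value, simulated_values):
    """Percentage of simulated values below the empirical value (ties count half)."""
    below = sum(1 for v in simulated_values if v < empirical_value)
    ties = sum(1 for v in simulated_values if v == empirical_value)
    return 100.0 * (below + 0.5 * ties) / len(simulated_values)

spans = [250, 270, 290, 298, 310, 330]  # illustrative simulated spans
print(percentile_below(298, spans))
```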

Most evaluations can be done in a few seconds. The computation time has two major components: one is directly related to the number of intervals tested times the number of dates times the number of trials for the sample comparisons; the second is directly related to the number of intervals tested times the number of trials to create the KS population.
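
Written as a formula, with $N_{\mathrm{int}}$ the number of intervals tested, $n_{\mathrm{dates}}$ the number of dates, $N_{\mathrm{cmp}}$ the number of trials for the sample comparisons, $N_{\mathrm{KS}}$ the number of trials used to create the KS population, and $c_1$, $c_2$ small machine-dependent constants (labeled to match the worked example in the Test mode section):

```latex
T \approx N_{\mathrm{int}} \left( c_1 \, N_{\mathrm{KS}} + c_2 \, N_{\mathrm{cmp}} \, n_{\mathrm{dates}} \right)
```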

Program Output

The listing file essentially duplicates the screen output but adds “<” marks to indicate the best fit for each of the four measures: KS dmax, mean distance, span, and interquartile range.

File: TABLE2A2.TXT
Random Number Seed:       201450948

#Dates  Earliest  Latest   Span    IQR   Mean  Median  Ihat    K
    25       975    1273    298    104   1100    1093   203 3.46 Uncalibrated

KS Population Size:       100000
Sample Comparison Trials: 100000
Uncalibrated Interval Used
Model Distribution: Rectangular
Using Mean as Center of True Intervals
Distance = Euclidean Distance/Number of Dates

          Phase      dmax      Dist     Span ( 298)  IQR (104)
 Phase -----------  ------ ------------ ----------- -----------
Length  From    To  Value    Mean   Std  Mean  %ile  Mean  %ile
     0  1100  1100  0.150    6.53  1.18   197  99.4    64  99.3
    25  1088  1113  0.149    6.47  1.20   199  99.3    64  99.2
    50  1075  1125  0.143    6.27  1.21   204  98.8    66  98.6
    75  1063  1138  0.128    5.98  1.23   214  97.9    70  97.6
   100  1050  1150  0.120    5.62  1.24   226  95.8    74  94.9
   125  1038  1163  0.101    5.27  1.25   240  91.8    80  89.6
   150  1025  1175  0.085    4.99  1.25   255  84.9    87  81.1
   175  1013  1188  0.074<   4.84< 1.28   272  74.3    95  68.7
   200  1000  1200  0.084    4.89  1.37   289< 60.1   103< 53.7
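
A sketch of one rule that reproduces the "<" marks in this listing: smallest dmax, smallest mean distance, and mean span and mean IQR closest to the empirical values, with ties going to the shorter interval (at 200 and 225 years the mean spans are both 9 years from 298). This rule is inferred from the examples, not documented; `best_fit_lengths` is hypothetical, and the rows are the full set from the screen output shown earlier.

```python
def best_fit_lengths(rows, empirical_span, empirical_iqr):
    """Return the interval length judged best by each of the four measures.

    rows: (length, dmax, dist_mean, span_mean, iqr_mean) tuples in order of
    increasing length; min() keeps the first of tied rows, i.e. the shorter one.
    """
    return {
        "dmax": min(rows, key=lambda r: r[1])[0],
        "dist": min(rows, key=lambda r: r[2])[0],
        "span": min(rows, key=lambda r: abs(r[3] - empirical_span))[0],
        "iqr": min(rows, key=lambda r: abs(r[4] - empirical_iqr))[0],
    }

# Rows from the example output (length, dmax, Dist mean, Span mean, IQR mean).
ROWS = [
    (0, 0.150, 6.53, 197, 64), (25, 0.149, 6.47, 199, 64),
    (50, 0.143, 6.27, 204, 66), (75, 0.128, 5.98, 214, 70),
    (100, 0.120, 5.62, 226, 74), (125, 0.101, 5.27, 240, 80),
    (150, 0.085, 4.99, 255, 87), (175, 0.074, 4.84, 272, 95),
    (200, 0.084, 4.89, 289, 103), (225, 0.106, 5.17, 307, 112),
    (250, 0.128, 5.67, 326, 122), (275, 0.151, 6.38, 345, 132),
    (300, 0.164, 7.23, 364, 143),
]
print(best_fit_lengths(ROWS, empirical_span=298, empirical_iqr=104))
```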

Test Mode

The test mode is quite similar to the evaluation mode. In test mode, however, the program generates a simulated “empirical” set of dates from a known true interval and then evaluates those dates using the evaluation procedure discussed above, so that one can see how well the evaluation actually does in estimating a true interval that is known (because you specified it). For prompts not explained here, see the evaluation mode or generate mode descriptions.

[E]valuate Dates, [G]enerate Dates, [T]est Mode {E} ? T

Press T to select Test Mode.

Random Generator Seed (0 to set from clock) {0} ?
Derive Monte Carlo-Based Interval Length Estimates {Y} ?

Test mode has two options. Selecting Y at this prompt provides the full output, with the test sample statistics (including Ihat) for each test along with the Monte Carlo-based interval estimates. If you just want to test the accuracy of the Ihat statistic, answer No; otherwise answer Yes.

Generate dates from [C]alibrated Radiocarbon or [U]ncalibrated Interval {U} ? 
Calibration File: [1]IntCal13, [2]IntCal93, [3]UWTen93, [M]arine93 {1} ?
Calibration File {INTCAL13.14C} ?
True Interval Start Date ? 
True Interval End Date ? 
Model Distribution of True Dates Across the Interval
  Model: [R]ectangular or Truncated [N]ormal {R} ? 

If calibration is selected, the true date interval is described in calendar years AD/BC, with BC negative. The true interval and the model distribution specified here apply to all tests. However, the numbers of dates and their sigmas can vary, as specified below.

Test for [S]ingle or [M]ultiple Sets of Numbers of Dates & SDs {S} ?

If S (single) is chosen, the program generates and evaluates a specified number (set below) of simulated “empirical” sets of dates, each with a given number of dates and a given standard deviation, and reports on the results. However, the program can also repeat this procedure for more than one number of dates and more than one standard deviation, so that one can test the evaluation procedure’s sensitivity to the number of dates and the standard deviation of the dates. Press M to test more than one number of dates or more than one standard deviation (or both).

Starting Number of Dates ?
Ending Number of Dates ? 
Increment in Number of Dates {5} ? 

Specify the range of the number of dates to test. If one replies 50, 100, and 25, respectively, to the prompts, then the program will do each set of evaluations for 50, 75, and 100 dates. The program always starts with the maximum and steps down by the increment, ending on the first value at or below the minimum.

Starting Standard Deviation ?   
Ending Deviation ? 
Increment in Standard Deviation {5} ? 

Similarly, specify the range of date standard deviations to test. If one replies 50, 100, and 10, respectively, to the prompts, then the program will do each set of evaluations for dates with 50, 60, 70, ... 100 year standard deviations.
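
A sketch of the loop structure, assuming each range is walked from the maximum down by the increment, as described for the number of dates, and that every number of dates is paired with every standard deviation. The nesting order is an assumption; both orders are consistent with the example listing, in which loop 1 is 100 dates at SD 75 and test summary 4 is 50 dates at SD 50. The function names are hypothetical.

```python
def descending_steps(minimum, maximum, step):
    """Values from maximum down by step, ending at the first value <= minimum."""
    if step <= 0:
        raise ValueError("the increment must be positive")
    values = [maximum]
    while values[-1] > minimum:
        values.append(values[-1] - step)
    return values

def loop_combinations(n_min, n_max, n_step, sd_min, sd_max, sd_step):
    """(number of dates, standard deviation) pairs, one per loop (order assumed)."""
    return [(n, sd)
            for n in descending_steps(n_min, n_max, n_step)
            for sd in descending_steps(sd_min, sd_max, sd_step)]

print(descending_steps(50, 100, 25))
print(descending_steps(50, 100, 10))
print(loop_combinations(50, 100, 50, 50, 75, 25))
```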

Number of Test Runs for Each SD and Number of Dates ? 

Specify the number of times, for each number of dates and each standard deviation, that the program will generate a simulated “empirical” set of dates and evaluate those dates. While I haven’t done a great deal of experimentation, with modest numbers of dates (e.g., on the order of 25) it may take more than 100 test runs to get a representative sense of the behavior of the program. With larger numbers of dates, fewer runs will be required, because there will be less variability between runs.

Listing File or Device {.LST} ? 

Name of the file for the program listing. The listing is in the same format as for the evaluation mode, with summary tabulations added at the end. For long runs this file can become quite large.

Number of Trials to Create KS Population {10000} ?
Number of Trials for Sample Comparisons {1000} ?
Work from [1] Mean or [2] Median {1} ? 
Use Adjusted [E]uclidean Distance or Mean |[D]eviation| {E} ?
Minimum True Interval Considered {0} ?
Maximum True Interval Considered {340} ?
Interval Increment (>=2) {50} ?

Answer all these questions as you would for the evaluation mode. Note that the program uses the same interval range and increment for each evaluation.

Output Summary File Type [C]SV or [S]ystat {C} ? 

The program also writes a summary file that reports the results of each test run. When Monte Carlo estimates are produced, this can be either a CSV file or a SYSTAT command file; with no Monte Carlo analyses, only CSV files are offered. If the SYSTAT option is chosen, the program produces a SYSTAT command file that can be SUBMITted to SYSTAT for analysis. Producing box plots of the results for each measure has proved useful. In SYSTAT 5.0, box plots can be created using this command file as follows:

       DATA
       SUBMIT <command file name without “.SYC”> 
       USE <command file name without “.CMD” >
       BOX MAXDIF*GROUP$ / TRANS  MIN=0
       BOX RANGE*GROUP$ / TRANS  MIN=0
       BOX MID_DATE*GROUP$ / TRANS 

Analysis File for Test Summary {TESTT.CSV} ?

Name of the output file.

NDate/SD Loop  Test:Interval  %Done  HHHH:MM:SS Remaining
    3 of    4     7: 100      58.0%     0: 0:10

The program then reports its progress. After each interval evaluated, it estimates the time remaining. If you want to interrupt the program, just press <Ctrl>C or <Ctrl><Break>; no harm will be done. If only one number of dates is used, the time estimate should be fairly close. If more than one number of dates is to be tested, the estimate will be a worst-case overestimate. If there is a large range in the number of dates, the time will be drastically overestimated.

Note, however, that it is easy to create requests that will take a long time. For the test mode, the time formula provided under the evaluation mode is multiplied by (number of test runs)*(number of standard deviations tested)*(number of different numbers of dates tested). Let's assume 200 test runs, 5 different numbers of dates (ranging up to 150), 4 different standard deviations, 20,000 trials for the KS population and 2000 trials for the sample comparisons, and 10 intervals for each set. Then T = 200*5*4*10*(c1*20,000 + c2*2000*150) = c1*800,000,000 + c2*12,000,000,000. Even if c1 and c2 are small, which they are (for the sake of argument, let's say one microsecond each), this problem would take about 12,800 seconds, or roughly 3.6 hours.
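
The worked estimate can be checked with a few lines of Python. `estimate_test_mode_seconds` is a hypothetical helper, and the one-microsecond constants are the illustrative figures from the text, not measured timings. With the inputs above it gives 12,800 seconds, about 3.6 hours.

```python
def estimate_test_mode_seconds(runs, n_sds, n_date_counts, n_intervals,
                               ks_trials, cmp_trials, max_dates,
                               c1=1e-6, c2=1e-6):
    """Rough worst-case Test mode run time in seconds.

    c1 scales the KS-population work and c2 the sample-comparison work
    (trials times dates); both are illustrative per-operation costs.
    """
    per_interval = c1 * ks_trials + c2 * cmp_trials * max_dates
    return runs * n_sds * n_date_counts * n_intervals * per_interval

seconds = estimate_test_mode_seconds(runs=200, n_sds=4, n_date_counts=5,
                                     n_intervals=10, ks_trials=20_000,
                                     cmp_trials=2000, max_dates=150)
print(f"{seconds:,.0f} seconds = {seconds / 3600:.1f} hours")
```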

Compute Time:   6.6 minutes
Program Done

When the program finishes, it gives you the total time.

Program Output

The program listing gives the evaluation of each test run and, for each combination of number of dates and standard deviation, provides a summary table (so long as fewer than 200 test runs are requested). The loop number identifies the particular combination of number of dates and standard deviation.

File: TESTT.TXT
Random Number Seed:       1318037492
KS Population Size:       10000
Sample Comparison Trials: 1000
Calibrated (Calendar Year) Interval Used
Model Distribution: Rectangular
Using Mean as Center of True Intervals
Distance = Euclidean Distance/Number of Dates

============================================================================

Loop: 1  Test: 1  Number of Dates: 100  Std: 75
True Interval: 1000 to 1200  Middle: 1100  Length: 200  Model: R

"Empirical" Dates (ndate=100 std=75): 
   730   736   760   764   782   792   807   808   823   823   839   842   846
   848   849   850   855   866   869   875   877   878   885   893   894   899
   900   901   906   907   907   911   913   916   921   922   923   925   926
   927   927   929   934   936   937   938   940   940   941   942   944   944
   946   947   947   950   950   954   956   957   959   960   960   962   962
   967   968   976   977   977   981   981   984   987   987   988   997   999
  1003  1005  1006  1015  1015  1016  1024  1025  1031  1035  1036  1037  1050
  1051  1054  1081  1086  1087  1104  1116  1120  1120 

#Dates  Earliest  Latest   Span    IQR   Mean  Median  Ihat    K
   100       730    1120    390     91    940     943   128 3.46 Uncalibrated
   100      1006    1198    192    109   1105    1116            BC/AD
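The "Empirical" dates in each test are, presumably, produced by drawing true dates from the model distribution over the true interval and adding Gaussian measurement error. The Python sketch below illustrates that idea for the rectangular model and computes the span, IQR, mean and median columns. It is an assumption-laden illustration, not PhaseLen's code: it works on a single uncalibrated time scale (so its numbers are not comparable with the calibrated listing), uses one common standard deviation, uses Python's default quartile method (which may differ from PhaseLen's), and does not attempt Ihat or K. Function names are hypothetical.

```python
import random
import statistics

def simulate_dates(start, end, n_dates, sd, rng):
    """True dates uniform on [start, end] (rectangular model) plus Gaussian
    measurement error with standard deviation sd; returned sorted."""
    return sorted(rng.uniform(start, end) + rng.gauss(0.0, sd) for _ in range(n_dates))

def summarize(dates):
    """Earliest, latest, span, IQR, mean and median of a list of dates."""
    dates = sorted(dates)
    q1, _, q3 = statistics.quantiles(dates, n=4)  # Python's default 'exclusive' method
    return {
        "n": len(dates),
        "earliest": dates[0],
        "latest": dates[-1],
        "span": dates[-1] - dates[0],
        "iqr": q3 - q1,
        "mean": statistics.fmean(dates),
        "median": statistics.median(dates),
    }

# Same seed as the listing, but the draws will NOT match PhaseLen's.
rng = random.Random(1318037492)
dates = simulate_dates(1000, 1200, 100, 75, rng)
for name, value in summarize(dates).items():
    print(f"{name:>8}: {value:7.0f}")
```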

          Phase      dmax      Dist     Span ( 390)  IQR ( 91)
 Phase -----------  ------ ------------ ----------- -----------
Length  From    To  Value    Mean   Std  Mean  %ile  Mean  %ile
     0  1105  1105  0.052<   1.76  0.40   375  67.2   101< 20.8
    50  1080  1130  0.070    1.73< 0.39   379  60.8   101  18.7
   100  1055  1155  0.111    1.96  0.49   381  59.6   103  17.1
   150  1030  1180  0.151    2.33  0.62   397< 47.9   106  10.2
   200  1005  1205  0.122    2.17  0.59   447  15.6   118   1.9 
   250   980  1230  0.136    2.65  0.63   497   1.2   136   0.0
   300   955  1255  0.141    3.72  0.72   552   0.1   156   0.0
...
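The phase table compares the observed dates with simulated dates for candidate phases of increasing length centered on the middle (here the mean). The header lines "KS Population Size", "Sample Comparison Trials" and "Distance = Euclidean Distance/Number of Dates" suggest how dmax and Dist are obtained; the Python sketch below is one reading of them, not PhaseLen's code. It assumes dmax is the two-sample Kolmogorov-Smirnov statistic between the observed dates and a large simulated population, and that Dist is the mean (and standard deviation) over trials of the Euclidean distance between the sorted observed dates and a sorted simulated sample of the same size, divided by the number of dates. It uses one uncalibrated time scale and a single common error, the "observed" dates are themselves simulated stand-ins, and the function names are hypothetical.

```python
import bisect
import math
import random

def simulate(start, end, n, sd, rng):
    """n dates: true dates uniform on [start, end] plus Gaussian error; sorted."""
    return sorted(rng.uniform(start, end) + rng.gauss(0.0, sd) for _ in range(n))

def ks_dmax(sample, population):
    """Two-sample Kolmogorov-Smirnov D = max |F_sample(x) - F_population(x)|.
    Both lists must be sorted; the maximum occurs at one of the data points."""
    n, m = len(sample), len(population)
    return max(
        abs(bisect.bisect_right(sample, x) / n - bisect.bisect_right(population, x) / m)
        for x in sample + population
    )

def scaled_distance(observed, start, end, sd, trials, rng):
    """Mean and std over trials of the Euclidean distance between the sorted
    observed dates and a sorted simulated sample of the same size, divided by n."""
    n = len(observed)
    d = [math.dist(observed, simulate(start, end, n, sd, rng)) / n for _ in range(trials)]
    mean = sum(d) / trials
    std = math.sqrt(sum((x - mean) ** 2 for x in d) / (trials - 1)) if trials > 1 else 0.0
    return mean, std

rng = random.Random(2020)
sd = 50
observed = simulate(1000, 1200, 50, sd, rng)  # stand-in for measured dates
middle = sum(observed) / len(observed)        # mean as center of the candidate phases
print("Length   From     To   dmax   Dist    Std")
for length in range(0, 301, 50):
    lo, hi = middle - length / 2, middle + length / 2
    population = simulate(lo, hi, 10_000, sd, rng)               # cf. "KS Population Size"
    dmax = ks_dmax(observed, population)
    mean, std = scaled_distance(observed, lo, hi, sd, 1_000, rng)  # cf. "Sample Comparison Trials"
    print(f"{length:6d} {lo:6.0f} {hi:6.0f} {dmax:6.3f} {mean:6.2f} {std:6.2f}")
```

Under these assumptions a smaller dmax or Dist means a closer match; the "<" markers in the listing appear to flag the best-fitting row for each criterion.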

Test Summary (4)  Number of Dates: 50  Standard Deviation: 50
Ihat could not be calculated 0 times

Test  Middle    dmax    Dist    Span     IQR    Ihat 
   1    1105       0       0     150     200     137
   2    1105      50      50     200     200     109
   3    1105     150     200     300     150     165
   4    1105     200     200     250     200     201
   5    1105     200     200     200     200     171
   6    1105      50      50     150     200     125
   7    1105      50       0     200     200     170
   8    1105     200     200     200     200     157
   9    1105     250     200     200     250     176
  10    1105       0       0     200     150     160
  11    1105     150     100     100     150      64
  12    1105     200     200     200     200     203
  13    1105     200     200     200     250     172
  14    1105     200     200     250     200     186
  15    1105     150     200     200     200     182
  16    1105     200     200     200     200     175
  17    1105     250     250     200     250     209
  18    1105     250     250     300     250     228
  19    1105     200     200     200     250     156
  20    1105       0       0     200     200     158
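One way to compare the criteria in a test summary is to score each column against the true phase length. The Python sketch below does that for the 20 tests of Test Summary (4), assuming the true interval is the same 1000 to 1200 (length 200) reported for Loop 1. Bias, root-mean-square error and exact-hit counts are illustrative choices, not statistics PhaseLen reports, and the names are hypothetical.

```python
import math

TRUE_LENGTH = 200  # assumed: the 1000-1200 true interval shown for Loop 1

# Phase lengths chosen by each criterion in Test Summary (4), tests 1-20.
SUMMARY = {
    "dmax": [0, 50, 150, 200, 200, 50, 50, 200, 250, 0, 150, 200, 200, 200, 150, 200, 250, 250, 200, 0],
    "Dist": [0, 50, 200, 200, 200, 50, 0, 200, 200, 0, 100, 200, 200, 200, 200, 200, 250, 250, 200, 0],
    "Span": [150, 200, 300, 250, 200, 150, 200, 200, 200, 200, 100, 200, 200, 250, 200, 200, 200, 300, 200, 200],
    "IQR": [200, 200, 150, 200, 200, 200, 200, 200, 250, 150, 150, 200, 250, 200, 200, 200, 250, 250, 250, 200],
    "Ihat": [137, 109, 165, 201, 171, 125, 170, 157, 176, 160, 64, 203, 172, 186, 182, 175, 209, 228, 156, 158],
}

def score(estimates, truth):
    """Bias, root-mean-square error, and count of exact hits for one criterion."""
    n = len(estimates)
    bias = sum(estimates) / n - truth
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    hits = sum(e == truth for e in estimates)
    return bias, rmse, hits

for name, estimates in SUMMARY.items():
    bias, rmse, hits = score(estimates, TRUE_LENGTH)
    print(f"{name:>5}: bias {bias:+7.1f}  RMSE {rmse:6.1f}  exact {hits}/{len(estimates)}")
```

The exact-hit count suits the criteria that choose among the tested lengths (multiples of 50); Ihat is not restricted to that grid, so bias and RMSE are the fairer comparison for it.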

CSV Test Output

Sequence, Loop, Test, Mid_Date, dmax, Dist, Span, IQR, Ihat, Ndate, SD, Group
1,1,1,1105,0,50,150,0,128,100,   75, "N100/S 75" 
2,1,2,1105,100,100,150,0,77,100,   75, "N100/S 75"
3,1,3,1105,150,200,300,200,181,100,   75, "N100/S 75"
…
80,4,20,1105,0,0,200,200,158,50,   50, "N 50/S 50"
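The CSV test output can be read directly by other software. As an illustration (not part of PhaseLen), the Python sketch below parses the rows with the standard `csv` module and averages one column within each Ndate/SD group; the function names are hypothetical, and the sample contains only the complete rows shown in the excerpt.

```python
import csv
import io
from collections import defaultdict

def read_test_csv(text):
    """Parse CSV test output into one dict per row (numbers as int, Group as str)."""
    # skipinitialspace drops the padding before SD values and before the quoted
    # Group label; strip() removes any trailing blank after the closing quote.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [
        {key: value.strip() if key == "Group" else int(value) for key, value in rec.items()}
        for rec in reader
    ]

def mean_by_group(rows, field):
    """Average one numeric column (e.g. Ihat) within each Ndate/SD group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Group"]].append(row[field])
    return {group: sum(vals) / len(vals) for group, vals in groups.items()}

# Complete rows from the CSV excerpt (elided rows omitted).
SAMPLE = """Sequence, Loop, Test, Mid_Date, dmax, Dist, Span, IQR, Ihat, Ndate, SD, Group
1,1,1,1105,0,50,150,0,128,100,   75, "N100/S 75"
2,1,2,1105,100,100,150,0,77,100,   75, "N100/S 75"
3,1,3,1105,150,200,300,200,181,100,   75, "N100/S 75"
80,4,20,1105,0,0,200,200,158,50,   50, "N 50/S 50"
"""

rows = read_test_csv(SAMPLE)
print(mean_by_group(rows, "Ihat"))
```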

SYSTAT Command File Output

Save TESTRUN/S "True Interval 1000 to 1200; Middle=Mean;  Dist=Euclidean/NDate"

Input Sequence Loop Test Mid_Date Maxdif Dist Range Ndate SD Group$
Drop Sequence Loop Test NDate SD
Run
   1   1    1    1112     150     150     150   50   60 "N 50/S 60"
   2   1    2    1097     150     100     100   50   60 "N 50/S 60"
   3   1    3    1107     200     200     150   50   60 "N 50/S 60"
   4   1    4    1093     250     250     250   50   60 "N 50/S 60"
   5   1    5    1092     150     100     100   50   60 "N 50/S 60"
   6   2    1    1122     250     200     250   25   60 "N 25/S 60"
   7   2    2    1095     100     150     100   25   60 "N 25/S 60"
   8   2    3    1087     150     150     200   25   60 "N 25/S 60"
   9   2    4    1115     200     200     250   25   60 "N 25/S 60"
  10   2    5    1078     250     250     250   25   60 "N 25/S 60"
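The SYSTAT command file carries per-test results as data lines after the Run command; note that its Input line names the columns Maxdif and Range and has no IQR or Ihat. To use those rows outside SYSTAT, a Python sketch such as the following can read them. It assumes, as in the listing, that the data rows follow the Run line and that text values are double-quoted; the function name is hypothetical, and the sample is an excerpt of the listing.

```python
import shlex

# Column names from the "Input" line; the "$" suffix on Group$ appears to mark a text variable.
FIELDS = ["Sequence", "Loop", "Test", "Mid_Date", "Maxdif", "Dist", "Range", "Ndate", "SD", "Group"]

def parse_systat_data(text):
    """Return one dict per data row, assuming the rows follow the "Run" line."""
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.strip().lower() == "run") + 1
    records = []
    for line in lines[start:]:
        if not line.strip():
            continue
        *numbers, group = shlex.split(line)  # shlex keeps "N 50/S 60" as one token
        records.append(dict(zip(FIELDS, [int(t) for t in numbers] + [group])))
    return records

SAMPLE = '''Save TESTRUN/S "True Interval 1000 to 1200; Middle=Mean;  Dist=Euclidean/NDate"

Input Sequence Loop Test Mid_Date Maxdif Dist Range Ndate SD Group$
Drop Sequence Loop Test NDate SD
Run
   1   1    1    1112     150     150     150   50   60 "N 50/S 60"
   2   1    2    1097     150     100     100   50   60 "N 50/S 60"
   6   2    1    1122     250     200     250   25   60 "N 25/S 60"
'''

for rec in parse_systat_data(SAMPLE):
    print(rec["Group"], rec["Mid_Date"], rec["Maxdif"], rec["Dist"], rec["Range"])
```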

References Cited

Cowgill, George L. 1998. Some Simple Ways to Use Multiple Uncertain Dates to Estimate Intervals. Paper presented at the 1998 Intercongress Meeting of Commission 4 of the International Union of Prehistoric and Protohistoric Sciences, Data Management and Mathematical Methods in Archaeology, 19-22 November 1998, Scottsdale, Arizona.

Page Last Updated: 3 October 2020
