TFQA: Tools for Quantitative Archaeology - Statistical Analysis Software for Archaeology

     kintigh@tfqa.com   +1 (505) 395-7979



Program Conventions

This section provides general information concerning user interaction with the programs, including the program prompts and the format required of data files. Each program gets the information that it needs by prompting you with a series of questions that follow some simple conventions, described below. The programs obtain their data from files that are created with a text editor (such as Notepad) or are output by other programs. A flexible format is used in reading and writing numeric data files, so these data sets should be easy to exchange with programs outside this package.

Every attempt has been made to make the programs well-behaved and easy to use. In general, the defaults are conservative and are designed to protect you; for example, they will not erase files without your permission. If you're not sure what a prompt means, refer to the documentation or take the default. Once you begin to use the programs, you should not need to refer much to the documentation. However, it is worthwhile to read through the documentation at least once in order to understand the scope and flexibility of the programs. As with any package that you purchase, you should make a backup copy in case something happens to the one you are using.

The documentation is intended to convey the information necessary to run the programs. While I have tried to make it reflect the behavior of the programs as they now stand, occasional changes to the programs may make parts of the documentation inaccurate. In many cases, the changes should be obvious. When new options are added, they are generally set up so that the default response is the best choice.

I have not attempted to provide comprehensive information on the use of these methods. However, a number of references are included in the discussions and a bibliography is provided at the end of this documentation. A comprehensive summary of the spatial analysis techniques is provided by Kintigh (1990) and Blankholm (1991).

PROMPTS

Most programs are documented mainly by explaining the questions asked by the program. Because the data and answers to previous prompts affect later choices, not all prompts in the documentation will appear during each run. This simply indicates that the question is not relevant, given the data and previous responses.

In general, the program prompts are self-explanatory. However, a statement of some basic conventions may help answer questions that arise. There are six basic classes of prompts: yes-no prompts, key word prompts, integer prompts, real prompts, file name prompts, and text prompts. For each prompt, the default response, obtained by pressing Enter, is shown in braces. For example, "File Exists; OK to Erase It {N} ?" asks whether the program may overwrite an existing file, and a reply of Enter or N would be equivalent. If nothing is given in braces, there is no default. Except for text prompts, upper, lower, and mixed case are treated the same. Thus, CERAMIC.DAT, ceramic.dat, and CeRaMiC.DaT are considered the same.

Prompts that imply a yes or no answer are answered by pressing Y or y for yes, or N or n for no. In the true DOS programs, yes-no prompts and key word prompts do not require that you press Enter; the program reads a character as soon as it is typed. In the other programs, you need to press <Enter> after Y or N.

When a key word prompt is given, you must reply with the first letter of one of the words shown in brackets. For example, the valid replies to "Output [D]istance or [S]imilarity Matrix {S} ?" are S, s, D, d, and Enter. If you do not give a valid reply, the program will not accept the response.
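The reply rules for yes-no and key word prompts can be sketched in Python as follows. This is an illustration, not code from the programs: the function name `resolve_keyword_reply` is hypothetical, and the sketch assumes that a valid reply is a single letter (either case) or Enter alone. A yes-no prompt is treated as a key word prompt whose choices are Y and N.

```python
def resolve_keyword_reply(reply, choices, default=None):
    """Return the selected letter, or None if the reply is not valid.

    choices: the bracketed first letters, e.g. "DS" for
        "Output [D]istance or [S]imilarity Matrix {S} ?"
    default: the letter shown in braces (None if there is no default).
    Hypothetical helper; assumes a valid reply is one letter or Enter alone.
    """
    reply = reply.strip()
    if reply == "":
        return default  # Enter alone takes the default, if there is one
    if len(reply) != 1:
        return None  # assumption: only single-letter replies are valid
    letter = reply.upper()  # case is ignored for key word prompts
    return letter if letter in choices.upper() else None


# A yes-no prompt such as "File Exists; OK to Erase It {N} ?"
print(resolve_keyword_reply("", "YN", default="N"))
```

A caller would repeat the prompt whenever the function returns None, which covers both an invalid letter and Enter on a prompt that has no default.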

Prompts that request numeric input, file names, or text all require that you press <Enter> once you have typed your response. For these prompts, the backspace key may be used to correct typing errors before Enter is pressed. Numeric responses are generally checked for proper form and to make sure that they are in the right range. When an erroneous response is given, an error message is usually provided and the prompt is repeated.

Whenever a file name is requested, a full DOS-compatible path name can be specified (see Running the TFQA Programs under Windows for more information about this; if you don't know the difference between a file name and a path name, don't worry about it). All of the following might be valid replies to a request for a file name: CERAMIC.DAT, B:CERAMIC.DAT, and C:\DATA\CERAMIC.DAT. While you will usually give a file name when one is requested, a DOS device name (usually CON, PRN, or NUL) can also be specified for either input or output files. To supply program input from the console, i.e., the PC keyboard, reply CON. For an output file, a reply of CON will put output on the screen; a reply of PRN will send it to your printer (assuming it is turned on and connected directly to your computer rather than through a network). A reply of NUL for an output file will result in no output being produced. The NUL device is useful when running programs that produce voluminous printed output that may not be needed in all cases. (PRN and NUL may cause problems under Windows XP.)

There are different kinds of defaults offered on file name prompts. In some cases, a complete file name is suggested as the default; in others, only a file extension is suggested; otherwise, there is no default. If an extension is given as the default and you enter a file name without a period or extension, the file name is assumed to have the default extension. Thus, if the default is .DST, a reply of TEST is equivalent to TEST.DST. However, typing a period after the file name, with or without an extension following it, overrides the default.
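A minimal Python sketch of the default-extension rule follows. The function name `apply_default_extension` is hypothetical, and two details are assumptions rather than documented behavior: the period check looks only at the file name part of a path (so a period in a directory name does not count), and the device names CON, PRN, and NUL are passed through unchanged. The standard `ntpath` module is used so that DOS-style paths are split the same way on any system.

```python
import ntpath

# Assumption: device names are used as given and never get an extension.
DEVICE_NAMES = {"CON", "PRN", "NUL"}


def apply_default_extension(name, default_ext):
    """Return `name`, adding `default_ext` (e.g. ".DST") when it has no period."""
    base = ntpath.basename(name)  # file name part, without drive or directories
    if base.upper() in DEVICE_NAMES:
        return name
    if "." in base:
        # An explicit period, even with no extension after it (TEST.),
        # overrides the default.
        return name
    return name + default_ext


print(apply_default_extension("TEST", ".DST"))
```

With a default of .DST, TEST becomes TEST.DST, while TEST.TXT and TEST. are left as typed.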

DATA FILE FORMATS

Most programs are designed to perform a specified task on a set of numerical input data, usually contained in a file on a disk. As you would expect, the analytical programs view data as encoding some number of variables for each of a number of cases. The programs use a very flexible input format, described below. However, they do not read character variables. Thus, if you want to analyze data that have the variable "Form" recorded as "BOWL," "JAR," or "LADLE," you'll need to recode it to a numerical representation, e.g., 1, 2, or 3. It is generally easy to export data from other statistical, spreadsheet, or database programs in a form that can be read by my programs.


With the exception of CONVSYS, the programs only read plain ASCII files. Plain ASCII files can be created using most text editors and word processing programs. One such editor, Notepad, comes with Windows; a better alternative, Notepad++, is freely available over the web, as are a number of other Notepad substitutes. (TextPad is also excellent but requires paid registration after a trial period.) To create a plain ASCII file using a word processing program, you must specifically direct the program not to save formatting information in the file. In Word or WordPerfect, you will need to instruct the program to save the file as Text (a .txt file) so that it does not include the formatting codes. It is generally much better to use Notepad++ or something similar.

Antana Format Files

Any program that reads and processes numeric data has to know how those data are formatted, that is, it must know what conventions were used in encoding the data. Most of my programs assume that the input file is an ASCII text file that follows the minimal format (described in detail below) used by the Antana statistical package (DIVERS, which uses a different input format, is the principal exception). The data files written by my programs generally follow Antana format as well, so that it is easy to perform additional analyses on the resulting files. Antana format was chosen because of its extreme flexibility; many data sets that follow a more restrictive format will be readable with very little modification. The Antana format output files produced by my programs can easily be read by other statistical programs, such as SYSTAT or SAS.

Basically, Antana format assumes that the first two numbers in the file are, respectively, the number of rows (observations) and columns (variables) in the data set. The data set is composed of the next rows*columns numbers, separated by one or more blanks, commas, tabs, or carriage returns. The numbers fill the data matrix row by row, that is, the first "columns" numbers read fill the first row, the next "columns" numbers fill the second, and so forth.

Note that new observations need not start on a new line. In addition, any text enclosed by # characters, and all text from an unterminated # to the end of that line, is ignored. (This convention allows you to put comments in your file.) These programs do not handle missing data; you must eliminate it in advance.

The numbers themselves can be input in almost any sensible form. For example, 173.15, 173.1500, 0173.15, 1.7315e02, and 1.73150E+2 are all read as the same number. In some cases integers are required. For example, the number of rows and number of columns must be integers, and counts are expected to be integers. The format of integers is similarly flexible: 987, 987., 987.000, 9.87E2, and 9.870000e+02 are all read the same. However, 156.5 is not an integer and will produce an error message.
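For readers who want to check a file outside the programs, here is a Python sketch of an Antana format reader that follows the rules above: the first two numbers give the row and column counts, values may be separated by blanks, commas, tabs, or line breaks, # comments are skipped, integers may be written in any numeric form, and the matrix is filled row by row. It is not the programs' own reader. The function names are hypothetical, and it assumes that a # comment never continues past the end of its line and that anything after the first rows × columns values is not read.

```python
import itertools
import re


def antana_tokens(text):
    """Yield the numeric tokens of an Antana file, skipping # comments."""
    for line in text.splitlines():
        # Splitting a line on "#" leaves data in the even-numbered pieces and
        # comment text in the odd-numbered ones, so "# Rows # 3" keeps only
        # "3", and an unterminated # hides the rest of the line.
        # Assumption: a comment never continues onto the next line.
        for piece in line.split("#")[::2]:
            # Values may be separated by blanks, tabs, or commas.
            for token in re.split(r"[\s,]+", piece):
                if token:
                    yield token


def as_integer(token):
    """Read 987, 987., 9.87E2, etc. as an int; reject non-integers like 156.5."""
    value = float(token)
    if not value.is_integer():
        raise ValueError(f"{token!r} is not an integer")
    return int(value)


def read_antana(text):
    """Return the data matrix of an Antana format file as a list of rows."""
    tokens = antana_tokens(text)
    try:
        rows, cols = as_integer(next(tokens)), as_integer(next(tokens))
    except StopIteration:
        raise ValueError("file does not start with row and column counts") from None
    # Assumption: anything after the first rows * cols values is not read.
    values = [float(t) for t in itertools.islice(tokens, rows * cols)]
    if len(values) < rows * cols:
        raise ValueError(f"expected {rows * cols} values, found {len(values)}")
    # Fill the matrix row by row: the first `cols` values form the first row.
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


print(read_antana("3 4 2 4 10 5 3 5 6 1 7 0 0 4"))
```

Applied to any of the three example files that follow, `read_antana` returns [[2.0, 4.0, 10.0, 5.0], [3.0, 5.0, 6.0, 1.0], [7.0, 0.0, 0.0, 4.0]].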

The following three files are considered identical, each consisting of the same four variables for the same three observations:

#This is a test data set 
# Rows # 3   # Columns #4
2 4 10 5
3 5  6 1 
7 0  0 4

3 4 2 4 10 5 3 5 6 1 7 0 0 4

3.0
4.
2  4.0, 10 5.0, 3, 5.0
6.0
1 7.0000         0.0  0  4

If you wish to enter an Antana format data file from the PC keyboard (do this ONLY for very small data sets), reply CON to the request for an input file name. If nothing happens, the program is waiting for you to enter the data just as if they were coming from a file, with the number of rows and columns first. You can enter as many spaces or <Enter> presses as you wish between values. You can correct mistakes using the backspace key, as long as you have not pressed <Enter>. The program will continue to appear more or less dead until you have entered enough data values and a final <Enter>. You must then enter a terminating end-of-file mark, which is done with <Ctrl>Z (hold down the Ctrl key, press the "Z" key, and then release both keys). If you reply CON to the request for an input file name, some programs are a little friendlier. They will ask you:

Number of Rows (Observations) ?

Number of Columns (Variables) ?

To these prompts type the number of observations and variables that you have. If you have told it that you will be reading 3 rows with 4 variables each, you will see:

Enter 12 Values Followed by <Enter>

At this point you can enter the four counts for the first case, followed by the four counts for the second case, and then the four counts for the third case. You can enter these on any number of lines (that is, you can press <Enter> between values), separated by blanks, commas, or tabs. Be sure to press <Enter> or <Ctrl>Z after you finish typing the last value. For example:

? 2 4 10 5 3

? 5, 6, 1 7 00 0 4

gives the same results as the files shown above.

Program output in Antana form follows these conventions. A new line is started for each observation, and output lines are usually 80 or fewer characters long (although these conventions are not required by Antana, they simplify use with other programs). Except for the very beginning of a file, comments are not generally included, again, to simplify use with other programs.
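These output conventions can be sketched as a small Python writer. The name `write_antana`, the optional leading comment, and the choice of 10 significant digits are assumptions for illustration. An observation too long for one 80-character line simply continues on the next line, which Antana format permits because observations are not tied to line boundaries.

```python
def write_antana(matrix, comment=None, width=80):
    """Format a list of rows as Antana text.

    An optional comment appears only at the very beginning, followed by the
    "rows cols" header; each observation then starts on a new line, and lines
    are kept to `width` characters where possible.
    """
    lines = []
    if comment:
        # Assumption: the comment text itself contains no "#" characters.
        lines.append(f"# {comment} #")
    cols = len(matrix[0]) if matrix else 0
    lines.append(f"{len(matrix)} {cols}")
    for row in matrix:
        line = ""
        for value in row:
            field = format(value, ".10g")  # 10 significant digits (arbitrary)
            if line and len(line) + 1 + len(field) > width:
                # Continue a long observation on the next line.
                lines.append(line)
                line = field
            else:
                line = f"{line} {field}" if line else field
        lines.append(line)
    return "\n".join(lines) + "\n"


print(write_antana([[2, 4, 10, 5], [3, 5, 6, 1], [7, 0, 0, 4]]), end="")
```

For the three-case example used above, the output is the header line "3 4" followed by one line per case.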

Row and Column Label Files

Many programs provide the option of using row or column label files, with default extensions .ARL and .ACL (Antana Row- and Column-Label Files). These are separate ASCII files in which each line is read as the label for the row (observation) or column (variable) in the corresponding position of the data file. The labels may include any ASCII characters, including blanks, and may be of any length, but should not be completely empty. Different programs truncate them to different lengths, but generally at least 8 characters are used. These row and column label files can be created using Notepad or another ASCII editor.
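A sketch of how such a label file might be read and checked in Python. The function name `read_labels` is hypothetical; it assumes the labels are the first lines of the file, one per row or column, that any lines beyond the expected count are ignored, and that the truncation width, which varies by program, is passed in by the caller.

```python
def read_labels(text, expected, width=None):
    """Return `expected` labels from a row- or column-label file.

    Each line is the label for the observation (or variable) in the same
    position of the data file. Labels may contain blanks but must not be
    empty. `width` mimics a program's truncation length (it varies).
    """
    # Assumption: lines after the first `expected` ones are ignored.
    labels = text.splitlines()[:expected]
    if len(labels) < expected:
        raise ValueError(f"expected {expected} labels, found {len(labels)}")
    for position, label in enumerate(labels, start=1):
        if not label.strip():
            raise ValueError(f"label {position} is empty")
    return [label[:width] if width else label for label in labels]


print(read_labels("Site A\nSite B long name\nC\n", 3, width=8))
```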

Creating Antana Format Files

If you are preparing a data set specifically for use with my programs, just follow the simple conventions described above and use an editor such as Notepad (which comes with Windows) or any other program that saves pure ASCII files.

If you are already using the data with a statistical program, spreadsheet, or database management package (for example, SYSTAT, Excel, or Access), you can probably use that program to create an ASCII file with the data in a form that can be read by my programs, after inserting the row-column header at the beginning of the file. However, if your data set has character variables, you should recode them into numerical form before exporting the data.

If you have a fixed format ASCII file with the data values that you want to analyze separated by at least one space, or if you have a free format file with the values separated by blanks or commas, all you need to do is insert a line at the beginning of the file with the number of rows and columns in the data set. This can easily be done with Notepad or an equivalent. If you have a fixed format file with one line per observation, in which the data values are not in the correct order or are not separated by blanks or commas, you can use my MVC program to extract and properly order the data that you need.
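Inserting the header line can also be automated. The Python sketch below assumes a file with exactly one observation per line and no # comments: it counts the non-blank lines and the values on each, checks that every line has the same number of values, and prepends the "rows cols" line. The function name `add_antana_header` is hypothetical. For a free format file whose observations span several lines, the counts would have to be supplied by hand.

```python
import re


def add_antana_header(text):
    """Prepend the "rows cols" line to a file with one observation per line."""
    counts = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Values may be separated by blanks, tabs, or commas.
        fields = [f for f in re.split(r"[\s,]+", line) if f]
        if not fields:
            continue  # skip blank lines
        if counts and len(fields) != counts[0]:
            raise ValueError(
                f"line {number} has {len(fields)} values, expected {counts[0]}"
            )
        counts.append(len(fields))
    if not counts:
        raise ValueError("no data lines found")
    # Rows = non-blank lines; columns = values per line.
    return f"{len(counts)} {counts[0]}\n{text}"


print(add_antana_header("2 4 10 5\n3,5,6,1\n7 0 0 4\n"), end="")
```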

The ADFUTIL program allows you to create a new file with selected columns from an existing Antana format file. It also allows you to generate random data sets with any number of cases and variables.

If you are using SYSTAT, you can use CONVSYS to create an Antana format version of a SYSTAT file very quickly and easily.

CSV Format and Programs

In the CSV subdirectory, created when you install the programs, are beta versions of several of the main programs (at this writing, Boone, kmeans, lden, and twoway) that accept comma separated value (CSV) input rather than the standard program input described in this documentation. It is expected that all of the analytical programs will eventually be converted to accept this format. If you prepare analysis files with Excel or another spreadsheet, this will probably be the easiest format to work with (use Save As - CSV in Excel). Details are provided in a separate document entitled Programs Using CSV Input. I have done only limited testing, as have a few colleagues, and the programs do seem to work. I have included a program (adf2csv.exe) to convert sets of adf, rlf, and clf files to the CSV format. Please let me know of problems.

Unless you see a special need, you can keep using the standard programs, almost all of which have been revised to run very large problems.

MVSP Format

The Ford program accepts data sets in MVSP format. If the extension of the input data file is .MVS, the program expects an MVSP file. MVSP format is similar to Antana format, but labels are included within the data file. MVSP requires a header line with either

  * rows cols

or

  *L rows cols

Where "rows" and "cols" are replaced by the number of cases and variables, respectively. The * must be in column 1. After "cols," the remainder of the header line is ignored. If L (for labels) is specified, the second and subsequent lines of the file are read until "cols" labels for the variables have been found. Then, data are read for each case. If L has been specified, each new row is preceded by a label for that case. Labels may not include blanks. In Kintigh's programs, the allowed label length varies by program. These programs may accept formats (including use of # comments) not accepted by MVSP, but it is intended that all formats acceptable to MVSP will be acceptable here.
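The MVSP layout described above can be sketched as a Python parser. It is not the Ford program's reader. The name `read_mvsp` is hypothetical, and it assumes that labels and values are separated only by whitespace, that the header counts are written as plain integers, that a lowercase l is accepted like L, and that # comments are not used.

```python
def read_mvsp(text):
    """Parse MVSP text into (row_labels, col_labels, data).

    Both label lists are None when the header is "* rows cols" (no labels).
    """
    lines = text.splitlines()
    # The * must be in column 1, so a leading blank makes the header invalid.
    header = lines[0].split() if lines and lines[0].startswith("*") else []
    # Assumption: "*l" is accepted as well as "*L".
    if len(header) < 3 or header[0].upper() not in ("*", "*L"):
        raise ValueError("header must be '* rows cols' or '*L rows cols'")
    labelled = header[0].upper() == "*L"
    rows, cols = int(header[1]), int(header[2])  # rest of header is ignored
    # Assumption: whitespace separates all items (labels contain no blanks).
    tokens = " ".join(lines[1:]).split()
    position = 0

    def take():
        nonlocal position
        if position >= len(tokens):
            raise ValueError("file ends before all labels and values were read")
        position += 1
        return tokens[position - 1]

    # Variable labels come first and may span several lines.
    col_labels = [take() for _ in range(cols)] if labelled else None
    row_labels = [] if labelled else None
    data = []
    for _ in range(rows):
        if labelled:
            row_labels.append(take())  # each case starts with its label
        data.append([float(take()) for _ in range(cols)])
    return row_labels, col_labels, data


example = "*L 3 4 ceramic counts\nBowl Jar\nLadle Plate\nS1 2 4 10 5\nS2 3 5 6 1\nS3 7 0 0 4\n"
print(read_mvsp(example))
```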

Page Last Updated: 1 April 2020
