Tools for Quantitative Archaeology
Programs using CSV (Comma Separated Value) Input
There have been several requests to simplify the data interface with spreadsheets and databases. I have written beta versions of several of the most used programs (kmeans, lden, twoway, and boone, with more to come) that accept comma separated value (CSV) input rather than the standard program input format described elsewhere in the documentation. (I have also written a program, adf2csv.exe, that converts a set of .adf, .arl, and .acl files to .csv format.) I have not had time to rewrite the documentation, but for data files prepared with Excel or another spreadsheet, this format is much easier to use. From Excel, and many other programs, you can use File>Save-As>CSV.
In addition, these new versions of the programs use dynamic storage allocation, which allows them to handle problems of almost unlimited size. At the same time I changed them to use a single comma separated value file for input rather than separate files for data and for row and column labels. Both changes are steps on the road to a Windows interface; for now the new programs retain the DOS-like interface, though they are true Windows programs. The paragraphs below describe the input format. Like the other programs, they are easiest to use if you copy the program to the directory with the data files and just double click on the program.
All of the input to the programs with names ending in csv goes in as one file rather than a data (.adf) file, a column label (.acl) file, and a row label (.arl) file. The programs read the input fairly flexibly as a comma separated value (CSV) file that can be exchanged directly (in and out) with Excel or most other spreadsheets. Each line consists of some number of distinct values separated by commas or spaces; text that includes spaces can be enclosed in double quotes (e.g., "xx xx"). Values must be separated by at least one space or by one comma (with optional spaces). However, every comma separates a value, so A,,2,3 generates 4 values: A, an empty string, 2, and 3.
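The separator rules above can be illustrated with a small Python sketch. This is not the code the programs use; the function name split_values is hypothetical, and two behaviors are assumptions not stated in the description: a trailing comma is taken to produce a final empty value, and an unterminated double quote is taken to run to the end of the line.

```python
def split_values(line):
    """Split one input line into values separated by spaces and/or commas.

    Illustrative sketch only (hypothetical helper, not the programs' code).
    """
    values = []
    i, n = 0, len(line)
    after_comma = False  # a comma was seen and no value has followed it yet
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1                      # spaces only separate, never create values
        elif ch == ",":
            # Nothing between this comma and the previous comma (or the
            # start of the line) means an empty value, as in A,,2,3.
            if after_comma or not values:
                values.append("")
            after_comma = True
            i += 1
        elif ch == '"':
            j = line.find('"', i + 1)   # closing quote
            if j == -1:
                j = n                   # assumption: unterminated quote runs to end of line
            values.append(line[i + 1:j])
            after_comma = False
            i = j + 1
        else:
            j = i
            while j < n and not line[j].isspace() and line[j] != ",":
                j += 1
            values.append(line[i:j])
            after_comma = False
            i = j
    if after_comma:
        values.append("")               # assumption: trailing comma yields an empty value
    return values


print(split_values('A,,2,3'))          # the example from the text
print(split_values('"xx xx" , 5 7'))   # quoted text, comma with spaces, plain space
```

Note that the sketch discards the quotes, so a later step could no longer tell "35" from 35; a fuller reader would need to remember which values were quoted.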
If the first line consists only of text values, the program assumes that they are variable (column) labels. (If you want numbers for labels, enclose them in double quotes, e.g., "35".) If the first line has any numeric values, it is treated as a data line, and the default variable labels are V1, V2, etc. Data lines must have the same number of values as the label line (if any). The first data line is used to decide whether a given value is a character variable or a numeric variable (missing numeric values are not allowed) and, if there is no label line, to determine the number of variables in a case. All the data for a single case must be on a single line, but the lines can be arbitrarily long. The character variables are ignored except that you may choose one as a case label. (The case labels can be arbitrarily long, though they will be truncated by the programs for printing.)
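As a rough illustration of the label-line and variable-type rules just described, here is a Python sketch. It is not the programs' actual logic. The names is_numeric and interpret are hypothetical, each line is assumed to have already been split into raw values with quoted values still carrying their double quotes, and Python's float() stands in for whatever numeric test the programs really apply.

```python
def is_numeric(token):
    """True if a raw value looks numeric; quoted values always count as text."""
    if token.startswith('"'):
        return False
    try:
        float(token)        # stand-in numeric test (an assumption)
        return True
    except ValueError:
        return False


def interpret(rows):
    """Return (labels, kinds, data_rows) for a list of already-split lines."""
    if not rows:
        raise ValueError("empty file")
    first = rows[0]
    if first and not any(is_numeric(t) for t in first):
        # All text: treat the first line as variable (column) labels.
        labels = [t.strip('"') for t in first]
        data = rows[1:]
    else:
        # Any numeric value: the first line is data; use default labels.
        labels = [f"V{i + 1}" for i in range(len(first))]
        data = rows
    if not data:
        raise ValueError("no data lines")
    # The first data line decides which variables are numeric.
    kinds = ["numeric" if is_numeric(t) else "character" for t in data[0]]
    for number, row in enumerate(data, start=1):
        if len(row) != len(labels):
            raise ValueError(f"data line {number}: expected {len(labels)} values")
        for token, kind in zip(row, kinds):
            if kind == "numeric" and not is_numeric(token):
                raise ValueError(f"data line {number}: missing or non-numeric value")
    return labels, kinds, data


labels, kinds, data = interpret([
    ["Site", "Bowls", "Jars"],
    ["A1", "3", "4.5"],
    ["B2", "0", "7"],
])
print(labels, kinds, len(data))
```

Here Site would be a character variable available as a case label, while Bowls and Jars are numeric. A file made up entirely of character variables would have its first data line read as labels under this rule.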
These versions will take problems of basically unlimited size. Otherwise the programs work the same as the earlier, documented ones. They can, however, produce .plt files that are too big for the corresponding plot programs to process. I have done only limited testing, as have a few colleagues, and the programs do seem to work. There is also a program (adf2csv.exe) to convert .adf, .arl, and .acl files to the CSV format. Unless you see a special need, you can keep using the other programs, almost all of which have been revised to run very large problems.
These revised programs are available to registered users with the purchase of an update.
Page Last Updated - 02-Jun-2007