TFQA: Tools for Quantitative Archaeology - Statistical Analysis Software for Archaeology TFQA

     kintigh@tfqa.com   +1 (505) 395-7979



SORTLINE: Sort Large Fixed-Format Files

       SORTLINE is an interactive utility that sorts large fixed-format files according to any number of sort keys (fields), providing some of the facilities available in mainframe sort packages. A fixed-format file is one in which specific sets of adjacent column positions on each line (known as fields) contain specific categories of data. SORTLINE performs the sort in memory if there is sufficient room, or uses a disk for work space if necessary. It is far superior to the DOS SORT filter, which can be used only to sort very small files on a single sort key.

       The program treats each line in a file as a separate record, so multi-line records can be sorted only if all lines contain the necessary sort keys. The program will sort files with lines up to 255 characters long and will sort files that have up to 32767 lines. Because it allows disk switching, on floppy disk systems SORTLINE can sort files that are as large as the disk. On hard disk systems the size of the file is limited only by free space on the disk and by the 32767 line limit. Because the program defines sort fields based on their column positions in the line, files with embedded tab characters will generally not sort in the expected way. In this case, the tabs need to be removed and replaced with the proper number of spaces, using my UNTAB program or some other means. The program provides a record of its progress and, when it is done, reports a successful completion and displays the time required.
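
       As an illustration of why tabs disturb column positions, and of one way to replace them with spaces, here is a minimal Python sketch. It is not the UNTAB program; it assumes tab stops every 8 columns, which may not match the settings used when the file was created.

```python
# Minimal sketch: expand tabs to spaces so that column positions are stable.
# Assumption: tab stops fall every 8 columns (adjust TAB_WIDTH if the file
# was created with different settings).
TAB_WIDTH = 8

def untab_line(line, tab_width=TAB_WIDTH):
    """Return the line with each tab replaced by spaces up to the next tab stop."""
    return line.expandtabs(tab_width)

raw = "122\t790"
expanded = untab_line(raw)

# Before expansion the second field starts at column 5 (right after the tab
# character); after expansion it starts at column 9.
print(len(raw), len(expanded))
```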

       Although SORTLINE will sort records according to the keys specified, it does not guarantee that records that are identical on all keys will keep the same order that they had in the input file. For the technically inclined, the sort used is TURBO TOOLBOX's implementation of Quicksort. SORTLINE uses a very fast sorting algorithm; however, the time required depends on the number of records in the input file, the maximum line length, and the number of sort keys. While short files are sorted very rapidly, files with several thousand lines may require a few minutes.

 

RUNNING SORTLINE

       To start SORTLINE, simply type SORTLINE<Enter>. (The input and output file names can optionally be given on the same line after SORTLINE but before <Enter>). The sort uses the ASCII collating sequence and is case sensitive. Thus, numbers 0-9 sort before capital letters A-Z, which sort before lower case letters a-z. Each prompt given by the program is described below. More information on conventions used in program prompts is provided in the section entitled "Program Conventions".
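
       The collating order described above can be checked with a short Python sketch. Python compares strings by character code, which for plain ASCII text gives the same ordering; the sample values are hypothetical.

```python
# Minimal sketch of the ASCII collating sequence: digits sort before
# capital letters, which sort before lower case letters.
values = ["b12", "B12", "012"]  # hypothetical field contents
ordered = sorted(values)        # case-sensitive, character-code comparison
print(ordered)
```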

SORTLINE VERSION 2.0: General Purpose Fixed-Field Sort
  Maximum Line Length: 255 Maximum Records: 32767

Do You Want Detailed Instructions {N} ?

      The program can be run in two modes, depending on whether or not one wants to see detailed instructions. The detailed instructions tell you how to switch disks and are useful when sorting relatively large files on floppy disk systems. By "relatively large" I mean a file that is larger than the amount of free space on either the default or the other diskette (assuming that the input file is not on the default drive). If you don't understand this or aren't sure, try the sort without the detailed instructions. No harm will be done if you don't have enough room; the worst that can happen is that you will have to start over. A session without the detailed instructions is described here; the detailed mode is described in the section SORTING LARGE FILES, below.

Input File ?

      Give the name of the file to be sorted. (A drive and full path specification can be given.)

Upper Bound on Record Length {255} ?

      The sort can proceed more efficiently (faster and with less work space required) if it has some knowledge of the maximum line length in the file to be sorted. It is not essential that you know the exact number; just pick a number that you are sure is as large as or larger than the length of the longest line of the file. For short files, simply hit <Enter>; the default of 255 will be fine. There is no danger involved in picking too small a number. If the program finds a line that is longer than the specified length, it will give a message and stop, but no harm will be done; simply restart the program and give a larger line length. Because of restrictions imposed by punch cards and some analysis programs, many data files will have no more than 80 characters per line, and few data files will have lines longer than 132 characters.
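
      If you would rather measure the longest line than guess, a minimal Python sketch such as the following could be used before running SORTLINE. The function name is hypothetical, and the sketch assumes a single-byte text file so that one byte corresponds to one column.

```python
# Minimal sketch: find the length of the longest line in a text file, to use
# as the answer to the "Upper Bound on Record Length" prompt.
# Assumption: the file is single-byte text; latin-1 decoding maps every byte
# to exactly one character, so string length equals column count.
def longest_line_length(path):
    longest = 0
    with open(path, "r", encoding="latin-1") as handle:
        for line in handle:
            # Do not count the end-of-line characters as columns.
            longest = max(longest, len(line.rstrip("\r\n")))
    return longest
```

      Any value at or above the result (up to 255) would be a safe reply to the prompt.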

Sort Key 1
  Position of Sort Field ?
  Length of Sort Field ?
  [A]scending or [D]escending {A} ?

      By answering these questions you specify the primary sort key. Thus, if you want to sort your records in order of site number, operation within site, and level within operation, give the location (where 1 is the first column) and length of the site number field in the record. If the site number is in columns 5-8 on each line in the file, you would reply 5 to the position prompt and 4 to the length prompt. Next specify whether you want the sites sorted in ascending or descending order. Usually ascending order, which is the default, is desired, so <Enter> can be pressed instead of A for the final prompt.
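
      The position and length convention can be sketched in Python as follows. The helper name and the sample record are hypothetical; the only point is the conversion from a 1-based column position to a slice of the line.

```python
# Minimal sketch: extract a sort field given its 1-based starting column
# and its length, as entered at the Sort Key prompts.
def field(line, position, length):
    start = position - 1          # column 1 is index 0 in Python
    return line[start:start + length]

# Hypothetical record with the site number in columns 5-8.
line = "AZ  0042 OP03 LV07"
site = field(line, 5, 4)          # position 5, length 4
print(site)
```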

More Sort Keys {N} ?

      If you want the file sorted according to more than one key, reply Y to this prompt. The program will then repeat the three Sort Key prompts listed above. The second sort key is used to order records that are the same on the first sort key; the third orders records that are the same on the first two, etc. Thus, if you are sorting records according to a unique identifier such as a catalog number, only one sort key is needed since one key alone completely specifies an order. However, if within sites, operations and levels are numbered sequentially starting with 1, you would need three keys.
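
      The way later keys break ties on earlier ones can be sketched in Python with a tuple of fields. The record layout here is hypothetical (site in columns 1-4, operation in columns 6-7, level in columns 9-10), and all three keys are ascending.

```python
# Minimal sketch: three ascending sort keys. Tuples compare element by
# element, so the second field matters only when the first is equal, and
# the third only when the first two are equal.
records = [
    "0012 02 01",
    "0012 01 03",
    "0007 01 02",
    "0012 01 01",
]

def sort_key(line):
    # (site, operation, level) for the hypothetical layout described above
    return (line[0:4], line[5:7], line[8:10])

ordered = sorted(records, key=sort_key)
for record in ordered:
    print(record)
```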

Input Phase Complete: ?.? Seconds Elapsed
?? records read from filein.ext
Sort Phase Complete: ?.? Seconds Elapsed
Sorted Output File {filein.SRT} ?

       The first three messages above indicate progress of the program. It first reads records, and then sorts them. It then asks where you want the sorted output file written. As a default output file name it suggests the input file name with the extension .SRT. If the output file name you give is the same as the input file name, the program verifies that it is OK to overwrite that file.

Output Phase Complete: ?.? Seconds Elapsed
?? Records Written to fileout.ext
Sort: Successful Completion

       These are simply informative messages. If an error occurs, the program will in most cases give an informative message indicating the problem. However, if you get a run error, you probably ran out of room on the disk where the program is trying to write the output file.

 

INPUT FILE

122:790:01
987:900:01
122:790:02
122:813:01

OUTPUT FILE

122:813:01
122:790:01
122:790:02
987:900:01

PROMPTS

Sort Key 1
  Position of Sort Field ? 1
  Length of Sort Field ? 3
  [A]scending or [D]escending {A} ? A
More Sort Keys {N} ? Y
Sort Key 2
  Position of Sort Field ? 5
  Length of Sort Field ? 3
  [A]scending or [D]escending {A} ? D
More Sort Keys {N} ? Y
Sort Key 3
  Position of Sort Field ? 9
  Length of Sort Field ? 2
  [A]scending or [D]escending {A} ? A
More Sort Keys {N} ? N
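
       The example above can be reproduced with a short Python sketch. This is not SORTLINE's implementation (which uses Quicksort); it is an illustration that assumes each key is compared as a character string, and it handles the mix of ascending and descending keys by sorting on the last key first. The function names are hypothetical. Unlike SORTLINE, Python's sort keeps records that tie on all keys in their input order.

```python
# Minimal sketch reproducing the INPUT FILE / OUTPUT FILE example above.
def field(line, position, length):
    """Extract a field given its 1-based column position and length."""
    start = position - 1
    return line[start:start + length]

def sort_lines(lines, keys):
    """Sort lines on keys given as (position, length, descending) tuples,
    primary key first. Sorting on the least significant key first and the
    primary key last works because Python's sort is stable."""
    result = list(lines)
    for position, length, descending in reversed(keys):
        result.sort(key=lambda ln: field(ln, position, length),
                    reverse=descending)
    return result

records = ["122:790:01", "987:900:01", "122:790:02", "122:813:01"]
keys = [
    (1, 3, False),  # Sort Key 1: position 1, length 3, ascending
    (5, 3, True),   # Sort Key 2: position 5, length 3, descending
    (9, 2, False),  # Sort Key 3: position 9, length 2, ascending
]

sorted_records = sort_lines(records, keys)
for record in sorted_records:
    print(record)
```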

Page Last Updated: 21 June 2022
