
Tools for Quantitative Archaeology


SORTLINE: Sort Large Fixed-Format Files

 

      SORTLINE is an interactive utility that will sort large fixed-format files according to any number of sort keys (fields), providing some of the facilities available using mainframe sort packages. A fixed format file is one in which specific sets of adjacent column positions on each line (known as fields) contain specific categories of data. SORTLINE performs the sort in memory if there is sufficient room, or uses a disk for work space if necessary. It is far superior to the DOS SORT filter, which can be used only to sort very small files on a single sort key.

 

      The program treats each line in a file as a separate record, so multi-line records can be sorted only if all lines contain the necessary sort keys. The program will sort files with lines up to 255 characters long and will sort files that have up to 32767 lines. Because it allows disk switching, on floppy disk systems SORTLINE can sort files that are as large as the disk. On hard disk systems the size of the file is limited only by free space on the disk and by the 32767 line limit. Because the program defines sort fields based on their column positions in the line, files with embedded tab characters will generally not sort in the expected way. In this case, the tabs need to be removed and replaced with the proper number of spaces, using my UNTAB program or some other means. The program provides a record of its progress and, when it is done, reports a successful completion and displays the time required.
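
      As one illustration of the "some other means" mentioned above, tabs can be expanded to spaces with a few lines of Python. This is only a sketch and is not the UNTAB program; the function names and the tab stop width of 8 columns are assumptions, so use whatever tab width the file was created with.

```python
def expand_tabs(line: str, tab_width: int = 8) -> str:
    """Replace each tab with enough spaces to reach the next tab stop."""
    # str.expandtabs tracks the current column, so fields stay aligned.
    return line.expandtabs(tab_width)


def untab_file(src_path: str, dst_path: str, tab_width: int = 8) -> None:
    """Copy src_path to dst_path with every tab expanded to spaces."""
    # newline="" leaves the original line endings untouched.
    with open(src_path, "r", newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        for line in src:
            dst.write(expand_tabs(line, tab_width))


if __name__ == "__main__":
    # "12" ends in column 2, so the tab becomes 6 spaces (next stop is 8).
    print(repr(expand_tabs("12\t34")))
```

      After expansion, every field starts in the same column on every line, which is what a sort on column positions requires.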

 

      Although SORTLINE will sort records according to the keys specified, it does not guarantee that records that are identical on all keys will keep the same order that they had in the input file. For the technically inclined, the sort used is TURBO TOOLBOX's implementation of Quicksort. SORTLINE uses a very fast sorting algorithm; however, the time required depends on the number of records in the input file, the maximum line length, and the number of sort keys. While short files are sorted very rapidly, files with several thousand lines may require a few minutes.

 

RUNNING SORTLINE

 

      To start SORTLINE, simply type SORTLINE<Enter>. (The input and output file names can optionally be given on the same line after SORTLINE but before <Enter>). The sort uses the ASCII collating sequence and is case sensitive. Thus, numbers 0-9 sort before capital letters A-Z, which sort before lower case letters a-z. Each prompt given by the program is described below. More information on conventions used in program prompts is provided in the section entitled "Program Conventions".
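
      The effect of an ASCII, case-sensitive ordering can be previewed in Python, whose string comparison follows the same sequence for plain ASCII text. This is a sketch for illustration only, not SORTLINE's code; the function name and the sample values are made up.

```python
def ascii_sorted(values):
    """Sort strings character by character in ASCII order."""
    # Python compares str values by code point, which for plain ASCII
    # text is the ASCII collating sequence (space < 0-9 < A-Z < a-z).
    return sorted(values)


if __name__ == "__main__":
    print(ascii_sorted(["site", "Site", "9", "SITE", "10"]))
    # → ['10', '9', 'SITE', 'Site', 'site']

    # Comparison is by character, not by numeric value, so "10" sorts
    # before "9". When the digits are right-justified in the field,
    # the leading space sorts first and the order is numeric.
    print(ascii_sorted(["10", " 9"]))
    # → [' 9', '10']
```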

 

SORTLINE VERSION 2.0: General Purpose Fixed-Field Sort

  Maximum Line Length: 255 Maximum Records: 32767

Do You Want Detailed Instructions {N} ?

 

      The program can be run in two modes, depending on whether or not one wants to see detailed instructions. The detailed instructions tell you how to switch disks and are useful when sorting relatively large files on floppy disk systems. By "relatively large" I mean a file that is larger than the amount of free space on either the default or the other diskette (assuming that the input file is not on the default drive). If you do not understand this or are not sure, try the sort without the detailed instructions. No harm will be done if you don't have enough room; the worst that can happen is that you will have to start over. A session without the detailed instructions is described below; the detailed mode is described in the section SORTING LARGE FILES, below.

 

Input File ?

 

      Give the name of the file to be sorted. (A drive and full path specification can be given.)

 

Upper Bound on Record Length {255} ?

 

      The sort can proceed more efficiently (faster and with less work space required) if it has some knowledge of the maximum line length in the file to be sorted. It is not essential that you know the exact number; just pick a number that you are sure is at least as large as the length of the longest line of the file. For short files, simply hit <Enter>; the default of 255 will be fine. There is no danger involved in picking too small a number. If the program finds a line that is longer than the specified length, it will give a message and stop, but no harm will be done; simply restart the program and give a larger line length. Because of restrictions imposed by punch cards and some analysis programs, many data files will have lines no longer than 80 characters, and few data files will have lines longer than 132 characters.
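
      If you would rather measure the longest line than guess, a short Python sketch such as the following will do it. It is not part of SORTLINE; the function names are hypothetical, and the file is read as Latin-1 on the assumption that each byte should count as one column.

```python
def max_line_length(lines) -> int:
    """Return the length of the longest line, ignoring line endings."""
    return max((len(line.rstrip("\r\n")) for line in lines), default=0)


def max_line_length_in_file(path: str) -> int:
    """Return the longest line length found in the file at `path`."""
    # Latin-1 maps every byte to exactly one character, so the
    # character count equals the column count for single-byte text.
    with open(path, "r", encoding="latin-1", newline="") as f:
        return max_line_length(f)


if __name__ == "__main__":
    sample = ["122:790:01\r\n", "987:900:01  extra\r\n", "\r\n"]
    print(max_line_length(sample))  # → 17
```

      Any number at or above the reported value is a safe answer to the prompt.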

 

Sort Key 1

  Position of Sort Field ?

  Length of Sort Field ?

  [A]scending or [D]escending {A} ?

 

      By answering these questions you specify the primary sort key. Thus, if you want to sort your records in order of site number, operation within site, and level within operation, give the location (where 1 is the first column) and length of the site number field in the record. If the site number is in columns 5-8 on each line in the file, you would reply 5 to the position prompt and 4 to the length prompt. Next specify if you want the sites sorted in ascending or descending order. Usually ascending order, which is the default, is desired, so <Enter> can be pressed instead of A for the final prompt.
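
      The position and length answers map directly onto a range of columns. The Python sketch below shows that mapping for the columns 5-8 case; the function name and the sample record layout are hypothetical, and how SORTLINE itself treats lines that end before the field is not covered here.

```python
def sort_field(line: str, position: int, length: int) -> str:
    """Return the field that starts at 1-based column `position`
    and is `length` columns wide."""
    start = position - 1          # Python indexes from 0, columns from 1
    return line[start:start + length]


if __name__ == "__main__":
    # Made-up record: site number in columns 5-8.
    record = "AZ  0042 03 12"
    print(sort_field(record, 5, 4))  # → 0042
```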

 

More Sort Keys ?

 

      If you want the file sorted according to more than one key, reply Y to this prompt. The program will then repeat the three Sort Key prompts listed above. The second sort key is used to order records that are the same on the first sort key; the third orders records that are the same on the first two, etc. Thus, if you are sorting records according to a unique identifier such as a catalog number, only one sort key is needed since one key alone completely specifies an order. However, if within sites, operations and levels are numbered sequentially starting with 1, you would need three keys.

 

Input Phase Complete: ?.? Seconds Elapsed

?? records read from filein.ext

Sort Phase Complete: ?.? Seconds Elapsed

Sorted Output File {filein.SRT} ?

 

      The first three messages above indicate progress of the program. It first reads records, and then sorts them. It then asks where you want the sorted output file written. As a default output file name it suggests the input file name with the extension .SRT. If the output file name you give is the same as the input file name, the program verifies that it is OK to overwrite that file.

 

Output Phase Complete: ?.? Seconds Elapsed

?? Records Written to fileout.ext

Sort: Successful Completion

 

      These are simply informative messages. If an error occurs, the program will in most cases give a message indicating the problem. However, if you get a run error, you probably ran out of room on the disk where the program is trying to write the output file.

 

SORTING LARGE FILES

 

      On a floppy disk system, if you run out of disk space while sorting, the program will give a message and stop. Your input file will not be harmed, but you will need to start again, with the following information in mind. On large sorts the program uses four blocks of disk space: the program file (SORTLINE.EXE), the input file, the output file, and a temporary work file. Because it only needs two of these at any one time, you can sort any file as long as none of the input, output, and work files exceeds the size of a diskette. The input and output files are the same size, and the work file is generally about the same size as the other two.

 

      When you start the program by typing SORTLINE<Enter>, the program is loaded from the disk into memory and the program starts its execution by asking if you want detailed instructions. At this point, the SORTLINE.EXE file is no longer needed and may be removed from its disk drive, if necessary. Next the program reads the input file (from whatever drive you specify with the file name) and simultaneously creates the work file on the default drive. Thus, you want to have the input file on the drive that is not the default and put a blank formatted disk (or a disk that otherwise has sufficient room on it) in the default drive. When the input and sort are complete (the program will tell you this), the program no longer needs the input file, and the diskette in that drive (usually the one that is not the default drive) can be replaced with a blank formatted disk (or one that has enough room for the output file) on which the output file is written.

 

      If you still run out of disk space, try reducing the number given as the upper bound of the record length. Because the size of the work file depends on this number, if the length is given as 240 when the largest record length is actually 80, the work file will be three times as big as it needs to be. Recall that the program will tell you if you pick a number that is too small, and no harm will be done.
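
      The arithmetic behind the "three times as big" figure can be sketched as follows. The exact layout of the work file is not documented here, so the model below (one slot of `upper_bound` bytes per record) is an assumption made only to show the proportion; the function name and the record count of 4000 are likewise made up.

```python
def estimated_work_bytes(records: int, upper_bound: int) -> int:
    """Rough work file size under an ASSUMED layout of one
    fixed-size slot of `upper_bound` bytes per record."""
    return records * upper_bound


if __name__ == "__main__":
    loose = estimated_work_bytes(4000, 240)  # bound given as 240
    tight = estimated_work_bytes(4000, 80)   # longest line is really 80
    print(loose, tight, loose / tight)       # → 960000 320000 3.0
```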

 

      The instructions in this section assume that you are trying to sort the largest possible file. For most sorts these steps will not be necessary. If you do need to switch disks the program will tell you when to do so as long as you have asked for detailed instructions. If, in the midst of the sort, you discover that you need a blank formatted disk but you don't have one, hit <Ctrl>C, format a disk, and start again. Your input file will not be damaged under any circumstances.

 

INPUT FILE

122:790:01
987:900:01
122:790:02
122:813:01

OUTPUT FILE

122:813:01
122:790:01
122:790:02
987:900:01

PROMPTS

Sort Key 1
  Position of Sort Field ? 1
  Length of Sort Field ? 3
  [A]scending or [D]escending {A} ? A
More Sort Keys {N} ? Y
Sort Key 2
  Position of Sort Field ? 5
  Length of Sort Field ? 3
  [A]scending or [D]escending {A} ? D
More Sort Keys {N} ? Y
Sort Key 3
  Position of Sort Field ? 9
  Length of Sort Field ? 2
  [A]scending or [D]escending {A} ? A
More Sort Keys {N} ? N
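
      The example above can be checked with a short Python sketch that applies the same three keys to the same four lines. It is not SORTLINE's algorithm: SORTLINE uses Quicksort, while this sketch relies on Python's stable built-in sort and applies the keys from last to first. The function name `sort_lines` and the key tuple format are hypothetical.

```python
def sort_lines(lines, keys):
    """Sort fixed-format lines on several column-based keys.

    `keys` is a list of (position, length, direction) tuples in order
    of priority; position is 1-based and direction is "A" or "D".
    """
    result = list(lines)
    # Sorting on the least significant key first works because
    # list.sort is stable: ties keep the order set by earlier passes.
    for position, length, direction in reversed(keys):
        start = position - 1
        result.sort(key=lambda line: line[start:start + length],
                    reverse=(direction == "D"))
    return result


if __name__ == "__main__":
    input_file = ["122:790:01", "987:900:01", "122:790:02", "122:813:01"]
    keys = [(1, 3, "A"), (5, 3, "D"), (9, 2, "A")]
    for line in sort_lines(input_file, keys):
        print(line)
    # → 122:813:01
    # → 122:790:01
    # → 122:790:02
    # → 987:900:01
```

      The second key is descending, which is why 813 comes before 790 within site 122, while the third key puts 01 before 02 within 790.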


Page Last Updated - 02-Jun-2007