TFQA: Untab

Tools for Quantitative Archaeology

UNTAB: Replace Tabs and Unprintable Characters

UNTAB replaces all tabs, control characters, and extended characters in a file with blanks or other printable characters. When analysis programs read raw data from a file, they generally expect what is often called a standard ASCII file. Standard ASCII files contain only the regular printable characters plus carriage return and line feed, which are used to end a line. If a program that expects a standard ASCII file finds any other character, it will usually give an error message or do something awful.

UNTAB replaces each tab with enough spaces to reach the next standard DOS tab stop; these stops are located at every eighth character position. Unbeknownst to the user, some text editors substitute tabs for multiple blanks in a file. In other instances, one may use tabs to separate columns of data. This can cause insidious problems, because it is almost impossible to see that the tabs are there: most editors and many printers display tabs as spaces.
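The tab-stop rule described above can be illustrated with a minimal Python sketch. This is not UNTAB's own code; the function name `expand_tabs` and the `tab_width` parameter are hypothetical names chosen for the example.

```python
TAB_WIDTH = 8  # tab stops at every eighth character position, as described above


def expand_tabs(line: str, tab_width: int = TAB_WIDTH) -> str:
    """Replace each tab with enough spaces to reach the next tab stop."""
    out = []
    column = 0  # zero-based column of the next character to be written
    for ch in line:
        if ch == "\t":
            # Number of spaces needed to land on the next multiple of tab_width.
            pad = tab_width - (column % tab_width)
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

For a single line with no embedded line endings, this should give the same result as Python's built-in `str.expandtabs(8)`; the loop is written out only to make the column arithmetic visible.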

Although the use of tabs may save storage space or speed up data entry, tabs wreak havoc if the file is used with programs other than the editor. For free format data, tabs may not be recognized as valid separators of the data elements; for fixed format data, the program will not read data from the proper columns because a single-character tab may replace anywhere from 1 to 8 spaces. While some programs, including those that I distribute, will accept tabs as separators, other analysis programs such as SYSTAT and SAS will not work properly with data files that have embedded tabs.
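The fixed format problem can be seen in a small Python sketch. The record layout here (a count in columns 9 through 11) and the function name `read_count` are invented purely for illustration.

```python
def read_count(line: str) -> str:
    """Return the text in columns 9-11 (1-based), as a fixed format reader would."""
    return line[8:11]


# The same record stored two ways: with one tab, and with the six spaces
# that the tab stands for when tab stops fall at every eighth column.
with_tab = "12\t345"
with_spaces = "12" + " " * 6 + "345"

field_from_tab_version = read_count(with_tab)        # the line is only 6 characters long
field_from_space_version = read_count(with_spaces)   # columns 9-11 hold the count
```

On screen the two records look identical, but the version with the tab has nothing at all in columns 9 through 11, so a reader that trusts column positions gets an empty field.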

While it is at it, UNTAB also checks for control characters (ASCII decimal values 0 through 31, other than tab, carriage return, and line feed, plus decimal 127) and replaces each with a single blank. These characters should not be in raw data files and will generally cause problems similar to those caused by tabs. Finally, UNTAB checks for extended ASCII characters (characters with decimal values between 128 and 255), which are generally not wanted in data analysis files but are created by WORDSTAR and similar word processors when operated in document mode. If it finds any of these, it asks if you want them fixed. If so, each extended character is replaced with the character that has its code in the standard (i.e. not extended) character set, as long as the replacement character isn't a control character; otherwise the extended character is replaced with a blank.
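A rough Python sketch of these two substitutions follows. It rests on two assumptions that the text does not spell out: that "its code in the standard character set" means the code with 128 subtracted (the high bit cleared), and that a result below 32 or equal to 127 counts as a control character. The names `is_control`, `clean_byte`, and `clean_bytes` are hypothetical.

```python
TAB, LF, CR, DEL, BLANK = 9, 10, 13, 127, 32


def is_control(code: int) -> bool:
    """True for codes 0-31 other than tab, CR, and LF, and for code 127."""
    return (code < 32 and code not in (TAB, LF, CR)) or code == DEL


def clean_byte(code: int, fix_extended: bool = True) -> int:
    """Return the replacement code for one character code (0-255)."""
    if is_control(code):
        return BLANK
    if code >= 128 and fix_extended:
        low = code - 128  # ASSUMPTION: map to the standard set by clearing the high bit
        # ASSUMPTION: any result below 32, or 127, is treated as a control character
        return BLANK if (low < 32 or low == DEL) else low
    return code


def clean_bytes(data: bytes, fix_extended: bool = True) -> bytes:
    """Apply clean_byte to every byte; tabs are left for a separate expansion step."""
    return bytes(clean_byte(b, fix_extended) for b in data)
```

Under these assumptions, `clean_bytes(b"a\x07b")` yields `b"a b"`, and the extended code 225 (hexadecimal E1) becomes 97, the letter `a`. Passing `fix_extended=False` corresponds to answering no when asked whether extended characters should be fixed.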

UNTAB simply reads each line from an input file, makes the proper substitutions, and writes the revised line into an output file. There are no practical limits on the size of the problems the program can run, although the input and output file names must be different. The program displays a period on the screen for every 100 lines that it processes, and at the end displays a message indicating the number of lines read and the number of tabs, control characters, and extended characters replaced.
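The overall read, substitute, and write loop might look something like the Python sketch below. It is an illustration of the description above, not the program's actual source. The function name `untab_file`, its parameters, and the order of the returned counts are hypothetical, and the handling of extended characters assumes that the replacement code is the original code minus 128.

```python
import os
import sys


def untab_file(in_path, out_path, tab_width=8, fix_extended=True):
    """Copy in_path to out_path with substitutions; return the four counts."""
    if os.path.abspath(in_path) == os.path.abspath(out_path):
        raise ValueError("input and output file names must be different")
    lines = tabs = controls = extended = 0
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for raw in src:
            body = raw.rstrip(b"\r\n")   # text of the line
            ending = raw[len(body):]     # original CR/LF, written back unchanged
            out = bytearray()
            for code in body:
                if code == 9:            # tab: pad to the next tab stop
                    out += b" " * (tab_width - len(out) % tab_width)
                    tabs += 1
                elif (code < 32 and code not in (10, 13)) or code == 127:
                    out += b" "          # control character becomes one blank
                    controls += 1
                elif code >= 128 and fix_extended:
                    low = code - 128     # ASSUMPTION: clear the high bit
                    out.append(32 if (low < 32 or low == 127) else low)
                    extended += 1
                else:
                    out.append(code)
            dst.write(bytes(out) + ending)
            lines += 1
            if lines % 100 == 0:         # progress: one period per 100 lines
                sys.stdout.write(".")
                sys.stdout.flush()
    return lines, tabs, controls, extended
```

A caller could then print the closing summary from the returned tuple, for example `lines, tabs, controls, extended = untab_file("raw.dat", "clean.dat")`, where both file names are placeholders. Because each line is handled on its own, memory use does not grow with the size of the file.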

RUNNING UNTAB

The first way to run UNTAB is:

In the second, one types UNTAB, and the program prompts for the input and output file names. For more information about program prompts, see the section entitled "Program Conventions".

If one types UNTAB<Enter>, the program prompts:

Input File Name ?

Output File Name ?

Page Last Updated - 21-Jul-2007