TFQA: Correspondence Analysis Using Systat

Tools for Quantitative Archaeology

Correspondence Analysis Using SYSTAT

While this has nothing directly to do with TFQA, it may be useful to people who use it. Doing correspondence analysis with SYSTAT (available since version 8) is not straightforward. There is a logic to the way SYSTAT does it, but it is not obvious how to carry out what, for archaeologists, is a typical correspondence analysis in which the variables represent counts.

The following describes how to do correspondence analysis using the menus, where “>” denotes a submenu option.

First, go to Edit> Options and under Display check the boxes for “Command Prompt” and “Statistical Quickgraphs.”
File>Open>Data systatfilename
Data>Reshape and Wrap, selecting all of the count variables. This creates a new data set that eliminates the count variables and adds the variables TRIAL and MEASURE. (Any variables that are not wrapped are also kept in the output file.) One of these should be a unique case identifier (it can be a numeric or a string variable). For the purposes of discussion, let's assume that it is a string variable called proven$. Then click on File>Save Data to save this file (e.g., as filenameWrap).
View this file (do View>Data if you need to) to make sure you know what you have in the wrapped file. In this format an observation represents a single count on one variable for a site (rather than all counts for a site). TRIAL is simply the variable number, MEASURE is simply the count, and proven$ identifies the case (site) that each observation came from.
You will want to use variable labels instead of variable numbers, 1-12. You can use the Label command to do this either by clicking Data>Label and filling in the window or, easier, by typing in the interactive window:

LABEL TRIAL / 1=var1name 2=var2name 3=var3name ... n=nthvarname

(If the names include spaces or special characters, enclose them in single quotes, e.g., 'St. Johns Poly'.)

Note here that if your case identifier is not unique, or if you do not assign labels to all the values of TRIAL, or if you assign the same label to more than one value of TRIAL, the correspondence analysis will run, but will probably not give you what you want.
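To make the wrapped layout and the three pitfalls above concrete, here is a minimal sketch in Python (outside SYSTAT, purely as an illustration). The function names `wrap` and `check_labels`, the two-site table, and its counts are all hypothetical, and the row order of the wrapped records is an assumption rather than a statement about SYSTAT's output file.

```python
def wrap(rows, id_field, count_fields):
    """Turn one record per site into one record per (site, count variable)."""
    wrapped = []
    for row in rows:
        # TRIAL is the 1-based position of the count variable; MEASURE is the count.
        for trial, field in enumerate(count_fields, start=1):
            wrapped.append(
                {id_field: row[id_field], "TRIAL": trial, "MEASURE": row[field]}
            )
    return wrapped


def check_labels(rows, id_field, count_fields, labels):
    """Return a list of the problems described in the note above (empty if none)."""
    problems = []
    ids = [row[id_field] for row in rows]
    if len(set(ids)) != len(ids):
        problems.append("case identifier is not unique")
    missing = [t for t in range(1, len(count_fields) + 1) if t not in labels]
    if missing:
        problems.append(f"no label for TRIAL values {missing}")
    if len(set(labels.values())) != len(labels):
        problems.append("same label assigned to more than one TRIAL value")
    return problems


# Hypothetical two-site, three-type table (made-up counts).
sites = [
    {"proven$": "A1", "lino": 4, "kiat": 0, "redm": 7},
    {"proven$": "B2", "lino": 1, "kiat": 5, "redm": 2},
]
count_fields = ["lino", "kiat", "redm"]
labels = {1: "Lino", 2: "Kiat", 3: "RedM"}

long_rows = wrap(sites, "proven$", count_fields)
for record in long_rows:
    print(record)
print(check_labels(sites, "proven$", count_fields, labels))
```

Each site contributes one wrapped record per count variable, which is why the case identifier repeats in the wrapped file even though it must be unique in the original one.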

Now you are about ready to do the correspondence analysis. First click on Data>Frequency>MEASURE, which weights each observation by MEASURE (which is, after all, the count). Do not use Data>Weight instead of Frequency, because it does not do what you want.
Then click on Data>IDVar and, in the box, Add proven$; this establishes the variable proven$ as the case label.
In version 8, go to the Interactive window and type “Save filenameCA” at the < prompt. This saves the coordinates. (This is different from going to the menu and saving the current file; it tells the correspondence analysis to save the coordinates.) In version 9 or above this is handled from the menu in the next step.
For the correspondence analysis, go to Stats>Data Reduction>Correspondence Analysis. In version 9 or higher, check the box that says “Save Coordinates.” At some point you will need to specify the name of the saved file, e.g., filenameCA. In all versions, add proven$ as the dependent variable and TRIAL as the independent variable (or vice versa; it doesn’t matter). This will produce a too-busy Correspondence Plot.

The printed output in the Main window has, for the observations and variables respectively, sections titled Row Variable Coordinates and Column Variable Coordinates. Each has the column labels Name, Mass, Quality, Inertia, Factor 1, and Factor 2, with corresponding values for the cases and variables, respectively; the values under Factor 1 and Factor 2 are the coordinates. (For present purposes, ignore the rest of the output.)
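For readers who want a feel for the Mass and Inertia columns, the sketch below computes them from a small table of counts using the standard textbook definitions of correspondence analysis: a row's mass is its share of the grand total, and its inertia is its contribution to the chi-square statistic divided by the grand total. This is a Python illustration outside SYSTAT; the function name and the counts are hypothetical, and whether SYSTAT prints inertia in absolute terms or as a proportion of the total is not established here.

```python
def masses_and_inertias(table):
    """Return (row_mass, row_inertia, total_inertia) for a table of counts.

    Assumes every row and every column has a nonzero total.
    """
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(row[j] for row in table) / n for j in range(len(table[0]))]
    row_inertia = []
    for i, row in enumerate(table):
        total = 0.0
        for j, count in enumerate(row):
            # Proportion expected in this cell if rows and columns were independent.
            expected = row_mass[i] * col_mass[j]
            total += (count / n - expected) ** 2 / expected
        row_inertia.append(total)
    return row_mass, row_inertia, sum(row_inertia)


# Hypothetical counts: two sites (rows) by three ceramic types (columns).
counts = [[4, 0, 7], [1, 5, 2]]
mass, inertia, total = masses_and_inertias(counts)
print("row masses:", mass)
print("row inertias:", inertia)
print("total inertia:", total)
```

The same function applied to the transposed table gives the column masses and inertias. A table whose rows are exact multiples of one another has zero total inertia, since there is then no association for the analysis to display.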

Note that if you don’t run straight through this process, or if you reopen the wrapped file at any point, you need to reenter the Frequency, ID Variable, and Label commands. In particular, if you get an error that says something about a singularity, you may have forgotten to do the Data>Frequency step. Note also that if you have cases or variables for which all counts are 0, you will get a singularity error or some other error; in any event, the analysis will not run.
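Because all-zero cases or variables stop the analysis, it can save time to look for them before wrapping the data. The following is a minimal Python sketch of such a check, done outside SYSTAT; the function name, site identifiers, and counts are hypothetical.

```python
def empty_rows_and_columns(ids, names, table):
    """Return (case ids, variable names) whose counts sum to zero."""
    empty_rows = [ids[i] for i, row in enumerate(table) if sum(row) == 0]
    empty_cols = [
        names[j] for j in range(len(names)) if sum(row[j] for row in table) == 0
    ]
    return empty_rows, empty_cols


# Hypothetical table: site "C3" has no sherds and type "Kiat" never occurs.
ids = ["A1", "B2", "C3"]
names = ["Lino", "Kiat", "RedM"]
table = [
    [4, 0, 7],
    [1, 0, 2],
    [0, 0, 0],
]
print(empty_rows_and_columns(ids, names, table))
```

Any case or variable reported here would need to be dropped (or combined with another) before the analysis can run.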

You’ve now done the analysis. Open the saved file and use Graph>Scatterplot to create plots of the variables and observations. The coordinates for the observations and variables are saved separately in variables called Factor(1) and Factor(2) and Dim(1) and Dim(2), respectively (these are, however, coordinates on the same scale in the same space). To plot observations and variables on the same graph, use Data>Transform>IfThenLet to copy the values of Dim(1) and Dim(2) into Factor(1) and Factor(2), respectively (or vice versa). You might also create a new variable in this file called ObsVar that has a value of, say, 1 for observations and 2 for variables. For example: IF Label$=. Then Obsvar=1 Else Obsvar=2; IF Factor(1)=. Then Factor(1)=Dim(1); and, as a third command, IF Factor(2)=. Then Factor(2)=Dim(2). You should then do File>Save to save this transformed coordinate file.
You can now use Graph>Scatterplot to create better plots than the quickgraph; these will let you see what is going on. Using Data>Select Cases, you can use ObsVar to plot cases and variables separately. Alternatively, in Scatterplot>Appearance>Symbol and Label>Symbols, choose Select Variable and OBSVAR to plot both cases and variables on the same plot with different symbols. You may also restrict the min and max on the axes to enlarge part of the plot. However, if you mis-specify min and max, you may inadvertently cut out some of the observations or variables.
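One way to guard against cutting out points by accident is to list the points that fall outside the chosen axis limits before accepting the plot. The sketch below does this in Python, outside SYSTAT; the function name, labels, and coordinates are hypothetical and are not results from any analysis.

```python
def points_outside(points, xmin, xmax, ymin, ymax):
    """Return the labels of points that the given axis limits would exclude."""
    return [
        label
        for label, x, y in points
        if not (xmin <= x <= xmax and ymin <= y <= ymax)
    ]


# Hypothetical (label, Dim 1, Dim 2) coordinates for cases and variables.
coords = [
    ("A1", -0.40, 0.25),
    ("B2", 0.90, -1.70),
    ("Lino", 3.20, 0.10),
    ("RedM", -1.10, 0.60),
]
print(points_outside(coords, xmin=-1.5, xmax=3, ymin=-1.5, ymax=1.5))
```

An empty list means the limits keep every observation and variable on the plot.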

Finally, I provide below a command file for an analysis I recently did. You can track the commands used in your own analysis from the menus in the log file. Note that single commands (e.g., label and plot) generally need to be typed on a single line (contrary to how it may appear below).

Use obap2b1
Wrap lino kiat redm pubw, resv tupi pubr winb winp sjpo piv
drop totl
save obap2b1wrap
run
label trial/1=Lino,2=Kiat,3=RedM,4=PuBW,5=Resv,6=TuPi,7=PuBR,8=WinB,9=WinP,10=SJPo,11=PIV
Freq=Measure
idvar=Prov$
coran 
Model Prov$=Trial
save obap2b1ca
Estimate

USE obap2b1ca
IF (Factor(1)=.) THEN LET ObsVar=1
If (Factor(1)<>.) then LET ObsVar=2
IF (Factor(1)=.)  THEN LET SiteNo$=label$
IF (Factor(1)<>.)  THEN LET Variable$=label$
IF (Dim(1)=.)  THEN LET Dim(1)=Factor(1) 
IF (Dim(2)=.)  THEN LET Dim(2)=Factor(2)
let dim(1)=-dim(1)
drop factor(1)
drop factor(2)
drop label$
ESAVE obap2b1ca2.SYD

USE   obap2b1ca2.syd
PLOT DIM2*DIM1 / XLABEL='Dim 1' YLABEL='Dim 2' SYMBOL=OBSVAR SIZE= 0.500 LABEL=VARIABLE$  
  CSIZE=0.750 LEGEND=NONE xmin=-1.5 xmax=3 ymin=-1.5 ymax

Page Last Updated - 02-Jun-2007