While this has nothing directly to do with TFQA, it
may be useful to people who use it. Doing
correspondence analysis with SYSTAT (available since version 8) is not
straightforward. While there is a logic to the way it is done in SYSTAT, it is
not obvious how to carry out what, for archaeologists, is a typical
correspondence analysis in which the variables represent counts.
The following describes how to do correspondence analysis
using the menus, where “>” denotes a submenu option.
First, go to Edit> Options and under Display
check the boxes for “Command Prompt” and “Statistical Quickgraphs.”
File>Open>Data systatfilename
Data>Reshape and Wrap, selecting all of the count variables.
This creates a new data set that drops the count variables and adds the
variables TRIAL and MEASURE. Any variables that are not wrapped are also
kept in the output file. One of these should be a unique case identifier
(it can be a numeric or a string variable). For the purposes of
discussion, let's assume that it is a string variable called proven$.
Then click on File>Save Data to save this file (e.g., as filenameWrap).
View this file (do View>Data if you need
to) to make sure you know what you have in the wrapped file. In
this format each observation represents a single count on one variable for one site
(rather than all counts for a site). TRIAL is simply the variable number,
MEASURE is simply the count, and proven$ identifies the site each count belongs to.
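To make the wrapped layout concrete, here is a minimal Python sketch (not SYSTAT code) of what wrapping does to each site's counts. The function name wrap_counts, the site names, and the counts are hypothetical and made up for illustration; only the names TRIAL, MEASURE, and proven$ come from the description above.

```python
def wrap_counts(cases, count_vars, id_var):
    """Turn one row per site into one row per (site, count variable)."""
    wrapped = []
    for case in cases:
        for trial, var in enumerate(count_vars, start=1):
            wrapped.append({
                id_var: case[id_var],   # case identifier is carried along
                "TRIAL": trial,         # position of the wrapped variable
                "MEASURE": case[var],   # the count itself
            })
    return wrapped

# Hypothetical data: two sites, three count variables.
sites = [
    {"proven$": "SiteA", "lino": 4, "kiat": 0, "redm": 7},
    {"proven$": "SiteB", "lino": 1, "kiat": 5, "redm": 2},
]
long_rows = wrap_counts(sites, ["lino", "kiat", "redm"], "proven$")
for row in long_rows:
    print(row)
```

Two sites with three count variables become six wrapped rows, which is why the wrapped file is so much longer than the original.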
You will want to use variable labels instead
of variable numbers (e.g., 1-12). You can use the Label command to do this either
by clicking Data>Label and filling in the window or, more easily, by typing
a Label command in the interactive window (the command file at the end of this note
includes one, beginning label trial/1=Lino,2=Kiat).
(If the names include
spaces or special characters, enclose them in single quotes, e.g., 'St. Johns Poly'.)
Note here that if
your case identifier is not unique, or if you do not assign labels to all
the values of TRIAL, or if you assign the same label to more than one value
of TRIAL, the correspondence analysis will run, but will probably not give
you what you want.
Now you are about ready to do the correspondence
analysis. First click on Data>Frequency and select MEASURE, which weights each
observation by MEASURE, which is, after all, the count. (Do not use
Data>Weight in place of Frequency; it does not do what you want.)
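The following Python sketch (not SYSTAT code) illustrates why the frequency step matters. It rebuilds a site-by-variable table from wrapped rows twice: once counting each row as 1, and once using MEASURE as the frequency. The function name crosstab and the data are hypothetical. The point is only that without the frequency every cell comes out as 1, so there is no pattern left to analyze; that this is exactly what triggers SYSTAT's singularity error is an assumption on my part.

```python
from collections import defaultdict

def crosstab(rows, weight_key=None):
    """Tabulate rows by (site, TRIAL), optionally weighting by a count."""
    table = defaultdict(int)
    for row in rows:
        weight = row[weight_key] if weight_key else 1
        table[(row["proven$"], row["TRIAL"])] += weight
    return dict(table)

# Hypothetical wrapped data: two sites, two count variables.
rows = [
    {"proven$": "SiteA", "TRIAL": 1, "MEASURE": 4},
    {"proven$": "SiteA", "TRIAL": 2, "MEASURE": 0},
    {"proven$": "SiteB", "TRIAL": 1, "MEASURE": 1},
    {"proven$": "SiteB", "TRIAL": 2, "MEASURE": 5},
]
unweighted = crosstab(rows)                      # every cell is 1
weighted = crosstab(rows, weight_key="MEASURE")  # cells hold the counts
print(unweighted)
print(weighted)
```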
Then click on Data>IDVar and, in the box,
add proven$, which establishes the variable proven$ as
the case label.
In version 8, go to the Interactive window
and type “Save filenameCA” at the command prompt. This tells
the correspondence analysis to save the coordinates; it is not the same as going to the menu and saving the
current file.
In version 9 or above this is handled from the menu in the next step.
For the Correspondence Analysis go to Stats> Data
Reduction> Correspondence Analysis. In version 9 or higher, check the
box that says “Save Coordinates.” At some point you will need
to specify the name of the saved file, e.g., filenameCA. In all versions,
add proven$ as the dependent variable and TRIAL as the independent
variable (or vice versa; it doesn’t matter). This will produce a too-busy
Correspondence Plot.
The printed output in the Main window has, for the observations and variables
respectively, sections titled Row Variable Coordinates and Column Variable
Coordinates. Each has the column labels Name, Mass, Quality, Inertia, Factor
1, and Factor 2, with corresponding values for the cases or variables;
the values under Factor 1 and Factor 2 are the coordinates. (For present
purposes, ignore the rest of the output.)
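For readers curious about two of the other columns, here is a minimal Python sketch (not SYSTAT code) of the textbook definitions of mass and inertia for a table of counts. It assumes SYSTAT's Mass and Inertia columns follow these standard definitions, which I have not verified; SYSTAT may, for example, report inertia as a proportion of the total. The function name and the counts are hypothetical.

```python
def masses_and_inertia(table):
    """Return row masses, column masses, and each row's inertia.

    Mass is a row's (or column's) share of the grand total. A row's
    inertia is its share of chi-square divided by the grand total.
    Rows or columns that sum to 0 would cause a division by zero here.
    """
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(col) / n for col in zip(*table)]
    row_inertia = []
    for i, row in enumerate(table):
        total = 0.0
        for j, count in enumerate(row):
            expected = row_mass[i] * col_mass[j]  # under independence
            observed = count / n
            total += (observed - expected) ** 2 / expected
        row_inertia.append(total)
    return row_mass, col_mass, row_inertia

# Hypothetical counts: two sites (rows) by three variables (columns).
counts = [[4, 0, 7], [1, 5, 2]]
row_mass, col_mass, row_inertia = masses_and_inertia(counts)
print(row_mass, col_mass, row_inertia)
```

A site with many sherds has a large mass; a site whose proportions differ strongly from the overall proportions has a large inertia.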
Note that if you don’t
run straight through this process, or if you reopen the wrapped file at any
point, you need to reenter the Frequency, ID Variable, and Label commands.
In particular, if you get an error that says something about a singularity,
you may have forgotten to do the Data>Frequency step. Note also that if you
have cases or variables for which all counts are 0, you will get a singularity
error or some other error; in any event, the analysis will not run.
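If you keep your counts somewhere you can script against, a check like the following Python sketch (not SYSTAT code) finds all-zero cases and variables before you run the analysis. The function name and the counts are hypothetical.

```python
def empty_rows_and_columns(table):
    """Return the indexes of rows and of columns whose counts are all 0."""
    empty_rows = [i for i, row in enumerate(table) if sum(row) == 0]
    empty_cols = [j for j, col in enumerate(zip(*table)) if sum(col) == 0]
    return empty_rows, empty_cols

# Hypothetical counts: the second site and the second variable are empty.
counts = [
    [4, 0, 7],
    [0, 0, 0],
    [1, 0, 2],
]
empty_rows, empty_cols = empty_rows_and_columns(counts)
print(empty_rows, empty_cols)
```

Any case or variable this reports should be dropped from the data before wrapping.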
You’ve now done the analysis. Open
the saved file and use Graph>Scatterplot to create plots of the variables
and observations. The coordinates for the observations and variables are
saved separately in variables called Factor(1) and Factor(2) and Dim(1) and
Dim(2), respectively (these are, however, coordinates on the same scale in
the same space). To plot observations and variables on the same graph, use
Data>Transform>IfThenLet to copy the values of Dim(1) and Dim(2) into
Factor(1) and Factor(2), respectively (or vice versa). You might also create
a new variable in this file called ObsVar that has a value of, say, 1 for observations
and 2 for variables. For example, as three separate commands:
IF Label$=. Then Obsvar=1 Else Obsvar=2;
IF Factor(1)=. Then Factor(1)=Dim(1);
IF Factor(2)=. Then Factor(2)=Dim(2).
(The command file at the end of this note shows the IF ... THEN LET syntax I actually used.)
You should then do File>Save to save this transformed
coordinate file.
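The logic of this merge can be sketched in Python (not SYSTAT code) as follows. The sketch assumes, as stated above, that observation coordinates sit in Factor(1) and Factor(2) and variable coordinates in Dim(1) and Dim(2), with the unused pair missing; check your own saved file, since which set ends up where may depend on how you specified the model. The function name, the labels, and the coordinate values are hypothetical.

```python
def merge_coordinates(rows):
    """Give every row one (x, y) pair plus an observation/variable flag."""
    merged = []
    for row in rows:
        # Assumption: a missing Factor(1) marks a variable row, whose
        # coordinates are stored in Dim(1) and Dim(2) instead.
        is_variable = row["Factor(1)"] is None
        prefix = "Dim" if is_variable else "Factor"
        merged.append({
            "Label$": row["Label$"],
            "ObsVar": 2 if is_variable else 1,  # 1 = observation, 2 = variable
            "x": row[prefix + "(1)"],
            "y": row[prefix + "(2)"],
        })
    return merged

# Hypothetical saved coordinates: one site and one variable.
saved = [
    {"Label$": "SiteA", "Factor(1)": 0.42, "Factor(2)": -0.10,
     "Dim(1)": None, "Dim(2)": None},
    {"Label$": "Lino", "Factor(1)": None, "Factor(2)": None,
     "Dim(1)": -0.31, "Dim(2)": 0.25},
]
plot_rows = merge_coordinates(saved)
for row in plot_rows:
    print(row)
```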
You can now use Graph>Scatterplot to create
better plots than the quickgraph, ones that let you see what is going
on. Using Data>Select Cases, you can use ObsVar to plot cases and variables
separately. Alternatively, in Scatterplot>Appearance>Symbol and Label>Symbols>Select
Variable, choose OBSVAR to plot both cases and variables on the same plot with different
symbols. You may also restrict the min and max on the axes to blow up a part
of the plot. However, if you mis-specify min and max,
you may inadvertently cut out some of the observations
or variables.
Finally, I provide below a command file for an analysis I recently did. You
can track the commands used in your own analysis from the menus in the log
file. Note that single commands (e.g., label and plot) generally need to be
typed on a single line (contrary to how it may appear below).
Use obap2b1
Wrap lino kiat redm pubw, resv tupi pubr winb winp sjpo piv
drop totl
save obap2b1wrap
run
label trial/1=Lino,2=Kiat,3=RedM,4=PuBW,5=Resv,6=TuPi,7=PuBR,8=WinB,9=WinP,10=SJPo,11=PIV
Freq=Measure
idvar=Prov$
coran
Model Prov$=Trial
save obap2b1ca
Estimate
USE obap2b1ca
IF (Factor(1)=.) THEN LET ObsVar=1
If (Factor(1)<>.) then LET ObsVar=2
IF (Factor(1)=.) THEN LET SiteNo$=label$
IF (Factor(1)<>.) THEN LET Variable$=label$
IF (Dim(1)=.) THEN LET Dim(1)=Factor(1)
IF (Dim(2)=.) THEN LET Dim(2)=Factor(2)
let dim(1)=-dim(1)
drop factor(1)
drop factor(2)
drop label$
ESAVE obap2b1ca2.SYD
USE obap2b1ca2.syd
PLOT DIM2*DIM1 / XLABEL='Dim 1' YLABEL='Dim 2' SYMBOL=OBSVAR SIZE= 0.500 LABEL=VARIABLE$
CSIZE=0.750 LEGEND=NONE xmin=-1.5 xmax=3 ymin=-1.5 ymax