Don't open SAS yet! Complete Step 1 first.
Note: This only needs to be done the first time you run SAS on the G-bar.
The most common error in a SAS programme is a forgotten ';'
(semicolon).
If nothing happens in an interactive session, the most likely cause is that
you forgot the little
run;
at the end of the programme.
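As a minimal sketch, here is a complete programme in which every step is closed by run;. Without the final run;, an interactive session simply waits for more statements and appears to do nothing. The block creates the small xample dataset used later in this handout so that it runs on its own.

```sas
* A complete programme: every step is closed by a RUN statement;
data xample;
  input x1 x2;      * two variables;
  datalines;
45.9 98.4
3 56.3
45.3 -42
;
run;

proc print data=xample;
run;                * without this line, interactive SAS waits for more statements;
```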
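Another common error is forgetting to end a comment. A comment is a statement
that starts with an * (asterisk) and ends with a ; (semicolon). If the
semicolon is missing, SAS treats the next statement as part of the comment.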
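The comment rules can be sketched as follows. Besides the * ... ; statement comment described above, SAS also accepts /* ... */ block comments, which may span several lines and may contain semicolons. The pitfall at the end is wrapped in a block comment so that it does not run; it shows a * comment without its semicolon swallowing the PROC statement after it.

```sas
* A statement comment: starts with an asterisk, ends with a semicolon;
/* A block comment: may span several lines
   and may contain semicolons; it ends at the closing marker */
data xample;
  input x1 x2;   * two variables;
  datalines;
45.9 98.4
3 56.3
45.3 -42
;
run;

/* PITFALL: in the two lines below the comment lacks its semicolon,
   so SAS reads "proc print data=xample" as part of the comment
   and nothing is printed:

   * print the data
   proc print data=xample;
*/
```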
Remember: SAS is very strict about syntax, so copy every space, semicolon and asterisk exactly as you see them in the examples. Otherwise you will get an error.
SAS works primarily with DATA steps and PROC steps. In the SAS exercises we
will concentrate on the PROC steps; the data will mostly be ready for use on
the Multivariate Statistics disk area. The data are accessed with the
option
data=stat2.<something-or-other>
For the SAS exercises the data will usually be in the correct SAS format
beforehand, but it is useful to be able to enter a small amount of data
yourself in order to analyse it. There are numerous ways of doing this;
perhaps the simplest is the good old-fashioned:
data xample;
input x1 x2; * two variables;
datalines;
45.9 98.4
3 56.3
45.3 -42
;
NOTE: each observation is a row, not a column as we think of observations
in the lectures.
The dataset "xample" is temporary, so it will be deleted when the SAS job is
finished (or exited). (But you can always save your little programme including
the datalines; it can then be run again the next time you need it.)
The dataset has 2 variables (X1 and X2) and 3 observations. SAS handles the
loop by itself: it repeats INPUT X1 X2 for each data line until it meets the
";", which must stand on a line by itself. When the data are delimited by
blanks, as in this example, reading them in is straightforward.
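If the values were separated by commas instead of blanks, one option is to add an INFILE DATALINES statement with the DLM= option before INPUT. This is a sketch; the dataset name xample_csv is a hypothetical name chosen for the illustration, and the data are the same three rows as above.

```sas
data xample_csv;              * hypothetical name for this illustration;
  infile datalines dlm=',';   * values are separated by commas;
  input x1 x2;
  datalines;
45.9,98.4
3,56.3
45.3,-42
;
run;

proc print data=xample_csv;   * check that the values were read as intended;
run;
```

Printing the dataset right after reading it is a cheap way to catch a wrong delimiter before any analysis is run.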
You can easily tell whether a dataset is temporary or permanent from its
name: a temporary dataset has a one-part name, whereas a permanent dataset
has a two-part name:
xample         is temporary, i.e. deleted when SAS finishes.
If you want to keep the data in the special SAS dataset structure, you must use
a two-level name.
Example: replace xample above with
here.xample    which is permanent, i.e. kept as a SAS dataset when SAS finishes.
"here" is pre-defined in autoexec.sas (have a look if you want) as the
current directory, in most cases your login directory.
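The sketch below makes a temporary dataset permanent by copying it into a library. In case "here" is not defined in your session, it assigns a library with a LIBNAME statement; the libref mylib and the path '.' (the current directory of the SAS session) are assumptions for illustration only.

```sas
* mylib is a hypothetical libref; the path . means the current directory;
libname mylib '.';

data xample;                 * one-level name: temporary;
  input x1 x2;
  datalines;
45.9 98.4
3 56.3
45.3 -42
;
run;

data mylib.xample;           * two-level name: permanent, kept after SAS finishes;
  set xample;                * copy the temporary dataset;
run;
```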
Normally a SAS PROC step looks something like this:
proc <procedure-name> data=<dataset-name>;
If you only need to analyse some of the variables in the SAS dataset, you
can (usually) add a line:
var <variable list>;
For example, a PROC PRINT step restricted to two variables looks like this:
proc print data=stat2.sundhed; * print data, but;
var alder vegt; * only age and weight;
run;
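The VAR statement works the same way in most procedures. A sketch with PROC MEANS on the small xample dataset (recreated here so the block runs on its own): only x1 is analysed, and N, MEAN and VAR are standard PROC MEANS statistic keywords that select what is shown.

```sas
data xample;
  input x1 x2;
  datalines;
45.9 98.4
3 56.3
45.3 -42
;
run;

proc means data=xample n mean var;   * choose which statistics to show;
  var x1;                            * analyse only x1, ignore x2;
run;
```

For these three values the mean of x1 is (45.9 + 3 + 45.3)/3 = 31.4.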
proc print        Lists data, nicely formatted.
proc means        Computes mean, variance, etc.
proc univariate   Like proc means, but more descriptive. E.g.:
                    proc univariate data=stat2.sundhed plot freq normal;
                      var vegt;
                    run;
proc plot         Quick-and-dirty plots in line-printer format. E.g.:
                    proc plot data=stat2.sundhed;
                      plot vegt*alder;
                    run;
proc gplot        Plots in a nicer graphical format. MANY options, which
                  make it difficult to use. E.g.:
                    proc gplot data=stat2.sundhed;
                      plot vegt*alder;
                    run;
proc corr         Computes correlation matrices and, as an option, also
                  the variance-covariance matrix. E.g.:
                    proc corr data=stat2.sundhed cov;
                    run;
proc princomp     Among other things, computes eigenvalues and
                  eigenvectors. E.g.:
                    proc princomp data=xample cov;
                    run;
                  analyses the covariance matrix of the observations in the
                  temporary dataset xample. The COV option requests the
                  covariance matrix; without COV, the correlation matrix is
                  analysed.
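To see the difference the COV option makes, one can run PROC PRINCOMP twice on the same data and compare the eigenvalues. This is a sketch: OUTSTAT= is a PROC PRINCOMP option that saves the computed statistics, including the eigenvalues, to a dataset, and the names pc_cov and pc_corr are illustrative.

```sas
data xample;
  input x1 x2;
  datalines;
45.9 98.4
3 56.3
45.3 -42
;
run;

proc princomp data=xample cov outstat=pc_cov;   * covariance matrix;
  var x1 x2;
run;

proc princomp data=xample outstat=pc_corr;      * default: correlation matrix;
  var x1 x2;
run;
```

With the correlation matrix, the eigenvalues for two variables always sum to 2 (the trace of a 2 x 2 correlation matrix). With COV they sum to the total variance var(x1) + var(x2), so a variable measured on a larger scale dominates the first component.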
For a more complete description, see the very extensive SAS manuals, which are
available online here.
The main purpose of today's exercise is to get used to working with SAS, and to resolve some of the practical problems