Getting started with SAS in the G-bar

Don't open SAS yet! Complete Step 1 first.

Step 1: Get access to the data sets

Note: This only needs to be done the first time you run SAS on the G-bar.

  1. Log on to Campusnet and download autoexec.sas and stat2.zip. Save both files in your home directory (~/).
  2. Open a terminal.
  3. Enter the following command:

        unzip stat2.zip

  4. Close the terminal.

  5. "autoexec.sas" contains a command that sets the linesize and pagesize of the output. It also contains a reference to most of the data you will be using in Multivariate Statistics and a reference to your own current directory.
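
To give an idea of what such a file might look like, here is a minimal sketch. It is not the actual course file: the linesize and pagesize values and the path for stat2 are placeholders chosen for illustration, so open your own autoexec.sas to see the real settings.

```sas
* Sketch of an autoexec.sas file, all values below are assumptions;
options linesize=80 pagesize=60;   * assumed output width and page length;
libname stat2 '/path/to/stat2';    * placeholder path to the course data;
libname here '.';                  * the current directory;
```

The two libname lines are what make names such as stat2.sundhed and here.xample usable later on.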

Step 2: Your first program in SAS

  1. Go to Course Software > Statistics > SAS

    SAS should now open with 4 windows:
    PROGRAM EDITOR contains the SAS program (myown.sas)
    LOG contains the program log (myown.log)
    OUTPUT (hidden) contains the output (myown.lst)
    Finally, there is a 4th window - a ToolBox with icons.

    At any time you can open the View menu and choose the window you want to see.
  2. Enter the following 4 SAS-program lines into the PROGRAM EDITOR:

    * Hurrah!! My first SAS-programme; * comment;
    * empty line;
    proc print data=stat2.sundhed; * print data;
    run;
  3. Run the program by pressing the little running man on the ToolBox. Alternatively, choose Run > Submit.
    You can now examine the LOG and OUTPUT windows.
  4. Often you may wish to recall the SAS program you have just run - at least in order to save it in a file for future use.
    This is done by pressing Run > Recall last submit in the Program Editor window. (See SAS hints if you do not want SAS to clear the editor on every run.)
  5. It is also useful to clear the log and output windows once in a while. This is done by pressing Edit > Clear all in the respective windows.

Common problems

The most common error in a SAS program is a forgotten ';' (semicolon).
If nothing happens in an interactive session, the most probable cause is that you forgot the little:

run;

at the end of the program.
Another common error is forgetting to end a comment. A comment is a statement that starts with an * and ends with a ; (semicolon).

Remember: SAS is very strict on syntax. Copy every space, semicolon and asterisk that you see in the examples; otherwise you will get an error.
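
The unfinished-comment problem can be illustrated with a small sketch based on the first program. In the faulty version the comment has no closing semicolon, so SAS reads everything up to the next semicolon as part of the comment: the proc print statement is swallowed and only run; is left. The block comment markers /* and */ are not used elsewhere in this text; they appear here only to keep the faulty lines from being run.

```sas
* Faulty version, kept inside a block comment so that it is not run;
/*
* print the data
proc print data=stat2.sundhed;
run;
*/

* Corrected version, the comment now ends with a semicolon;
* print the data;
proc print data=stat2.sundhed;
run;
```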

About SAS - condensed.

SAS operates primarily with DATA steps and PROC steps. In the SAS exercises we will concentrate on the PROC steps; the data will mostly be ready for use on the Multivariate Statistics disk area. The data are accessed through the option

data=stat2.<something-or-other>

A simple datastep in SAS.

For the SAS exercises the data will usually be available beforehand in the correct SAS format, but it is useful to be able to enter a small amount of data for analysis. There are numerous ways of doing this; perhaps the simplest is the good old-fashioned:

data xample;
input x1 x2; * two variables;
datalines;
45.9 98.4
3 56.3
45.3 -42
;

NOTE: each observation is a row, not a column as we think of them in the lectures.

The dataset "xample" is temporary, so it will be deleted when the SAS job is finished (or exited). (But you can always save your little program including the datalines - it can then be run again next time you want it.)

The dataset has 2 variables (X1 and X2) and 3 observations. SAS manages the loop structure by itself: it reads the data and repeats INPUT X1 X2 until it is finished, i.e. until it meets the ";". The ";" must stand on a line by itself. If the data are delimited by blanks, as they are in this example, reading them in is really no problem.
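
To check that the data were read as intended, the data step can be followed by a proc print. A minimal sketch, reusing the data step from above:

```sas
data xample;
input x1 x2; * two variables;
datalines;
45.9 98.4
3 56.3
45.3 -42
;

proc print data=xample; * should list 3 observations of x1 and x2;
run;
```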

One can easily see if the dataset is temporary or permanent by looking at the name. A temporary dataset has a name consisting of one part, a permanent dataset will have a name consisting of two parts:

xample is temporary, i.e. deleted when SAS finishes.

If you want to save the data in the special SAS-dataset structure, you must use a two-level name.
Example: substitute xample above with here.xample which is permanent, i.e. kept - as a SAS-dataset - when SAS finishes.

"here" is pre-defined in autoexec.sas (have a look if you want) as the current directory, which in most cases is your login directory.

Generally on PROC steps

Normally a SAS PROC step looks something like this:

proc <procedure-name> data=<dataset-name>;

An example is seen in the programme below.

If one only needs to analyse some of the variables in the SAS dataset, it is (usually) possible to add a line:

var <variable list>;

The example from above would then look like:

proc print data=stat2.sundhed; * print data, but;
var alder vegt; * only age and weight;
run;
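
The var line works the same way with other procedures. As a sketch, here it is with proc means (described below) on the same two variables:

```sas
proc means data=stat2.sundhed; * simple statistics, but;
var alder vegt;                * only for age and weight;
run;
```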

Some often used procedures

proc print - Lists data, nicely formatted.

proc means - Computes mean, variance etc.

proc univariate - As means but more descriptive. E.g.:
proc univariate data=stat2.sundhed plot freq normal;
var vegt;
run;

proc plot - Plots data in quick and dirty lineprinter format. E.g.:
proc plot data=stat2.sundhed;
plot vegt*alder;
run;

proc gplot - Plots in a nicer graphical format. MANY options - difficult to use... E.g.:
proc gplot data=stat2.sundhed;
plot vegt*alder;
run;

proc corr - Computes correlation matrices, and as an option also the variance-covariance matrix. E.g.:
proc corr data=stat2.sundhed cov;
run;

proc princomp - Among other things computes eigenvalues and eigenvectors. E.g.:
proc princomp data=xample cov;
run;
This analyses the covariance matrix based on the observations in the temporary dataset xample. COV indicates that we want to analyse the covariance matrix; without COV the correlation matrix is analysed.
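
The effect of COV can be seen by running proc princomp twice on the same data. A small sketch, which recreates the temporary dataset xample so that it can be run on its own:

```sas
data xample;
input x1 x2;
datalines;
45.9 98.4
3 56.3
45.3 -42
;

proc princomp data=xample cov; * analyses the covariance matrix;
run;

proc princomp data=xample;     * analyses the correlation matrix;
run;
```

Comparing the eigenvalues in the two outputs shows how much the scaling of the variables matters.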

For a more complete description see the very extensive SAS manuals, which are available online.

Exercises

The main purpose of today's exercise is to get used to using SAS - and to resolve some of the practical problems.

  1. Modify your first program so that it both prints out the simple statistics for the variables 'alder' and 'vegt' and also plots age against weight using 'gplot'. Notice how you get two result windows: the output window with the statistics and a graph window with the plot.

  2. As soon as you make programs a little more complicated you will get several plots and a long list of results. This quickly becomes difficult to make sense of. Instead you can use the command 'ods graphics on;' to embed the graphs and plots into your results.
    Unfortunately the otherwise wonderful graphics of the SUN-terminals cannot render this. Instead we add an 'ods pdf file="MyResults.pdf";' after the other command. This writes your entire output to a pdf file, which you can then inspect in a viewer of your choice. You can also choose other formats, e.g. HTML. Remember to add 'ods graphics off;' and 'ods pdf close;' at the end of your program after 'run;'.
    Modify your program to embed the plot and write it to a pdf/HTML/PS or your favorite format.

  3. Download heiwei-data.sas and hwex.sas from Campusnet. Put them where you want, e.g. in a folder you make for this course. Open the program heiwei-data.sas and run it. This generates a permanent dataset that you can access the next time you open SAS.

  4. Run the sample program hwex.sas on height-weight data and familiarize yourself with the program and the output! SAS gives many test statistics. If you are not familiar with those, refer back to the SAS manual, your elementary statistics course, or use articles in Wikipedia. They are normally a good first reference.

  5. Modify the program so that you use another interval for the random number generation.

  6. Use Proc Capability to test whether the random numbers you have chosen are uniformly distributed over a suitable interval. Use the fact that a uniform distribution over the interval [0,1] is a Beta distribution with shape parameters (alpha, beta)=(1,1). You may shift and scale the distribution with the parameters theta and sigma. For more details see the SAS manual: Google-search ‘proc capability sas’, open the manual, then click ‘histogram statement’, then ‘syntax’, then ‘summary of options’.

  7. Which of the regression coefficients are statistically significant?

  8. How would you predict the Weight of a Child knowing its Height?

  9. And how would you predict the Height of a child knowing its Weight?

  10. Determine the correlation between Height and Weight using Proc Corr.

  11. Compare the value of the correlation with the R-Square value from the regression analysis!
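
As a starting point for exercise 2, the placement of the ods commands can be sketched as follows. The use of proc univariate with a histogram statement is only an assumption made to have something that produces a graph; the exercise itself asks you to adapt your own program.

```sas
ods graphics on;               * embed graphs in the results;
ods pdf file="MyResults.pdf";  * write the output to a pdf file;

proc univariate data=stat2.sundhed;
var vegt;
histogram vegt;                * produces a graph of the weights;
run;

ods pdf close;                 * finish the pdf file;
ods graphics off;
```

The pdf file is only complete once 'ods pdf close;' has been run, so open it in a viewer after the whole program has finished.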