Sas cheat sheet

Posted on mer. 05 septembre 2018 in SAS

This article is mostly intended as a quick reference for myself. I cover a very limited subset of uses of various SAS statements and procedures; those which I have been using more or less repeatedly.

Importing data:

Libname documentation

Proc import documentation.

Remark: if nothing works, an easy way out is to use the import wizard (File -> Import Data). The generated command can then be saved in a file and inspected or modified.

CSV Files

data Table1;
    infile 'C:\path\to\file.csv' encoding='utf-8' delimiter = ';' MISSOVER DSD lrecl=13106 firstobs=2;
       informat Col1 $12. ;
       informat Col2 best32. ;
       informat Col3 anydtdte12. ;
    input
                Col1  $
                Col2
                Col3  $
    ;
run;

Dbf files

INSEE dataset for instance

proc import out = Mydata
            datafile = "C:\path\to\file.dbf"
            dbms = dbf replace;
     getdeleted = no;
run;

MS Access databases

libname accdb "C:\path\to\database.accdb";

data Table1;
    set accdb.table1;
run;

libname accdb clear;

Sas dataset from directory

To import C:\path\to\dir\Table1.sas7bdat.

libname mydb "C:\path\to\dir";

data Table1;
    set mydb.Table1;
run;

libname mydb clear;

Writing data to a SAS table

libname mydb "C:\path\to\output_dir";

data mydb.Table1;
    set Table1;
run;

libname mydb clear;

Macros

For someone who has been programming, SAS macros can feel very unintuitive. They are very much the tell-tale sign that the SAS language has been designed for statisticians, not for developpers nor computer scientists. The easiest way to think reason about them is to think about the macro system of the C preprocessor: SAS macro are mostly working as a text replacement tool, with some quirks.

Simplest macro:

Without variables, it is as simple as it gets.

%macro mymacro();
    proc freq data=Table1;
        table Col1;
    run;
%mend;
%mymacro();

With a loop

Variables, including loop variables, can be accessed with the &var. syntax. The loop syntax is close to for-loops in the usual languages. Documentation.

%macro mymacro();
    %do i=1 %to 10 %by 2;
        proc freq data=Table1;
            table Col&i.;
        run;
    %end;
%mend;
%mymacro();

With parameters

As previously, we use the &var. syntax to access the content of a variable.

%macro mymacro(tablename);
    %do i=1 %to 10;
        proc freq data=&tablename.;
            table Col&i.;
        run;
    %end;
%mend;
%mymacro(Table1);

Using externally defined macro variables

This is a bit more tricky. Assume you want to loop through the columns in a table. One way to do it is to define macro variables with convenient names, convenient meaning indexable.

The data _null_ statement allow one to operate without creating a dataset which is the default behaviour. The symput("varname", v) function assigns a value v to a variable called "varname". In the loop body, we then use the usual &var. syntax to access the loop variable value. However, we are not done: we need to further resolve Colname&i. to what we have defined in the data step. For this, we use the &&var1&var2. syntax, meaning "first resolve var2 to its value, then resolve the resulting expression. The process as I figure it out in my mind is the follwing: &&Colname&i. -> &&Colname1 -> &Colname1. -> "ID".

data _null_;
    call symput("Colname1", "ID");
    call symput("Colname2", "Group");
    call symput("Colname3", "Status");
run;

%macro mymacro();
    %do i=1 %to 10;
        proc freq data=table1;
            table &&Colname&i.;
        run;
    %end;
%mend;
%mymacro();

Plotting with sgplot

The sgplot procedure is fairly easy to use and flexible enough for most purposes. The only catch which has turned up to be utterly annoying is how the order of categorical variables is handled. I could not get it right even when sorting the data. The axes command documentation is very useful.

Series for plotting lines

A simple example for the series command.

proc sgplot data=Table1;
    title "Graph title";
    series x=year y=percent / markers lineattrs=(thickness=2);
    yaxis label = "%" grid;
    xaxis label = "Year";
run;

A more complex example with groups, formats, custom colors and a legend.

proc sgplot data=table1;
    format age age_intervals.;
    styleattrs datacontrastcolors=(blue green orange red);
    title "Graph title";
    series x=year y=percent / group=status;
    yaxis label = "%" grid;
    xaxis label = "Year";
    keylegend / title="Status" position=topright noborder;
run;

Band for range of values

This one is very handy for plotting min - max values.

proc sgplot data=Table1;
   band x=Age lower=Min_size upper=Max_size / group=Status;
   xaxis grid label='Age';
   yaxis grid values=(0 to 45 by 5) VALUESHINT label='Size';
run;

Vbar for vertical barplots.

Most of the options for vbar are valid for hbar.

proc sgplot data=table1;
    vbar Year / response=count;
    xaxis label='Year';
    yaxis grid label='Number found';
run;

Grouping data:

proc sgplot data=freq_by_sexe_class;
    vbar Year / response=count group=Status groupdisplay=cluster;
    xaxis label='Year';
    yaxis grid label='Number found';
run;

Density to visualize 1-D distributions

Sgplot's density is very simple to use.

proc sgplot data=table1;
    density Col1 / type=kernel group=Status;
run;

A useful alternative is proc univariate's histogram statement.

proc univariate data=table1;
    histogram Col1 / kernel;
run;

Plotting maps

Plotting maps is a bit more involved so I wrote a dedicated article: Plotting maps in SAS.

Various tricks using formats

Proc format can be helpful in various cases beyond the obvious usage. Here are some samples.

Defining intervals for numerical values

proc format;
    value age_intervals
        low - < 11 = "Child"
         11 - < 18 = "Teenager"
         18 - high = "Adult"
    ;
run;

proc freq data=Table1;
    format age age_intervals.;
    table age;
run;

Getting a summary about missing values

proc format;
 value $missfmt ' '='Missing' other='Not Missing';
 value  missfmt  . ='Missing' other='Not Missing';
run;

proc freq data=d_table;
    format _CHAR_ $missfmt.;
    tables _CHAR_ / missing missprint nocum nopercent;
    format _NUMERIC_ missfmt.;
    tables _NUMERIC_ / missing missprint nocum nopercent;
run;

Other tips

Writing the results of a query in a macro variable.

proc sql;
    select count(*)
    into :air_cnt
    from sashelp.air;
quit;
%put(&air_cnt);

Datalines statement for making dummy datasets

data dummy;
  input num text $;
  datalines;
1 "one"
2 "two"
3 "three"
;
run;