SAS Tutorial — Session 1


Objectives

  1. Understand how data are to be formatted for SAS programs.

  2. Understand the terms: data value, variable, observation, and data set.

  3. Understand the rules for writing SAS statements and for naming variables and data sets.

Formatting data for SAS

Data should consist of a number of cases, or observations, and each case should contain a number of distinct variables. The data should be entered in the form of a rectangular table (or matrix). That is, the cases should be entered in the rows and the variables should be entered in the columns. This is the preferred arrangement for SAS and for most other statistical packages.

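A sample questionnaire used in a survey might look like this:

[Figure: sample questionnaire asking for the respondent's name, age, and gender, plus two opinion questions (Questions 4 and 5) answered on a scale from 1 to 5]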

When keyed or typed into the computer the information (or raw data) might look like this:

    JANE      20 2 2 5
    MICHAEL   18 1 5 2
    MARIA     21 2 2 4
    JUAN      26 1 4 3
    MILDRED   28 2 3 4
    GUNTHER   30 1 5 2
    JOSEPH    25 1 4 4
    JULIA     19 2 2 2
    CODY      27 1 1 1
    AARON     29 1 2 2
    

This is a rectangular data table. The variables (the questions) are recorded in the columns, and the observations (the cases) are recorded in the rows.
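As a minimal sketch of how such a table could be read into SAS, the DATA step below uses list input with the data lines placed directly in the program. The data set name SURVEY and the variable names GENDER, Q4, and Q5 are assumptions made for illustration; only NAME and AGE are named in this tutorial.

```sas
/* Read the rectangular data table into a SAS data set.            */
/* SURVEY, GENDER, Q4 and Q5 are illustrative names (assumptions). */
data survey;
   input name $ age gender q4 q5;   /* $ marks NAME as a character variable */
   datalines;
JANE      20 2 2 5
MICHAEL   18 1 5 2
MARIA     21 2 2 4
JUAN      26 1 4 3
MILDRED   28 2 3 4
GUNTHER   30 1 5 2
JOSEPH    25 1 4 4
JULIA     19 2 2 2
CODY      27 1 1 1
AARON     29 1 2 2
;
run;

/* List the data set: one row per observation, one column per variable. */
proc print data=survey;
run;
```

Each line after DATALINES becomes one row of the data set, and each name on the INPUT statement becomes one column.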


Understanding terms

DATA VALUE

The basic unit of information is the DATA VALUE. In the field containing information about the respondent's gender, the DATA VALUES are 1 and 2, representing male and female. The DATA VALUES in the field containing the respondent's name are JANE, MICHAEL, MARIA, and so on. The DATA VALUES for the variable AGE are 20, 18, 21, and so on.

The DATA VALUES for the fields containing the opinions, Questions 4 and 5, range from 1 to 5. For example, a 2 in either of the opinion fields means that the respondent answered AGREE to that question in the questionnaire.
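One way to keep track of what coded data values stand for is to attach labels to them when the data are printed. The sketch below does this with PROC FORMAT. The format names SEXFMT and OPINFMT and the variable names GENDER, Q4, and Q5 are hypothetical, and only the codes explained in the text are labeled; any other code prints as the number itself.

```sas
/* Attach descriptive labels to coded data values.            */
/* SEXFMT, OPINFMT, GENDER, Q4 and Q5 are hypothetical names. */
proc format;
   value sexfmt  1 = 'Male'
                 2 = 'Female';
   value opinfmt 2 = 'AGREE';   /* only code 2 is explained in the text */
run;

data survey;
   input name $ age gender q4 q5;
   datalines;
JANE      20 2 2 5
MICHAEL   18 1 5 2
MARIA     21 2 2 4
;
run;

/* The stored values stay 1, 2, ...; only the printed form changes. */
proc print data=survey;
   format gender sexfmt. q4 q5 opinfmt.;
run;
```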

VARIABLE

A set of data values that describes a given attribute makes up a VARIABLE. Each column of data values is a VARIABLE. For example, the first column in our data set is reserved for the VARIABLE we'll call NAME. It contains all of the names of the subjects in the study above.

SAS VARIABLES are of two types: numeric and character. Values of numeric VARIABLES can only be numbers, or a period (.) for missing data. Values of character VARIABLES can be made up of letters and special characters such as plus signs, dollar signs, colons, and percent signs, as well as numeric digits.

In the sample data above, NAME is a character VARIABLE and AGE is a numeric VARIABLE.
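The difference between the two types shows up in how the variables are declared when the data are read. In the sketch below, the data set name TYPES is an assumption, and the second data line (PAT with a missing age) is invented purely to show the period convention; it is not part of the sample data.

```sas
/* NAME is read as character ($); AGE is read as numeric.         */
/* The PAT line is a made-up record showing a missing AGE as '.'. */
data types;
   input name $ age;
   datalines;
JANE      20
PAT       .
;
run;

/* PROC CONTENTS lists each variable together with its type. */
proc contents data=types;
run;
```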

OBSERVATION

All the data values associated with one case (a single entity, subject, individual, year, record, and so on) make up an OBSERVATION. Each row of the data table (or matrix) represents one OBSERVATION. The row below contains all the data values associated with OBSERVATION #1.

    JANE 20 2 2 5
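To look at a single observation in a SAS data set, the listing can be restricted to the first row. This is only a sketch: the data set name SURVEY and the variable names GENDER, Q4, and Q5 are assumed for illustration.

```sas
/* Three rows of the sample data; variable names other than */
/* NAME and AGE are illustrative assumptions.               */
data survey;
   input name $ age gender q4 q5;
   datalines;
JANE      20 2 2 5
MICHAEL   18 1 5 2
MARIA     21 2 2 4
;
run;

/* OBS=1 limits the listing to the first observation (JANE's row). */
proc print data=survey(obs=1);
run;
```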

DATA SET

A DATA SET is a collection of data values usually arranged in a rectangular table (or matrix).

A SAS DATA SET is the special way that SAS organizes and stores the data.

The DATA step creates the SAS data set and the PROC steps are instructions indicating how the SAS data set is to be manipulated or analyzed.
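The division of labor between the two kinds of steps might look like the following sketch, in which one DATA step builds the data set and two PROC steps analyze it. The names SURVEY, GENDER, Q4, and Q5 are assumptions, and the choice of PROC MEANS and PROC FREQ is only one possibility.

```sas
/* DATA step: creates the SAS data set SURVEY from the raw data. */
data survey;
   input name $ age gender q4 q5;
   datalines;
JANE      20 2 2 5
MICHAEL   18 1 5 2
MARIA     21 2 2 4
JUAN      26 1 4 3
MILDRED   28 2 3 4
;
run;

/* PROC step: summary statistics for the numeric variable AGE. */
proc means data=survey;
   var age;
run;

/* PROC step: frequency count of each GENDER code. */
proc freq data=survey;
   tables gender;
run;
```

The PROC steps read the data set that the DATA step created; they do not read the raw data lines themselves.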


Rules for SAS names and SAS statements

Rules for SAS names

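SAS names that appear in SAS statements include the names of variables, SAS data sets, formats, procedures, options, and statement labels.

  1. SAS names must be between 1 and 8 characters long. (This limit dates from SAS Version 6; later releases accept variable and data set names of up to 32 characters.)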

  2. The first character must be a letter or an underscore.

  3. Characters after the first may be letters, numbers or underscores.

Some examples of SAS names are AGE, NAME, and MAR_STAT.
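The naming rules can be tried out in a short DATA step. In this sketch the data set name MAR_DATA, the variable _SCORE1, and all of the assigned values are made up for illustration; names that break the rules appear only in a comment so that the program still runs.

```sas
/* Names that follow the rules listed above. */
data mar_data;           /* data set name, 8 characters long          */
   age      = 20;        /* starts with a letter                      */
   mar_stat = 1;         /* underscore after the first character      */
   _score1  = 5;         /* leading underscore, digit later in name   */
run;

/* Names that break the rules:
     1STSCORE   first character is a digit
     MAR-STAT   a hyphen is not a letter, number or underscore
     MAR STAT   a blank is not allowed inside a name             */

proc print data=mar_data;
run;
```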

General rules for SAS statements

  1. SAS statements may begin in any column of the line.

  2. SAS statements end with a semicolon (;).

  3. More than one SAS statement may appear on a single line.

  4. A SAS statement may continue over more than one line.

  5. Items in a SAS statement should be separated by one or more blanks. Blanks are not necessary around special characters such as '=', '+', and '$'.
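The layout rules are easiest to see in a deliberately untidy program. The sketch below is not a recommended style; the data set name LAYOUT and the variables AGE_NEXT and PLUS5 are hypothetical and exist only to show what SAS accepts.

```sas
/* Free-format layout: every statement ends with a semicolon. */
data layout;
        input name $ age;       /* statement starts in a later column     */
   age_next =
      age + 1;                  /* one statement continued over two lines */
   plus5=age+5;                 /* no blanks needed around = and +        */
   datalines;
JANE      20
MICHAEL   18
;
run;

/* Several statements written on one line. */
proc print data=layout; var name age age_next plus5; run;
```

In practice, one statement per line with consistent indentation is easier to read and to correct.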

The next four terminal sessions guide you through creating, modifying, and running some simple procedures on a SAS data set. When you have completed them successfully, try substituting your own data, variable names, data set names, and so on.


© University of New Mexico -- last updated -- comments to: docs@unm.edu