Objectives
Formatting data for SAS

Data should consist of a number of cases, or observations, and each case should contain a number of distinct variables. The data should be entered in the form of a rectangular table (or matrix). That is, the cases should be entered in the rows and the variables should be entered in the columns. This is the preferred arrangement for SAS and for most other statistical packages.

A sample questionnaire used in a survey might look like this:

When keyed or typed into the computer the information (or raw data) might look like this:

    JANE     20  2  2  5
    MICHAEL  18  1  5  2
    MARIA    21  2  2  4
    JUAN     26  1  4  3
    MILDRED  28  2  3  4
    GUNTHER  30  1  5  2
    JOSEPH   25  1  4  4
    JULIA    19  2  2  2
    CODY     27  1  1  1
    AARON    29  1  2  2

This is a rectangular data table. The variables or questions are recorded in the columns, the observations or cases in the rows.

Understanding terms

DATA VALUE

The basic unit of information is the DATA VALUE. In the field containing information about the respondent's gender, the DATA VALUES are 1 and 2, representing male and female. The DATA VALUES in the field containing the respondent's name are: JANE, MICHAEL and MARIA, etc. The DATA VALUES for the variable AGE are 20, 18 and 21, etc. The DATA VALUES for the fields containing the opinions, Questions 4 and 5, range from 1 to 5. This means that if a person has a 2 in either of the opinion fields, he answered AGREE to the question in the questionnaire.

VARIABLE

A set of data values that describes a given attribute makes up a VARIABLE. Each column of data values is a VARIABLE. For example, the first column in our data set is reserved for the VARIABLE we'll call NAME. It contains all of the names of the subjects in the study above. SAS VARIABLES are of 2 types - numeric and character. Values of numeric variables can only be numbers or a period (.) for missing data. Character VARIABLES can be made up of letters and special characters such as plus signs, dollar signs, colons and percent signs, as well as numeric digits.
In the sample data above, NAME is a character VARIABLE and AGE is a numeric VARIABLE.

OBSERVATION

All the data values associated with a case (a single entity, a subject, an individual, a year, a record, and so on) make up an OBSERVATION. Each row of the data table (or matrix) represents one OBSERVATION. The first row of the sample data, JANE 20 2 2 5, holds all the data values associated with OBSERVATION #1.
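
To make these terms concrete, here is a minimal sketch of how the sample data might be read into SAS. The data set name SURVEY and the variable names GENDER, Q4 and Q5 are assumptions chosen for illustration; only NAME and AGE are named in the text above.

```sas
/* Sketch only: SURVEY, GENDER, Q4 and Q5 are illustrative names. */
data survey;
    /* The $ after NAME marks it as a character variable.    */
    /* Variables listed without a $ are read as numeric.     */
    input name $ age gender q4 q5;
    datalines;
JANE 20 2 2 5
MICHAEL 18 1 5 2
MARIA 21 2 2 4
JUAN 26 1 4 3
MILDRED 28 2 3 4
GUNTHER 30 1 5 2
JOSEPH 25 1 4 4
JULIA 19 2 2 2
CODY 27 1 1 1
AARON 29 1 2 2
;
run;

/* Print only the first observation (the row for JANE). */
proc print data=survey (obs=1);
run;
```

Each line after DATALINES becomes one observation, and each name on the INPUT statement becomes one variable. The PROC PRINT step should list a single row holding the five data values for JANE.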
DATA SET

A DATA SET is a collection of data values usually arranged in a rectangular table (or matrix). A SAS DATA SET is the special way that SAS organizes and stores the data. The DATA step creates the SAS data set and the PROC steps are instructions indicating how the SAS data set is to be manipulated or analyzed.

Rules for SAS names and SAS statements

Rules for SAS names

Among the kinds of SAS names that appear in SAS statements are variable names, SAS data sets, formats, procedures, options, and statement labels.
Some examples of SAS names are AGE, NAME, and MAR_STAT.

General rules for SAS statements
The next four terminal sessions assist you in creating, modifying, and running some simple procedures on a SAS data set. When you have completed them successfully, try substituting your own data, variable names, data set names, and so on.
© University of New Mexico -- last updated -- comments to: docs@unm.edu