SASTechies: SAS Interview Questions and Answers(2)

If you don't want a SAS data set as output from a DATA step, how would you code the DATA statement to prevent SAS from creating one?

Data _null_;

_NULL_ – specifies that SAS does not create a data set when it executes the DATA step.

DATA _NULL_ is mainly used for:

creating quick macro variables with the CALL SYMPUT routine

eg.

Data _null_;
   Set somedata;
   Call symput('macvar', dsnvariable);
Run;

Creating a Custom Report

Eg.

The second DATA step in this program produces a custom report and uses the _NULL_ keyword to execute the DATA step without creating a SAS data set:

data sales;
   input dept : $10. jan feb mar;
   datalines;
shoes 4344 3555 2666
housewares 3777 4888 7999
appliances 53111 7122 41333
;

data _null_;
   set sales;
   qtr1tot=jan+feb+mar;
   put 'Total Quarterly Sales: '
       qtr1tot dollar12.;
run;
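Below is a minimal sketch of the macro-variable use. It assumes the SASHELP.CLASS sample data set that ships with SAS. It uses CALL SYMPUTX, the SAS 9 variant of CALL SYMPUT that strips leading and trailing blanks and converts numeric values for you. The macro variable name nobs is an illustrative choice.

```sas
/* DATA _NULL_: run the statements without creating an output data set */
data _null_;
   set sashelp.class end=last;              /* LAST is 1 on the final observation */
   if last then call symputx('nobs', _n_);  /* store the iteration count */
run;

/* The macro variable now resolves in later code */
%put NOTE: SASHELP.CLASS has &nobs observations.;
```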

What is the one statement that sets criteria for selecting data and can be coded in any step?
The WHERE statement sets the selection criteria for the data read from a SAS data set, in either a DATA step or a PROC step.
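The sketch below shows the same WHERE statement in both kinds of step. It assumes the SASHELP.CLASS sample data set (variables AGE and SEX), and the output data set name teens is hypothetical.

```sas
/* WHERE in a DATA step: only observations with AGE >= 13 are read */
data teens;
   set sashelp.class;
   where age >= 13;
run;

/* The same statement in a PROC step: only females are printed */
proc print data=sashelp.class;
   where sex = 'F';
run;
```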

Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.

SAS code can be linked using the GO TO statement (also written GOTO) or the LINK statement.

GOTO – http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201949.htm

LINK – http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201972.htm

The difference between the LINK statement and the GO TO statement is in the action of a subsequent RETURN statement. A RETURN statement after a LINK statement returns execution to the statement that follows LINK. A RETURN statement after a GO TO statement returns execution to the beginning of the DATA step, unless a LINK statement precedes GO TO, in which case execution continues with the first statement after LINK. In addition, a LINK statement is usually used with an explicit RETURN statement, whereas a GO TO statement is often used without a RETURN statement.

When your program executes a group of statements at several points in the program, using the LINK statement simplifies coding and makes program logic easier to follow. If your program executes a group of statements at only one point in the program, using DO-group logic rather than LINK-RETURN logic is simpler.

GO TO example:

data info;
   input x;
   if 1<=x<=5 then go to add;
   put x=;
   add: sumx+x;
   datalines;
323
;

LINK example:

data hydro;
   input type $ depth station $;
      /* link to label calcu: */
   if type='aluv' then link calcu;
   date=today();
      /* return to top of step */
   return;
   calcu: if station='site_1'
      then elevatn=6650-depth;
   else if station='site_2'
      then elevatn=5500-depth;
      /* return to date=today(); */
   return;
   datalines;
aluv 523 site_1
uppa 234 site_2
aluv 666 site_2
…more data lines…
;
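LINK is most useful when one group of statements runs at several points in a step. The sketch below illustrates that case. The data set temps, the label tocelsius and the variables f and c are hypothetical and do not come from the original post. The same conversion routine is reached twice per observation, and each RETURN sends execution back to the statement after the LINK that called it.

```sas
data temps;
   input city $ low high;
   f = low;  link tocelsius;  low_c  = c;   /* first call */
   f = high; link tocelsius;  high_c = c;   /* second call, same routine */
   drop f c;
   return;               /* end of main logic: implicit OUTPUT, back to top */

   tocelsius:            /* subroutine: Fahrenheit in F, Celsius out in C */
      c = round((f - 32) * 5 / 9, 0.1);
   return;               /* back to the statement after the calling LINK */
   datalines;
Denver 32 50
Phoenix 68 104
;
```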

How would you include common or reusable code so that it is processed along with your statements?

- Using SAS macros

- Using a %INCLUDE statement
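The sketch below shows both approaches. The macro name first_n and its parameters are hypothetical. The %INCLUDE path is a placeholder, so that line is left commented out.

```sas
/* A macro packages code so it can be reused with different inputs */
%macro first_n(ds, out, n=5);
   data &out;
      set &ds (obs=&n);   /* keep only the first &n observations */
   run;
%mend first_n;

%first_n(sashelp.class, first5)
%first_n(sashelp.class, first3, n=3)

/* %INCLUDE brings in code stored in a file; this path is hypothetical */
/* %include '/shared/code/common_code.sas'; */
```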

When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?

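The INDEX function is the best choice. It searches a character expression for a string of characters and returns the position where the string first occurs (0 if it is not found).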

SAS Statements	Results
a='ABC.DEF (X=Y)'; b='X=Y'; x=index(a,b); put x;	10

For learning purposes

The INDEXC function searches for the first occurrence of any individual character that is present within the character string, whereas the INDEX function searches for the first occurrence of the character string as a pattern.

b='have a good day';
x=indexc(b,'pleasant','very');
put x;   /* 2: the 'a' in position 2 occurs in 'pleasant' */

The INDEXW function searches for strings that are words, whereas the INDEX function searches for patterns as separate words or as parts of other words. INDEXC searches for any characters that are present in the excerpts.

s='asdf adog dog';
p='dog ';
x=indexw(s,p);
put x;
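The sketch below compares the three candidates on one string, using made-up text in a 150-byte variable. INDEX locates a known substring, and INDEXC locates the first of any listed characters. SCAN extracts a word by its position rather than locating data. The data set name found is hypothetical.

```sas
data found;
   length text $150;                          /* a 150-byte character variable */
   text = 'Order 1234 shipped to Denver today';
   pos_word  = index(text, 'Denver');         /* start of the whole string */
   pos_digit = indexc(text, '0123456789');    /* first of ANY listed character */
   third     = scan(text, 3, ' ');            /* third blank-delimited word */
   put pos_word= pos_digit= third=;
run;
```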

If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?

Use the KEEP= data set option (on the DATA statement or the SET statement) or the KEEP statement in a DATA step.

eg.

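/* 1) KEEP= on SET: only the five variables are read */
data fewdata;
   set fulldata (keep=var1 var2 var3 var4 var5);
run;
/* 2) KEEP= on DATA: all variables are read, five are written */
data fewdata (keep=var1 var2 var3 var4 var5);
   set fulldata;
run;
/* 3) KEEP statement: same effect as KEEP= on DATA */
data fewdata;
   set fulldata;
   keep var1 var2 var3 var4 var5;
run;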

Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.

Proc sort data= Dist_County;
   By state district county;
Run;

How would you delete duplicate observations?

Use the NODUPRECS option in PROC SORT.

data cricket;
   input id country $9. score;
   cards;
1 australia 342
2 somerset 343
1 australia 342
2 somerset 341
;
run;

proc sort data = cricket noduprecs;
   by id;
run;

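In this example, observations 1 and 3 are exact duplicates, so only observation 1 is retained. Observations 2 and 4 share the same id but differ in score, so both are kept. NODUPRECS compares each observation only with the previous one written, so a duplicate is removed only if the sort places it next to its twin.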

How would you delete observations with duplicate keys?

Use the NODUPKEY option in PROC SORT.

proc sort data = cricket nodupkey;
   by id;
run;

In the example above, observations 1 and 3 share the key value id=1, and observations 2 and 4 share id=2. PROC SORT keeps the first observation for each BY value, so observations 3 and 4 are deleted.
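Both PROC SORT options replace the input data set unless OUT= is given. The sketch below re-creates the cricket data so that it runs on its own. It writes the results to new data sets, whose names are hypothetical. For NODUPRECS it sorts BY _ALL_, so that identical records are guaranteed to be adjacent.

```sas
data cricket;
   input id country :$9. score;
   datalines;
1 australia 342
2 somerset 343
1 australia 342
2 somerset 341
;

/* NODUPKEY: keep the first observation for each ID; CRICKET is unchanged */
proc sort data=cricket out=cricket_keys nodupkey;
   by id;
run;

/* NODUPRECS with BY _ALL_: identical records sort next to each other */
proc sort data=cricket out=cricket_recs noduprecs;
   by _all_;
run;
```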

How would you code a merge that keeps only the observations that have matches in both data sets?

data mergeddata;
   merge one(in=A) two(in=B);
   by ID;
   if A and B;
run;
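The sketch below is a version that runs on its own, using small hypothetical input data sets. MERGE with BY requires both inputs to be sorted (or indexed) by ID, and the datalines here are already in ID order. The IN= variables A and B flag which data set contributed to the current observation.

```sas
data one;
   input id name $;
   datalines;
1 Ann
2 Bob
4 Dee
;
data two;
   input id score;
   datalines;
1 90
3 75
4 60
;

data mergeddata;
   merge one (in=a) two (in=b);
   by id;
   if a and b;        /* subsetting IF: keep IDs present in both inputs */
run;
```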

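How would you code a merge that writes the matches from both data sets to one data set, the non-matches from the left-most data set to a second data set, and the non-matches from the right-most data set to a third data set?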

Data one two three;
   Merge DSN1 (in=A) DSN2 (in=B);
   By ID;
   If A and B then output one;
   If A and not B then output two;
   If not A and B then output three;
Run;
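The sketch below is a self-contained version with hypothetical data. It uses an ELSE IF chain, which is equivalent here because every merged observation comes from DSN1, DSN2, or both.

```sas
data dsn1;
   input id x;
   datalines;
1 10
2 20
3 30
;
data dsn2;
   input id y;
   datalines;
2 200
3 300
4 400
;

data one two three;
   merge dsn1 (in=a) dsn2 (in=b);
   by id;
   if a and b then output one;   /* IDs 2 and 3: in both */
   else if a then output two;    /* ID 1: only in DSN1 */
   else output three;            /* ID 4: only in DSN2 */
run;
```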

What is the Program Data Vector (PDV)? What are its functions?

PDV is a logical area in memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements. The data values are assigned to the appropriate variables in the program data vector. From here, SAS writes the values to a SAS data set as a single observation.

Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_. The _N_ variable counts the number of times the DATA step begins to iterate. The _ERROR_ variable signals the occurrence of an error caused by the data during execution. The value of _ERROR_ is either 0 (indicating no errors exist), or 1 (indicating that one or more errors have occurred). SAS does not write these variables to the output data set.
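The small sketch below shows the PDV at work; the data set name demo is hypothetical. The PUT statement writes _N_ and _ERROR_ to the log on each iteration. The second record holds invalid numeric data, so SAS sets _ERROR_ to 1 and X to missing. The output data set still contains only X.

```sas
data demo;
   input x;
   put _n_= _error_= x=;   /* automatic variables are visible in the PDV */
   datalines;
5
abc
;
```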

Does SAS ‘Translate’ (compile) or does it ‘Interpret’? Explain. At compile time when a SAS data set is read, what items are created?

SAS compiles. Each DATA step is compiled as a unit and then executed, rather than being interpreted one statement at a time.

When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them, that is, automatically translates the statements into machine code. In this phase, SAS identifies the type and length of each new variable, and determines whether a type conversion is necessary for each subsequent reference to a variable. During the compile phase, SAS creates the following three items:

- input buffer: a logical area in memory into which SAS reads each record of raw data when SAS executes an INPUT statement. This buffer is created only when the DATA step reads raw data; when the DATA step reads a SAS data set, SAS reads the data directly into the program data vector.
- program data vector (PDV): the logical area in memory where SAS builds a data set, one observation at a time, including the automatic variables _N_ and _ERROR_, as described in the previous answer.
- descriptor information: information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes. It contains, for example, the name of the data set and its member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

The Execution Phase

By default, a simple DATA step iterates once for each observation that is being created. The flow of action in the Execution Phase of a simple DATA step is described as follows:

The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
SAS sets the newly created program variables to missing in the program data vector (PDV).
SAS reads a data record from a raw data file into the input buffer, or it reads an observation from a SAS data set directly into the program data vector. You can use an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.
SAS executes any subsequent programming statements for the current record.
At the end of the statements, an output, return, and reset occur automatically. SAS writes an observation to the SAS data set, the system automatically returns to the top of the DATA step, and the values of variables created by INPUT and assignment statements are reset to missing in the program data vector. Note that variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to missing here.
SAS counts iteration, reads the next record or observation, and executes the subsequent programming statements for the current observation.
The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw data file.

When a variable is set to missing, a character variable receives a blank and a numeric variable receives a period (.).
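The sketch below traces the execution steps listed above; the data set name flow is hypothetical. The PUT at the top of the step shows X and Y reset to missing on every iteration. On the third iteration the INPUT statement finds no more records, so the step ends without writing a third observation.

```sas
data flow;
   put 'Top of step: ' _n_= x= y=;   /* x and y are missing here */
   input x;                          /* read the next raw record */
   y = x * 2;                        /* programming statements */
   datalines;
1
2
;
```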

Name statements that are recognized at compile time only.

drop, keep, rename, label, format, informat, attrib, where, by, retain, length, array

Name statements that are execution only.
INFILE, INPUT, OUTPUT, CALL routines

Identify statements whose placement in the DATA step is critical.
DATA, INPUT, RUN, CARDS, INFILE, WHERE, LABEL, SELECT, INFORMAT, FORMAT

Name statements that function at both compile and execution time.

options, title, footnote

In the flow of DATA step processing, what is the first action in a typical DATA Step?
The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.

What is _n_?
The _N_ variable counts the number of times the DATA step begins to iterate.

_N_ is one of the automatic DATA step (not PROC step) variables that SAS places in the PDV; the other is _ERROR_. Note that _N_ does not necessarily equal the observation number in a data set. For example, when a subsetting IF statement discards some observations, _N_ still counts every iteration.
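The sketch below shows the subsetting IF case; the data set evens and the variable iter are hypothetical. The first observation written has ITER=2, because the first iteration's record was discarded.

```sas
data evens;
   input x;
   if mod(x, 2) = 0;     /* subsetting IF: odd values are not output */
   iter = _n_;           /* DATA step iteration that produced this row */
   datalines;
1
2
3
4
;
```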

How do I convert a numeric variable to a character variable?
The type of an existing variable cannot be changed within a DATA step, but its values can be converted. Create a new character variable, assign it the values of the numeric variable with the PUT function, drop the numeric variable, and rename the character variable to the numeric variable's name.

Note: You would receive a warning saying that the variable has already been defined as numeric.

Eg.

http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000199354.htm#a000226452
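The sketch below implements those steps; the data set and variable names are hypothetical. When DROP and RENAME appear in the same DATA step, DROP refers to the old names, so the numeric ID is dropped before ID_CHAR takes its name.

```sas
data converted;
   input id;
   id_char = left(put(id, 8.));  /* PUT: numeric value -> character value */
   drop id;                      /* drop the original numeric variable */
   rename id_char = id;          /* give the new variable the old name */
   datalines;
101
7
;
```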

How do I convert a character variable to a numeric variable?

The type of an existing variable cannot be changed within a DATA step, but its values can be converted. Create a new numeric variable, assign it the values of the character variable with the INPUT function, drop the character variable, and rename the numeric variable to the character variable's name.

Note: You would receive a warning saying that the variable has already been defined as character.

http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000180357.htm
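The sketch below shows the reverse conversion with hypothetical names. It uses the INPUT function with the 8. informat, which assumes the character values are plain numbers no wider than 8 characters.

```sas
data converted;
   input amount $;
   amount_num = input(amount, 8.);   /* INPUT: character value -> numeric */
   drop amount;                      /* drop the original character variable */
   rename amount_num = amount;       /* give the new variable the old name */
   datalines;
42
17.5
;
```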

Nov 9, 2009
