May 22, 2010

SAS Project 4: Govt....SAS Technical Problem to solve....DIY

SAS Technical Screening Problem asked at CDC, Atlanta…
The following 3 datasets exist in a hypothetical system:
COUNTY
COUNTY_IDSTATE_NAMECOUNTY_NAME
1TexasCollin
2TexasDallas
3GeorgiaDeKalb
AGE_DISTRIBUTION
COUNTY_IDAGE_0_10AGE_10_20AGE_20_40AGE_40_PLUS
1100204060
210104050
3451005667
AGE_DISTRIBUTION_DESC
CATEGORY_NAMECATEGORY_DESCRIPTION
AGE_0_10< 10 years
AGE_10_20Between 10 and 20 years
AGE_20_40Between 20 and 40 years
AGE_40_PLUS> 40 years
Requirement 1: Write a SAS program (with ODS) to use the datasets given above to generate HTML and MS EXCEL file that will look like the following (the files must have the 3 tables given below):
County Name
Age Distribution
Category
Total Number
Collin
< 10 years
100
Between 10 and 20 years
20
Between 20 and 40 years
40
> 40 years
60
Dallas
< 10 years
10
Between 10 and 20 years
10
Between 20 and 40 years
40
> 40 years
50
DeKalb
< 10 years
45
Between 10 and 20 years
100
Between 20 and 40 years
56
> 40 years
67
Category
County
Collin
Dallas
DeKalb
< 10 years
100
10
45
Between 10 and 20 years
20
10
100
Between 20 and 40 years
40
40
56
> 40 years
60
50
67
Total
220
110
268
Category
State
Texas
Georgia
< 10 years
110
45
Between 10 and 20 years
30
100
Between 20 and 40 years
80
56
> 40 years
110
67
Total
330
268
Note: The color scheme and other style elements (bold, italics etc) MUST be incorporated in the output.
Requirement 2: With a different dataset structure as given below, produce the same HTML and MS EXCEL file result as given above:
COUNTY
COUNTY_IDSTATE_NAMECOUNTY_NAME
1TexasCollin
2TexasDallas
3GeorgiaDeKalb
AGE_DISTRIBUTION
COUNTY_IDAGE_DISTR_CAT_IDDISTRIBUTION_COUNT
11.1100
12.220
13.340
14.450
21.110
22.210
23.340
24.450
31.145
32.2100
33.356
34.467
AGE_DISTRIBUTION_DESC
AGE_DISTR_CAT_IDCATEGORY_DESCRIPTION
1.1< 10 years
2.2Between 10 and 20 years
3.3Between 20 and 40 years
4.4> 40 years
What to submit: Please provide two separate self-contained SAS Programs (datasets and data in these datasets must be declared in the ‘DATA’ step of the programs, an external data source should not be used) which can be executed in the SAS console. The programs must generate a HTML file called ‘results.html’ and an MS Excel file called ‘results.xls’, the look and feel of which has been depicted earlier.
The following aspects must be kept in mind when developing the programs:
•Presentation: The generated HTML/MS Excel must resemble the output table structure given above.
•Program efficiency: The program must be as short and efficient as possible.
•In each program a macro variable must be used to determine whether to produce HTML output or MS-Excel output. We will alter the variable value (the possible values must be documented in the program) during execution to generate the desired output.

SAS Project 3: Retailing Industry....SAS Technical Problem to solve....DIY

This SAS Screening Problem was asked at Gap Inc…
The following shows 6 sample records of a large SAS data set.
The data set name is trans
CustomerIDtransactiondateamountcategory
980123410/01/1998123.98toys
980223412/10/199780.34books
980223412/10/1997100.00apparel
980555608/01/199622.90toys
980555609/10/199625.50apparel
980555610/11/199618.90books
980113411/11/199912.11toys
Write some SAS code that that will answer the questions below using the SAS file above.
These questions get at your understanding of different levels of summarization and in more general terms, how you approach programming problems. There are many ‘right’ answers.
You do not have to use the original dataset as input to each question, you may want to summarize at one step and use that summarization in later steps.
  1. Total and average amount spent by category
  2. Which category has the highest average purchase
  3. What is the average number of categories that customers purchase
  4. What is the average and total amount by customer
  5. What is the average number of days between purchases (as of today)

SAS Project 2: Clinical Trials....SAS Technical Problem to solve....DIY

Another small SAS Screening Problem asked at Amgen Inc
For the data below…
Dose
PatientDose Date
00101Jan2003
00102Jan2003
00215Mar2003
00201Mar2003
00301Apr2003
00419Mar2003
AE
PatientAE Start DateAE Text
00131Dec2002Headache
00102Jan2003Blurry Vision
00102Jan2003Anxiety
00202Mar2003Migraine
00201Mar2003Constipation
00215Mar2004Athlete’s Foot
00302Apr2003Depression
00302Apr2003Rash
Final
PatientDose DateAE Start DateAE Text
00101Jan200302Jan2003Blurry Vision
00101Jan200302Jan2003Anxiety
00201Mar200301Mar2003Constipation
00201Mar200302Mar2003Migraine
00301Apr200302Apr2003Depression
00301Apr200302Apr2003Rash
Questions
  1. Using SAS procedures and data steps, combine the Dose and AE datasets together to get the Final dataset. The Final dataset should include adverse events that occurred on a dosing date or one day after a dosing date.
  2. Do the same task without using a data step.