Jan 7, 2011

SAS/CONNECT Parallel Processing on a SAS SMP Machine

The driving strength of the SAS System has always been the ability to analyze huge amounts of data and turn it into information — information that can be used to make good, profitable decisions. However, SAS is a single-threaded application that, for the most part, can utilize only a single processor at a time.
MP CONNECT enables parallel processing with SAS Version 8, putting idle processors to work to reduce the time required to process huge amounts of data. It lets users run the SAS System in parallel, multiplying the analytic capabilities of SAS for more timely business decisions. MP CONNECT makes better use of the processing power of stand-alone multiprocessor servers, and the same approach extends easily to the processors available across your network.
With MP CONNECT, you perform multiprocessing with Version 8 of the SAS System by establishing a connection between a “parent” SAS session and one or more additional SAS sessions. Each of these sessions can then execute tasks asynchronously and in parallel. Meanwhile, you can continue processing in the parent session, query the status of any of the async tasks, and merge the results of the asynchronous tasks back into the parent session at the appropriate time. This lets you exploit MP/SMP hardware as well as network resources to perform parallel processing of self-contained tasks and then easily coordinate the results in the parent SAS session.
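The status query and collection steps described above can be sketched as follows. This is a minimal, hypothetical example: the session name TASK1, the macro variable TASK1ST, and the placeholder DATA step are illustrative only. CMACVAR= names a local macro variable that SAS/CONNECT updates with the status of the RSUBMIT (0 once it has completed, 2 while it is still running), LISTTASK reports the active tasks, WAITFOR blocks the parent until the task ends, and RGET pulls the spooled log and output back into the parent session. Confirm the option details against the SAS/CONNECT documentation for your release.

```sas
/* Spawn remote sessions on the same machine */
options sascmd="sas";
signon task1;

/* CMACVAR= names a local macro variable that tracks this RSUBMIT */
rsubmit task1 wait=no cmacvar=task1st;
  /* Placeholder for real, self-contained work in the TASK1 session */
  data work.result;
    do i = 1 to 1000000;
      x = sqrt(i);
      output;
    end;
  run;
endrsubmit;

/* The parent session keeps running while TASK1 works */
%put NOTE: TASK1 status code is &task1st (2 = still running, 0 = complete);

/* List the asynchronous tasks known to this session */
listtask _all_;

/* Block until TASK1 finishes */
waitfor task1;

/* Retrieve TASK1's spooled log and output into the parent session */
rget task1;

signoff task1;
```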
The current MP CONNECT functionality is designed to address independent parallelism, which is possible when you have two or more tasks to execute and those tasks do not have interdependencies. An example of independent parallelism would be extraction and processing of data from multiple unique, and possibly remote, data sources. Another example would be a HOLAP scenario that requires the creation of multiple MDDBs.
The primary purpose of parallel processing is to reduce the time that it would take to execute the same job serially. “The performance gains can be amazing,” says John Bentley of First Union National Bank. “On a four-way UNIX box we cut the processing time of a 9-million-record data set from 46 minutes to 17 minutes. For another job run against a terabyte-class data warehouse, we cut the SQL processing time from just over an hour to 20 minutes. Even if it takes a few hours to modify and test a production program, you quickly recoup that time. Also, don’t overlook the fact that not only are you speeding up your jobs, but you’re also scaling up your server because it can handle more work in the same amount of time.”
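To put the first figure in the quote in standard terms: speedup S is the serial run time divided by the parallel run time, and parallel efficiency E divides that speedup by the number of processors p (four, on the box described):

```latex
S = \frac{T_{\text{serial}}}{T_{\text{parallel}}} = \frac{46\ \text{min}}{17\ \text{min}} \approx 2.71,
\qquad
E = \frac{S}{p} = \frac{2.71}{4} \approx 0.68
```

By the same measure, the warehouse SQL job (just over an hour down to 20 minutes) corresponds to a speedup of slightly more than 3.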
Basic SAS/CONNECT Parallel Processing Template – SMP Machine
The following template code sets the global SASCMD= option so that N SAS sessions can be spawned on the same multiprocessor machine. Each SIGNON statement spawns a new SAS session, and the %SYSLPUT statement sets macro variables in the corresponding “remote” SAS session. The RSUBMIT statements must specify the WAIT=NO option so that each RSUBMIT block executes asynchronously. In addition, the LOG= and OUTPUT= options route the log and listing from each async task to its own file.
options sascmd="sas";
 
signon task1;
 
%syslput remvar1=somevalue;
 
rsubmit task1 wait=no log="task1.log" output="task1.lst";
 
/* send a task to be executed by TASK1 SAS session */
/* may make use of REMVAR1 macro variable in this */
/* SAS code */
 
endrsubmit;
 
signon task2;
 
%syslput remvar2=somevalue;
 
rsubmit task2 wait=no log="task2.log" output="task2.lst";
 
/* send a task to be executed by TASK2 SAS session */
/* may make use of REMVAR2 macro variable in this */
/* SAS code */
 
endrsubmit;

/* ... repeat the SIGNON, %SYSLPUT, and RSUBMIT steps for task3, task4, and so on ... */

signon taskn;

%syslput remvarn=somevalue;

rsubmit taskn wait=no log="taskn.log" output="taskn.lst";
 
/* send a task to be executed by TASKn SAS session */
/* may make use of REMVARn macro variable in this */
/* SAS code */
 
endrsubmit;
 
 
waitfor _all_ task1 task2 /* ... and any other tasks ... */ taskn;
 
/* do some further local processing */
 
 
signoff task1;
signoff task2;
/* ... sign off any other tasks ... */
signoff taskn;
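As a minimal sketch of the template applied to independent parallelism, the program below splits one summary job into two self-contained halves that run at the same time. It assumes SAS/CONNECT is licensed, that the sample table SASHELP.SHOES is installed, and that /tmp/mpconnect is a hypothetical directory every session on the SMP machine can read and write; the libref SHARED and the table names PART1, PART2, and COMBINED are illustrative. Each remote session writes its partial result to the shared directory, and the parent appends the pieces once WAITFOR returns.

```sas
/* Spawn each remote session on the same SMP machine */
options sascmd="sas";

/* TASK1: summarize sales for regions that sort before "M" */
signon task1;
rsubmit task1 wait=no log="task1.log" output="task1.lst";
  /* Hypothetical shared directory; it must already exist */
  libname shared "/tmp/mpconnect";
  proc summary data=sashelp.shoes nway;
    where region < "M";
    class region;
    var sales;
    output out=shared.part1 sum=total_sales;
  run;
endrsubmit;

/* TASK2: the remaining regions, processed concurrently */
signon task2;
rsubmit task2 wait=no log="task2.log" output="task2.lst";
  libname shared "/tmp/mpconnect";
  proc summary data=sashelp.shoes nway;
    where region >= "M";
    class region;
    var sales;
    output out=shared.part2 sum=total_sales;
  run;
endrsubmit;

/* Block until both halves have finished */
waitfor _all_ task1 task2;

/* Merge the partial results back in the parent session */
libname shared "/tmp/mpconnect";
data combined;
  set shared.part1 shared.part2;
run;

proc print data=combined noobs;
  var region total_sales;
run;

signoff task1;
signoff task2;
```

Because the two WHERE clauses partition the regions without overlap, the parent can simply append the halves; tasks that share intermediate data would not fit this independent-parallelism pattern.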
Read this SUGI paper with an example.

Some SAS/CONNECT Tips and Tricks to process your SAS Programs concurrently and share the SAS datasets between sessions…
