Jan 7, 2011
SAS/CONNECT Parallel Processing on a SAS SMP Machine
The driving strength of the SAS System has always been the ability to analyze huge amounts of data and turn it into information — information that can be used to make good, profitable decisions. However, SAS is a single-threaded application that, for the most part, can utilize only a single processor at a time.
MP CONNECT enables parallel processing with SAS Version 8, making use of idle processors to reduce the time required to process huge amounts of data. It gives users the ability to run the SAS System in parallel, multiplying the analytic capabilities of SAS for more timely business decisions. MP CONNECT not only lets SAS make better use of the processing power of stand-alone multiprocessor servers, but can also be extended easily to the processors available across your network.
With MP CONNECT, you perform multiprocessing with Version 8 of the SAS System by establishing a connection between a “parent” SAS session and one or more additional SAS sessions. Each of the sessions can then asynchronously execute tasks in parallel. You can continue processing with the parent session, query the status of any of the async tasks, and merge the asynchronous tasks back into your parent session at the appropriate time. You gain the ability to exploit MP/SMP hardware as well as network resources to perform parallel processing of self-contained tasks and easily coordinate the results into the parent SAS session.
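The life cycle described above (connect, run asynchronously, check status, merge back) can be sketched roughly as follows. This is a minimal illustration rather than production code: the session names task1 and task2, the data set names, and the "sas" startup command are all assumptions made for the example, and the DATA steps are placeholders for real, independent work.

```sas
/* Assumption: "sas" is the command that starts SAS on this machine. */
options sascmd="sas";

/* Each SIGNON spawns one additional SAS session.        */
/* task1 and task2 are hypothetical session names.       */
signon task1;
signon task2;

/* WAIT=NO returns control to the parent session at once, */
/* so the two blocks below run in parallel.               */
rsubmit process=task1 wait=no;
   data work.squares;            /* placeholder workload */
      do i = 1 to 1000000;
         sq = i * i;
         output;
      end;
   run;
endrsubmit;

rsubmit process=task2 wait=no;
   data work.roots;              /* placeholder workload */
      do i = 1 to 1000000;
         rt = sqrt(i);
         output;
      end;
   run;
endrsubmit;

/* The parent session is free to do its own work here. */

listtask _all_;                  /* report the status of every async task */
waitfor _all_ task1 task2;       /* block until both tasks have finished  */

rget process=task1;              /* merge each task's spooled log and     */
rget process=task2;              /* output back into the parent session   */

signoff task1;
signoff task2;
```

Note that each spawned session writes to its own WORK library, which is discarded at SIGNOFF. Results that must outlive the session would need to go to a permanent library that the parent session can also reach.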
The current MP CONNECT functionality is designed to address independent parallelism, which is possible when you have two or more tasks to execute and those tasks do not depend on one another. One example of independent parallelism is the extraction and processing of data from multiple distinct, and possibly remote, data sources. Another is a hybrid OLAP (HOLAP) scenario that requires the creation of multiple multidimensional databases (MDDBs).
The primary purpose of parallel processing is to complete a job in less time than it would take to execute the same job serially. “The performance gains can be amazing,” says John Bentley of First Union National Bank. “On a four-way UNIX box we cut the processing time of a 9-million-record data set from 46 minutes to 17 minutes. For another job run against a terabyte-class data warehouse, we cut the SQL processing time from just over an hour to 20 minutes. Even if it takes a few hours to modify and test a production program, you quickly recoup that time. Also, don’t overlook the fact that not only are you speeding up your jobs, but you’re also scaling up your server because it can handle more work in the same amount of time.”
The following pseudocode sets the global SASCMD option so that N SAS sessions can be spawned on the same multiprocessor machine. Each SIGNON statement spawns a new SAS session. The %SYSLPUT statement can be used to set macro variables in the “remote” SAS sessions. Each RSUBMIT statement must specify the WAIT=NO option so that the submitted block executes asynchronously. In addition, the LOG= and OUTPUT= options are used in this example to manage the log and output from each of the asynchronous tasks.