High-performance, Real-time Big Data Scoring for R Users
R is one of the most capable and widely adopted programming languages for statistical computing and graphics, and data scientists worldwide rely on it. Even with this powerful tool, data scientists still face the challenge of rapidly moving predictive models from a development environment into a production environment.
The two Zementis solutions, ADAPA and UPPI, support both batch and real-time computing environments, offering an efficient, streamlined, versatile and cost-effective approach to executing R models.
ADAPA and UPPI use PMML, the Predictive Model Markup Language, which is the de facto standard for representing predictive models. PMML allows models to be developed in one application and deployed in another, as long as both applications are PMML-compliant. Because both Zementis solutions can consume PMML exported from R, R users gain speed and agility across the predictive model lifecycle.
From model development, to test, to deployment and to ongoing operation in a production environment, Zementis delivers more efficient data science to R users.
Immediate Benefits of Using ADAPA or UPPI
Once a model built in R is saved as a PMML file, it can be directly deployed in ADAPA or in UPPI for scoring in whatever production environment the business utilizes. With ADAPA and UPPI, you can:
- Execute your R models on any target platform without the need for an R-based deployment engine or special R enhancements
- Support a wide range of R modeling approaches (e.g. model ensembles, segmentation, chaining, cascade)
- Overcome memory and speed limitations that R imposes on model execution performance
- Support sub-second, real-time scoring
- Deploy your R models in minutes, not months (no need to recode models for production)
- Make one or many predictive models operational at once
- Ensure consistent model performance by removing dependency on the R environment
With ADAPA, you can:
- Execute models on-demand, for individual real-time transactions or batch scoring applications
- Manage models via Web Services or a Web console
- Tap into all the advantages of cloud computing with ADAPA in the Cloud
With UPPI, you can:
- Execute your models in-database, close to where your data resides, with UPPI In-database: Greenplum/Pivotal, IBM PureData for Analytics (Netezza), IBM z Systems mainframes, SAP Sybase IQ, Teradata and Teradata Aster
- Tap into the advantages of big data infrastructure components and applications within the Hadoop ecosystem, using Hive, Spark and Storm for batch and real-time streaming analytics, as well as Datameer for Hive-based analytics for business intelligence
- Manage models as UDFs (User-Defined Functions) and write SQL against UDFs
R PMML Support
R offers support for PMML through the “pmml” and “pmmlTransformations” packages available on CRAN (Comprehensive R Archive Network). Zementis proudly contributes to and maintains both packages. Together, these packages give you great flexibility in exporting your R models and transformations to PMML.
- PMML Package: The “pmml” package offers PMML export for a great number of model objects and packages. These include nnet, glm, randomForest, etc. For a comprehensive list of supported packages, click HERE. Zementis also offers proprietary PMML packages for ensemble models, which significantly reduce the time and memory needed to convert those models to PMML.
- PMML Transformations: While the “pmml” package allows users to export a multitude of predictive models in PMML, the “pmmlTransformations” package allows for data transformations (e.g. data pre-processing) to be exported in conjunction with the model itself. To download a paper explaining all the details surrounding the “pmmlTransformations” package, click HERE.
- R Support Forums: Our support forums offer multiple examples of R commands that allow you to build a model in R and export the model object in PMML. Browse through our R PMML Support forums for details.
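As a rough illustration of how a data transformation can be exported together with a model, the sketch below z-scores one column of the built-in iris dataset and fits a linear model on the derived field. It assumes the “pmml”, “pmmlTransformations” and “XML” packages are installed; the derived field name `dsl` and the output file name are made up for this example. Newer releases of “pmml” may provide these transformation functions under different names, so treat the calls as a sketch rather than a reference.

```r
# Minimal sketch: export a model plus its pre-processing step to PMML.
# Assumes the pmml, pmmlTransformations and XML packages are installed.
library(pmml)
library(pmmlTransformations)
library(XML)

# Wrap the data so that transformations applied to it are recorded.
iris_box <- WrapData(iris)

# Z-score the first column (Sepal.Length); "dsl" is an illustrative name
# chosen for the derived field.
iris_box <- ZScoreXform(iris_box, xformInfo = "column1 -> dsl")

# Fit a simple linear model on the derived field.
fit <- lm(Petal.Width ~ dsl, data = iris_box$data)

# Export the model; the recorded transformation travels with it.
lm_pmml <- pmml(fit, transforms = iris_box)

# Write the PMML document to a temporary file (hypothetical file name).
lm_path <- file.path(tempdir(), "iris_lm_with_transform.pmml")
saveXML(lm_pmml, file = lm_path)
```

The resulting file carries both the z-score step and the regression, so a PMML-compliant scoring engine can be fed raw `Sepal.Length` values without re-implementing the pre-processing.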
A Common Industry Standard
PMML allows for the decoupling of two very important data mining phases: model development and operational model deployment. With PMML, data scientists can focus on data analysis and model building using best-of-breed model development tools, whereas ADAPA and UPPI make operational deployment and actual use of the model extremely fast and simple.
For example, if a data scientist develops a random forest model using the R randomForest package, deploying the model operationally only requires saving it as a PMML file using the R pmml package and then deploying that file in either ADAPA or UPPI for scoring.
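The R side of that workflow might look like the following minimal sketch. It assumes the “randomForest”, “pmml” and “XML” packages are installed, and it uses the built-in iris dataset, a small tree count and a temporary file name purely for illustration. Uploading the resulting file to ADAPA or UPPI is a separate step that is not shown here.

```r
# Minimal sketch: train a random forest in R and save it as a PMML file.
# Assumes the randomForest, pmml and XML packages are installed.
library(randomForest)
library(pmml)
library(XML)

set.seed(42)  # make the illustrative model reproducible

# Train a small classification forest on the built-in iris data.
rf_model <- randomForest(Species ~ ., data = iris, ntree = 20)

# Convert the fitted model object to a PMML document (an XML node).
rf_pmml <- pmml(rf_model)

# Save the PMML document to disk; the path is an illustrative choice.
pmml_path <- file.path(tempdir(), "iris_rf.pmml")
saveXML(rf_pmml, file = pmml_path)
```

The file at `pmml_path` is the only artifact that needs to leave the R environment; the scoring engine does not need R or the randomForest package.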
Once deployed, the model is available for all to use. ADAPA makes models available for scoring on-premise through ADAPA On-site or in the cloud through ADAPA in the Cloud. UPPI makes models available for scoring in-database or in Hadoop.
PMML lets the model development environment be used for exactly that: model development. Scoring, whether in real time or in batch jobs, is handled by ADAPA or UPPI.