Predictive Analytics Platform
The Universal PMML Plug-in (UPPI) for Hadoop
The Zementis Universal PMML Plug-in (UPPI) for Hadoop delivers a common predictive analytics platform for the entire Hadoop ecosystem. UPPI cements an efficient process for instant operational deployment, addressing execution needs for batch processing, in-memory computation and streaming data.
- Delivers fast, massively parallel execution of advanced predictive models
- Enables seamless, vendor-neutral integration of existing machine learning algorithms
- Implements homogeneous predictive analytics capabilities across the entire Hadoop ecosystem
UPPI for Hive
Hive is a data warehouse system for Hadoop, and is optimized for querying and managing large distributed data sets. It enables access to files stored either directly in Apache HDFS (Hadoop Distributed File System) or in other compatible data storage systems.
With Hive, organizations can readily analyze large data sets stored in Hadoop-compatible systems. Since it provides a mechanism to project structure onto the data, Hive allows users to make queries using a SQL-like language called HiveQL.
Once deployed in the Zementis Universal PMML Plug-in (UPPI) for Hive, predictive models turn into UDFs (User-defined Functions). These can then be invoked directly in HiveQL. In this way, UPPI offers Hadoop users the best combination of open standards and scalability for the application of predictive analytics.
UPPI for Hive delivers instant and scalable scoring for big data while retaining compatibility with most major data mining tools through the PMML standard. It brings the scalability of Hadoop to the execution of predictive analytics, with the option to use MapReduce, Tez or Spark as the underlying execution engine.
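The idea of a model "turning into a function" can be illustrated with a minimal Python sketch. It is not UPPI's implementation and uses no Zementis API; it only assumes the standard PMML layout for a regression model (a RegressionTable with an intercept and NumericPredictor children). The PMML fragment, the field names age and income, and the helper compile_regression are hypothetical and written by hand for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment (not exported from any real tool).
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}RegressionTable' -> 'RegressionTable'."""
    return tag.rsplit("}", 1)[-1]


def compile_regression(pmml_text):
    """Return a scoring function built from the first RegressionTable found.

    Simplification: only intercept and linear coefficients are read; optional
    PMML features such as exponents or categorical predictors are ignored.
    """
    root = ET.fromstring(pmml_text)
    table = next(el for el in root.iter() if local_name(el.tag) == "RegressionTable")
    intercept = float(table.get("intercept", "0"))
    coefficients = {
        el.get("name"): float(el.get("coefficient"))
        for el in table
        if local_name(el.tag) == "NumericPredictor"
    }

    def score(row):
        # row maps field name -> numeric value, like one row of a table.
        return intercept + sum(c * row[name] for name, c in coefficients.items())

    return score


score = compile_regression(PMML_DOC)
rows = [{"age": 40, "income": 100}, {"age": 20, "income": 8}]
print([score(r) for r in rows])  # [46.5, 13.5]
```

In a Hive deployment the analogous function would be registered as a UDF and called once per row from a HiveQL query; the row-at-a-time shape of score above is what makes that mapping natural.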
UPPI for Spark
Spark (TM) is a general cluster computing engine for large-scale data processing. For certain applications, its in-memory processing capabilities provide much higher performance than the traditional, disk-based MapReduce paradigm. Spark is well suited to machine learning and advanced predictive analytics.
With the Zementis Universal PMML Plug-in (UPPI) for Spark, PMML-based predictive models can easily be integrated into Spark Streaming. Spark Streaming ingests data in mini-batches, and the predictive models, originally built in other data mining tools such as R, SPSS or SAS, are applied to those mini-batches. This design enables the same application code written for batch analytics to be used in streaming analytics, on a single engine. UPPI for Hive can also use the Spark execution engine as a third option in addition to MapReduce and Tez.
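The mini-batch design can be sketched in plain Python. This is an illustration of the pattern only, not Spark or UPPI code: the score function is a hypothetical stand-in for a deployed model, and batches are formed by record count for simplicity, whereas Spark Streaming forms them by time interval. The point is that the streaming path reuses the batch scoring function unchanged.

```python
from itertools import islice


def score(record):
    # Hypothetical stand-in for a deployed model: a fixed linear formula.
    return 1.0 + 2.0 * record["x"]


def score_batch(records):
    """Batch path: score a finite collection of records."""
    return [score(r) for r in records]


def mini_batches(stream, size):
    """Group an iterator of records into lists of at most `size` items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def score_stream(stream, size):
    """Streaming path: reuse score_batch on each mini-batch as it arrives."""
    for batch in mini_batches(stream, size):
        yield score_batch(batch)


events = ({"x": i} for i in range(5))
for scores in score_stream(events, size=2):
    print(scores)
# [1.0, 3.0]
# [5.0, 7.0]
# [9.0]
```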
UPPI for Storm
Storm is a distributed real-time computation system designed for reliably processing streaming data. Its vision is to do for real-time processing what Hadoop MapReduce did for batch processing. Apache Storm is highly scalable and applies to a broad set of use cases, but it especially shines in real-time processing for advanced predictive analytics and machine learning models.
The Zementis Universal PMML Plug-in (UPPI) for Storm offers users a unique combination of open standards and scalability for the application of predictive analytics. With the Predictive Model Markup Language (PMML) industry standard as the bridge between the model development environment and a distributed real-time computation infrastructure, UPPI for Storm offers standards-based deployment of predictive models and execution on a highly scalable platform. UPPI seamlessly incorporates the power of advanced predictive models into the Storm infrastructure to deliver superior performance for mission-critical analytics solutions. In practice, a PMML model becomes a Storm bolt or Trident function, offering execution performance that can meet the volume and performance requirements of the most demanding environments.
UPPI for Hadoop is certified on:
- Amazon EMR
- Cloudera (CDH: Cloudera’s Distribution including Apache Hadoop)
- Hortonworks (HDP2: Hortonworks Data Platform powered by Apache Hadoop)
- IBM InfoSphere BigInsights
Contact us today to learn more about UPPI for Hadoop.