Real-time and Big Data Scoring for KNIME Users

Zementis ADAPA® for real-time scoring and UPPI™ for big data scoring provide additional value to all your predictive assets. Both solutions are complementary to KNIME and extend your modeling environment into the IT operational domain.

ADAPA and UPPI are compatible with KNIME due to their use of PMML, the Predictive Model Markup Language, which is the de facto standard for representing predictive models. PMML allows for models to be developed in one application and deployed within another, as long as both applications are PMML-compliant.
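
To make the exchange format concrete, here is a minimal Python sketch that reads a small hand-written PMML 4.2 document with the standard library and reports what a consuming application needs to know first: the PMML version, the declared data fields, and the model type. The sample document, its field names, and the `summarize_pmml` helper are illustrative assumptions. They are not KNIME output and not part of ADAPA or UPPI.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (an assumption for illustration, not produced by KNIME).
SAMPLE_PMML = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="Illustrative decision tree"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="categorical" dataType="string">
      <Value value="low"/>
      <Value value="high"/>
    </DataField>
  </DataDictionary>
  <TreeModel modelName="risk_tree" functionName="classification">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <Node score="low">
      <True/>
      <Node score="high">
        <SimplePredicate field="age" operator="lessThan" value="25"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

# Top-level PMML elements that are not models themselves.
NON_MODEL = {"Header", "MiningBuildTask", "DataDictionary",
             "TransformationDictionary", "Extension"}


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return element.tag.split("}")[1]


def summarize_pmml(xml_text):
    """Return the version, declared fields, and model types of a PMML document."""
    root = ET.fromstring(xml_text)
    fields = {
        f.get("name"): f.get("dataType")
        for f in root.findall("pmml:DataDictionary/pmml:DataField", NS)
    }
    models = [local_name(c) for c in root if local_name(c) not in NON_MODEL]
    return {"version": root.get("version"), "fields": fields, "models": models}


if __name__ == "__main__":
    print(summarize_pmml(SAMPLE_PMML))
```

Because both the producing and the consuming application agree on this structure, neither needs to know how the other represents the model internally.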
 

Immediate Benefits of Using ADAPA or UPPI

Once a model is exported from KNIME as a PMML file, it can be directly deployed in ADAPA for real-time or batch scoring or in UPPI for scoring in-database or via Hadoop.

With ADAPA and UPPI, you can:

  • Execute your models independently of KNIME
  • Deploy your models in minutes, not months (no need to recode models for production)
  • Make one or many predictive models operational at once
  • Use multiple models to deploy a model ensemble, segmentation, chaining or cascading

With ADAPA for real-time scoring, you can:

  • Produce scores in real time (using Web Services or the Java API), on demand, or in batch mode
  • Manage models via Web Services or a Web Console
  • Tap into all the advantages of cloud computing with ADAPA in the Cloud

With UPPI for big data scoring, you can:

  • Execute your models in-database, close to where your data resides, with UPPI In-database (Greenplum/Pivotal, IBM Netezza, SAP Sybase IQ, Teradata and Teradata Aster)
  • Turn your models into UDFs (User-Defined Functions) and write SQL against UDFs
  • Execute your models in Hadoop with UPPI for Hive/Hadoop or Datameer
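
The UDF idea can be illustrated without any of the databases listed above. The following sketch uses Python's built-in `sqlite3` module as a stand-in: it registers a hand-coded scoring function as a SQL function and then scores rows with an ordinary query. The `score_risk` function, its threshold rule, and the `customers` table are hypothetical. UPPI derives its UDFs from PMML, and its actual function names and signatures are not shown here.

```python
import sqlite3


def score_risk(age, income):
    """Hypothetical stand-in for a deployed model: a simple two-condition rule."""
    if age < 25 and income < 30000:
        return "high"
    return "low"


def score_customers(rows):
    """Load rows into an in-memory table and score them through a SQL UDF."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (id INTEGER, age REAL, income REAL)")
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
        # Register the Python function under a SQL name with two arguments.
        conn.create_function("score_risk", 2, score_risk)
        cursor = conn.execute(
            "SELECT id, score_risk(age, income) FROM customers ORDER BY id"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in score_customers([(1, 22, 18000.0), (2, 41, 52000.0)]):
        print(row)
```

The point of the pattern is that scoring happens inside the query, so the data never leaves the database.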

 

KNIME PMML Support

KNIME provides multiple tools for building various types of PMML models in a graphical workflow. In KNIME, a PMML document is built in parallel with the data flow, and the resulting model documents all the preprocessing and learning steps performed on the data. Because KNIME provides nodes that assemble PMML in a modular fashion, the document is built independently of the data flow and can be extended with PMML-exclusive nodes that, for example, modify metadata or add model verification data.

KNIME provides the following features for PMML building:

  • Creation of documents that conform to PMML 4.2, the most recent version.
  • Support for a variety of different model types. Examples of supported models include decision trees, support vector machines and neural networks.
  • All generated PMML documents adhere strictly to the PMML standard’s XML schema definition and therefore allow for easy exchange of models with other tools.
  • Elaborate nodes for creating ensembles of models. The member models can be built separately and combined later, or learned in a loop and added to the ensemble successively.
  • Weights of the individual models in an ensemble can be determined by scoring them with the built-in scoring nodes.
  • Numerous nodes to add preprocessing steps to PMML. KNIME is one of the few tools that can also add data transformations to PMML.
  • Nodes to add and modify metadata.
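
The ensemble weighting mentioned in the list can be sketched in a few lines of Python. This is a minimal illustration, assuming each member model is weighted by its accuracy on held-out data and the ensemble combines predictions by weighted majority vote. The helper names and the sample predictions are hypothetical, and this is not a description of how KNIME's nodes are implemented.

```python
from collections import defaultdict


def accuracy(predictions, actuals):
    """Fraction of predictions that match the known outcomes."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


def weighted_majority_vote(votes, weights):
    """Return the label whose supporting models carry the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(votes, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


if __name__ == "__main__":
    actuals = ["yes", "no", "yes", "no"]
    # Validation predictions of three hypothetical member models.
    validation = [
        ["yes", "no", "yes", "no"],
        ["yes", "yes", "yes", "yes"],
        ["no", "yes", "yes", "yes"],
    ]
    weights = [accuracy(preds, actuals) for preds in validation]
    # Each model's vote for one new record.
    print(weighted_majority_vote(["no", "yes", "yes"], weights))
```

In this sample the single most accurate model outvotes the two weaker ones, which is the effect weighting is meant to achieve over a plain majority vote.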

 

[Figure: KNIME workflow assembling a PMML document in modular fashion]

The workflow above shows how PMML is built in parallel with the data flow. During preprocessing, the transformations performed are collected and added to an initially empty PMML document. At the end, the learned model is appended.
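
The same sequence (start with an empty document, collect transformations, append the model last) can be mimicked outside KNIME. The sketch below uses Python's standard `xml.etree.ElementTree` to assemble a small PMML 4.2 document step by step. The helper functions, field names, and coefficients are assumptions made for illustration, and the result is not validated against the PMML schema.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_2"
ET.register_namespace("", PMML_NS)  # serialize with a default namespace


def q(tag):
    """Qualify a tag name with the PMML namespace."""
    return f"{{{PMML_NS}}}{tag}"


def empty_document(fields):
    """Step 1: a document with a data dictionary but no transformations or model."""
    root = ET.Element(q("PMML"), {"version": "4.2"})
    ET.SubElement(root, q("Header"), {"description": "Assembled step by step"})
    dictionary = ET.SubElement(
        root, q("DataDictionary"), {"numberOfFields": str(len(fields))}
    )
    for name, optype, data_type in fields:
        ET.SubElement(
            dictionary, q("DataField"),
            {"name": name, "optype": optype, "dataType": data_type},
        )
    ET.SubElement(root, q("TransformationDictionary"))
    return root


def add_min_max_normalization(root, field, lo, hi):
    """Step 2: record a preprocessing step as a derived field."""
    transformations = root.find(q("TransformationDictionary"))
    derived = ET.SubElement(
        transformations, q("DerivedField"),
        {"name": f"{field}_norm", "optype": "continuous", "dataType": "double"},
    )
    norm = ET.SubElement(derived, q("NormContinuous"), {"field": field})
    ET.SubElement(norm, q("LinearNorm"), {"orig": str(lo), "norm": "0"})
    ET.SubElement(norm, q("LinearNorm"), {"orig": str(hi), "norm": "1"})
    return derived.get("name")


def append_regression_model(root, raw_field, derived_field, target,
                            intercept, coefficient):
    """Step 3: append the learned model after the collected transformations."""
    model = ET.SubElement(root, q("RegressionModel"), {"functionName": "regression"})
    schema = ET.SubElement(model, q("MiningSchema"))
    ET.SubElement(schema, q("MiningField"), {"name": raw_field})
    ET.SubElement(schema, q("MiningField"), {"name": target, "usageType": "target"})
    table = ET.SubElement(model, q("RegressionTable"), {"intercept": str(intercept)})
    ET.SubElement(
        table, q("NumericPredictor"),
        {"name": derived_field, "coefficient": str(coefficient)},
    )
    return model


if __name__ == "__main__":
    doc = empty_document([("age", "continuous", "double"),
                          ("spend", "continuous", "double")])
    norm_name = add_min_max_normalization(doc, "age", 18, 90)
    append_regression_model(doc, "age", norm_name, "spend", 10.0, 2.5)
    print(ET.tostring(doc, encoding="unicode"))
```

Keeping each step in its own function mirrors the modular idea: transformations can be added or removed without touching the code that appends the model.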

 

A Common Industry Standard

PMML allows for the decoupling of two very important modeling phases: development and operational deployment. With PMML, data scientists can focus on data analysis and model building using best-of-breed model development tools, while ADAPA and UPPI handle operational deployment and actual use of the model.

For example, if a data scientist develops a clustering model using KNIME, all she needs to do to deploy her model operationally is to save it as a PMML file and deploy it either in ADAPA for real-time scoring or in UPPI for big data scoring. Once deployed, the model is available for all uses: ADAPA makes her model available for scoring on-premises through ADAPA On-site or in the cloud through ADAPA in the Cloud. UPPI makes it available for scoring in-database (Greenplum/Pivotal, IBM Netezza, SAP Sybase IQ, Teradata and Teradata Aster) or in Hadoop (through Hive or Datameer).

PMML allows the model development environment to be used for exactly that: model development. Scoring, in real time or for big data, is handled by ADAPA or UPPI.

Copyright © 2017 Zementis / Software AG. All Rights Reserved.