The benefits of using predictive analytics are now a given. The data scientists who deliver them are highly regarded, but our daily work is full of contrasts.
On the one hand, you can work with data, tools, and techniques to really dive in and understand data and what it can do for you.
On the other, there is usually quite a bit of administrative work around accessing data, massaging data and then putting that new insight into production - and keeping it there.
Surveys say that at least 80% of any data science project is associated with those administrative tasks. One popular urban legend says that, within a commercial organization trying to leverage analytics, the full-time job of one data scientist can be described as building and maintaining a maximum of four (yes, 4) models in production, regardless of the brilliance of the toolset used.
There is a desperate need to automate and scale the modelling process, not just because it would be good for business (after all, if you could use 29,000 models instead of just 4, you would want to!) but also because otherwise we data scientists are in for a tedious life.
The KNIME Model Process Factory: For efficient running & monitoring of model processes
The KNIME Model Process Factory is designed to provide you with a flexible, extensible, and scalable application to efficiently run and monitor very large numbers of model processes.
The KNIME Model Factory includes a whitepaper; an overall workflow, with tables that manage all activities; and a series of workflows, examples, and data for learning to use the model factory.
The video shows the model factory in action: Here, you can see the orchestrating workflow triggering dependent workflows during execution.
Highlights of the KNIME Model Factory include:
- Workflow Orchestration. One workflow acts as the art director of the whole process by organizing, monitoring, triggering, and automating (that is, orchestrating) all workflows involved in the model process factory.
- Model Monitoring. The KNIME Model Factory includes a number of workflows for initializing, loading, transforming, modeling, scoring, evaluating, deploying, monitoring, and retraining data analytics models.
- Reuse Best Practices. The workflows and the whitepaper also show common best practices for packaging sub-workflows for quick, controlled, and safe reuse by other workflows.
- Call Remote Workflows. The whole orchestration factory relies heavily on calling remote workflows, that is, on the Call Remote Workflow node.
- Trigger Model Retraining. An important part of model monitoring is knowing exactly when to start the retraining procedure. A few workflows in the KNIME Model Process Factory are dedicated to checking whether model performance has fallen below a specified accuracy threshold and to triggering retraining, if needed.
- Full Working Examples. As usual, we provide full working example workflows - including data - to show how to handle typical modelling process tasks and conditions.
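To make the retraining trigger described above more concrete, here is a minimal Python sketch of the underlying check. It is not taken from the KNIME workflows themselves, which are built visually from nodes. The names `ModelStatus` and `models_needing_retraining` and the default threshold of 0.8 are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelStatus:
    """Hypothetical record of a model's most recent evaluation."""
    name: str
    accuracy: float  # latest measured accuracy, between 0 and 1


def models_needing_retraining(statuses, threshold=0.8):
    """Return the names of models whose accuracy is strictly below the threshold."""
    return [s.name for s in statuses if s.accuracy < threshold]


if __name__ == "__main__":
    # Made-up example values, only to show how the check is used.
    latest = [ModelStatus("churn", 0.91), ModelStatus("upsell", 0.74)]
    for name in models_needing_retraining(latest, threshold=0.8):
        print(f"retrain: {name}")
```

In a real deployment, the list of flagged models would be handed to whatever mechanism starts the retraining step. In the model factory, that role is played by the orchestrating workflow.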
KNIME Model Factory resources
Anyone using KNIME can take advantage of the KNIME Model Factory. It is available on the KNIME Community Hub and runs on KNIME Analytics Platform, which means it is open source and free. Major benefits can be realized in terms of automation and interfacing by using the KNIME Model Factory with KNIME Business Hub.
- Download the KNIME Model Factory workflow here
- Watch the full presentation "The KNIME Model Factory: Scaling Modeling Processes for the Enterprise" by Iris Adae (KNIME) and Phil Winters (KNIME)
- Read the whitepaper, The KNIME Model Factory: Scaling Modeling Processes for the Enterprise.