In recent months a wealth of tools has appeared that claim to automate all or part of the data science cycle. These tools often automate only a few phases of the cycle, tend to consider just a small subset of the available models, and are limited to relatively simple data formats.
At KNIME we take a different stance: automation should not result in black boxes that hide the interesting pieces from everyone. A modern data science environment should allow automation and interaction to be combined flexibly. If the data science team works on a well-defined type of analysis scenario, then more automation may make sense. More often than not, however, the interesting analysis scenarios are not that easy to control, and a certain amount of interaction with the users is highly desirable.
We have already described the principles of Guided Analytics and how KNIME workflows naturally support them (see the blog post “Principles of Guided Analytics”), and briefly discussed how this way of creating analytical applications allows automation and interaction to be mixed and matched. Since then, we have put together a more comprehensive workflow that serves as a blueprint for anyone who wants to build their own Guided Analytics application, combining just the right amount of automation and interaction for a specific set of problems.
The workflow provides reusable pieces for data transformation and cleaning, feature selection and engineering, and model optimization and selection. At the end, it even lets the user download and inspect the resulting scoring workflow. The workflow is available on the KNIME Hub, and the following video walks through the different steps and explains the underlying techniques.