KNIME’s visual workflow paradigm is often branded as “low code”. But this term increasingly creates confusion because, especially in the data science and AI space, any software that puts a bit of UI on top of a programming language is called “low code”. That’s fair enough – it is, after all, not “high code” if parts of the code can be created via a visual interface – but it differs fundamentally from the way data scientists create workflows with tools like KNIME.
The key difference lies in the programming language.
There are two types of low code.
For the first category of low code tools, there is always code in a conventional programming language underneath, generated by the visual interface. In some tools you can keep using the visual interface to refine the resulting code; in others, the visual interface is merely a wizard that creates code as a starting point. But in both cases, in order to fine-tune details you will eventually need to reach out and touch the code. Some analysts seem to believe that even notebooks belong to the low code category.
For the second category of tools – KNIME among them – the workflow is the program. And the corresponding programming language is a network of connected nodes – or, in more geeky terms, a directed acyclic graph, known as a DAG.
This graph models how the data is handed from one operation to the next and represents the data flow. The flow is often not a single straight line – there can be side branches – but fundamentally it runs from left to right.
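To make the DAG idea concrete, here is a minimal sketch – not KNIME’s internal model, just an illustration with made-up node names – of a small workflow as a directed acyclic graph in Python, using the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose output it consumes.
workflow = {
    "CSV Reader": set(),
    "Row Filter": {"CSV Reader"},
    "GroupBy": {"Row Filter"},
    "Excel Writer": {"GroupBy"},
    "Bar Chart": {"GroupBy"},  # a side branch fed by the same node
}

# "Running" the workflow means visiting the nodes in a valid topological
# order, so every node sees its input data before it produces output.
for node in TopologicalSorter(workflow).static_order():
    print("execute:", node)
```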
Each node has an implementation associated with it, consisting of some code or a library call, but the programming language that code or library is written in doesn’t need to matter to the workflow creator. This means that a KNIME workflow can consist of nodes that, under the hood, rely on Java, Python, R, SQL, and in some instances even C code.
If you ask whether we can show the code that represents a KNIME workflow, we can – but it will be a set of files describing the nodes, their connectivity, and their configuration. We could also show you the code underneath each of those nodes, but that quickly gets wild: you might end up seeing code in Java, Python, C, SQL, and more. And which data science programmer regularly inspects the code of the libraries they call?
So in the first case, the low code aspect lets you create the code faster, and you end up programming in, say, Python. You just don’t write every line of code by hand; visual tools generate parts of that code for you. In many data science projects the result is a sequence of library calls to connect to data, transform it, analyze it, and so on.
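For illustration, the code produced by a tool in that first category might look roughly like the following – a hypothetical sketch with invented file and column names, chaining pandas and scikit-learn calls to load, transform, and analyze data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Connect to data (file and column names are invented for this example).
df = pd.read_csv("customers.csv")

# Transform it.
df = df.dropna(subset=["age", "income"])

# Analyze it: train and evaluate a simple model.
X_train, X_test, y_train, y_test = train_test_split(
    df[["age", "income"]], df["churned"], test_size=0.2, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```

Whether you typed it yourself or a wizard generated it, maintaining this program means maintaining Python.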
In KNIME’s case, you program visually. You similarly create that sequence of operations that are applied to your data, but you continue to interact with this “program” on the workflow level.
Working visually makes it easier to work with side branches (in a sequential program, seeing two parallel branches isn’t exactly easy). It also makes configuration easier, because the nodes all have similar dialogs that expose the needed parameters and come with friendly documentation built in.
This makes life quite a bit easier for a data scientist: using visual workflows means that, irrespective of the programming language the nodes use underneath, you can control the parameters directly – you don’t need to remember the precise function signature for each and every library call.
Why is this “low code” and not “no code”?
Because with software like KNIME, a data scientist can still reach out to code – but that code is then executed within a dedicated “scripting” node. Sometimes it’s just useful to quickly write a couple of lines of Python, R, SQL, or JavaScript code.
This gives users the freedom to code when they want to – maybe to call a brand new ML algorithm that isn’t yet exposed as an independent node in KNIME, or because they want to experiment with something and it’s simply easier to do that in R, Python, Java, etc.
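As a rough sketch of what such a scripting node might contain – assuming the knime.scripting.io Python API of recent KNIME versions, and with invented column names – a couple of lines can take the incoming table, add a derived column, and hand the result back to the workflow:

```python
import knime.scripting.io as knio  # assumption: the Python Script node API

# Read the table handed to this node by the upstream workflow.
df = knio.input_tables[0].to_pandas()

# A quick custom calculation ("revenue" and "units" are hypothetical columns).
df["revenue_per_unit"] = df["revenue"] / df["units"]

# Hand the result back to the downstream nodes.
knio.output_tables[0] = knio.Table.from_pandas(df)
```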
We strongly believe workflows are the appropriate programming language for data science, for two reasons:
- Data-centric programs focus on the flow of the data, not the control flow you typically see in other programs.
- Data scientists need to understand what a method or algorithm does, but not how it actually does it (i.e. how it is really implemented). If you are a programmer: when was the last time you looked at the code inside XGBoost or any other ML library you regularly use?
But there are other benefits of this “the workflow is the program” view as well. When collaborating within a team, it’s much easier to explain the data flow logic, since a data engineer doesn’t need to go into the details of her SQL code with an AI engineer who is using Python or with the visualization guru who is an expert in JavaScript. They collaborate at the abstract workflow level and only worry about the details inside the node configurations where they are the experts.
But in our view, the biggest benefit is the learning curve.
Rather than first having to learn how to write code, data science beginners can start with a couple of simple nodes and, within their first hours, build their first real workflow – one that summarizes a couple of spreadsheets, builds an ML model, or pulls some data from a database.
By then, they have already understood the concept behind this visual programming paradigm and adding nodes to their repertoire is merely a matter of making sure they understand the important part: what does this method do?
They can add new expertise in small steps and continue building out their data science skill set without ever needing to leave the visual workflow paradigm.
If they come from a spreadsheet background, they will continue to use spreadsheets and probably use KNIME initially for many of their automation tasks.
If they come from an R or Python background in statistics, machine learning, or programming, they may continue to use those languages inside a few nodes for a while, until they find the built-in functionality in KNIME easier to use (and to communicate to their non-Python colleagues). And the data engineer will be happy not just to have a visual ETL tool, but also to be able to run analyses within the same data wrangling framework.
Before you know it, super complex analytical workflows are being built in the same environment that people started with just a few months earlier.