
Visual workflows for data science just make sense.


Today, many data analysts and data scientists use programming languages to work with data. They have adopted the tools of software engineers to get their data work done. Historically, this happened primarily because data scientists tended to come out of data engineering (with fluency in SQL), computer science (with fluency in Python), or statistics (with fluency in R).

And for a long time, this worked just fine. Writing code can be, and still is, successfully used for accessing, cleaning, transforming, and analyzing data, and even for distributing data science solutions. But it’s no longer the best or the fastest way to empower people to work with data.

Software engineering typically centers on a principle called “control flow.” This is the program, or the set of instructions, that makes software do what it does. If you have a smart home program, for example, the lights may turn on when you open the door. Or when the temperature drops below a set threshold, the heat turns on, unless the window is open. The value of a program is in the instructions and the embedded logic. The result of the programming team’s work is the program they put together.
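To make that concrete, here is a minimal Python sketch of this kind of control flow. All names and values are hypothetical; the point is simply that the program’s value lies in the branching instructions themselves, not in any data flowing through them.

```python
# Hypothetical smart-home logic: the value is in the instructions (the branches),
# not in the data that passes through them.

THRESHOLD_C = 18  # illustrative temperature threshold in degrees Celsius

def lights_should_be_on(door_opened: bool) -> bool:
    # Instruction: opening the door turns the lights on.
    return door_opened

def heating_should_be_on(temperature_c: float, window_open: bool) -> bool:
    # Instruction: heat turns on below the threshold, unless the window is open.
    return temperature_c < THRESHOLD_C and not window_open

if __name__ == "__main__":
    print(lights_should_be_on(door_opened=True))                         # True
    print(heating_should_be_on(temperature_c=15.0, window_open=False))   # True
    print(heating_should_be_on(temperature_c=15.0, window_open=True))    # False
```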

However, with data science, you’re more interested in generating insights from data by creating data summaries or models. You are much more concerned with what happens to the data at each stage of the process that produces those insights and models. This simple shift in interest – from the instructions (as in a control flow) to the data and how it’s shaped by the process – already affects the utility of a code-based approach.

When you’re so concerned with what happens to data at each stage, it makes much more sense to want to see a data flow, with the added benefit of seeing the output after each discrete step of the process. This is precisely what KNIME visual workflows provide.
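Written as code, a data flow looks something like the sketch below – a hypothetical pandas pipeline with made-up file and column names. Each step produces an intermediate table you would want to inspect, which is exactly what a visual workflow surfaces at the output of every node.

```python
# A sketch of a "data flow": every step yields an intermediate table that can be
# inspected, much like viewing the output of each node in a visual workflow.
# The file name and column names are hypothetical.
import pandas as pd

orders = pd.read_csv("orders.csv")               # step 1: read      -> inspect orders.head()
cleaned = orders.dropna(subset=["amount"])       # step 2: clean     -> inspect cleaned.describe()
by_region = (
    cleaned.groupby("region", as_index=False)["amount"].sum()
)                                                # step 3: aggregate -> inspect by_region
top_regions = by_region.sort_values("amount", ascending=False).head(5)
print(top_regions)                               # step 4: summarize
```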

[Image: an example KNIME visual workflow]

Here’s our distinct approach to visual workflows, and the three reasons why we believe they offer the best way to work with data.

1. We care about the method, not the code.

A data scientist needs to understand what a method or algorithm does, but not how it actually does it (i.e., how it is really implemented).

Imagine a data scientist tasked with building a model to predict customer churn for a telecommunications company. One of their concerns is choosing the appropriate model, such as logistic regression, a decision tree, or a random forest, and setting an optimization goal for that model to achieve the best predictive performance. But they couldn’t care less about the code underlying a given model training method. Data scientists rarely look at the code inside XGBoost or any other ML algorithm or library they regularly rely on.
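As a hedged illustration of that decision, the sketch below compares three candidate models with scikit-learn. The data file and column names are hypothetical, and it assumes the features are already numeric; note that no part of it requires reading the libraries’ internal code.

```python
# Hypothetical churn model selection: compare candidate models on a chosen
# optimization goal (here, cross-validated AUC) without ever looking at how
# the algorithms themselves are implemented.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

data = pd.read_csv("telco_churn.csv")      # hypothetical dataset, numeric features
X = data.drop(columns=["churned"])
y = data["churned"]

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree":       DecisionTreeClassifier(max_depth=5),
    "random forest":       RandomForestClassifier(n_estimators=200),
}

for name, model in candidates.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```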

Generally, data scientists care about the following:

  • Data understanding: the characteristics, quality, and structure of the data they're working with.
  • Feature engineering: selecting and creating relevant features from the data to improve model performance.
  • Model selection: choosing the most appropriate algorithms and techniques for the given problem and data.
  • Model evaluation: assessing the performance of the models using appropriate metrics and validation techniques.
  • Interpretability: understanding how the model works and being able to explain its predictions to stakeholders.

They care much less about the intricate details of how specific techniques are implemented in code. Nor do they particularly enjoy figuring out how to interface with and between different tools and programming languages.

In other words, they care about the knobs and dials of a model that let them generate insights or accurately predict the future – not about the details of the underlying code.

2. Data science demands collaboration.

Using visual workflows creates a common language between data experts, as well as between data and domain experts. 

When collaborating within a team, a data engineer doesn’t need to explain the details of her SQL code to the AI engineer who is using Python or to the visualization expert who prefers JavaScript. Each can add their expertise at just the right point in the data flow, which makes visual workflows an excellent collaboration tool between technical experts. And if the tool lets them add their own code (only if they want to), even better.

The visual workflow paradigm is also extremely useful for working between data and domain expert teams. One of the most common complaints we hear about data teams is that they’re too far away from the data to understand the peculiarities and anomalies that are “common sense” to domain experts. Folding in domain expertise and getting early feedback on the solution is imperative to make sure the data science project doesn’t veer too far off course and ultimately deliver a flawed solution. Here, visual workflows can be used to align, explain, and even hand over data science solutions from one team to the other.

Similarly, a visual workflow is a handy reference for anyone working in governance and compliance, because it creates a visual record of what is done to potentially sensitive data. It can also show them any safeguards your team has put in place to better control how, and which, data is accessed by AI models.

3. Learning data science shouldn’t require a coding degree.

One of the biggest benefits of visual workflows is the shallow learning curve. In a world that is increasingly saturated with data and AI, the onus falls on data experts to train the future workforce to work fluently with large datasets.

Visual workflows allow beginners to start with simple data manipulation and automation while immediately becoming familiar with a tool that can also be used for advanced data science. Within hours, they can build their first real workflow that summarizes spreadsheets, pulls data from a warehouse, or even builds an ML model. By that point, they have already understood the visual workflow paradigm, and the next step is learning more nodes and what the underlying methods do.
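For comparison, here is a rough Python sketch of what such a first workflow does when written as code instead; the file and column names are hypothetical, and in KNIME each step would simply be a node (roughly: Excel Reader, Row Filter, GroupBy, Excel Writer).

```python
# A beginner's "first workflow" expressed as code: read a spreadsheet, drop
# incomplete rows, summarize revenue per product line, and write the result.
# File and column names are hypothetical.
import pandas as pd

sales = pd.read_excel("monthly_sales.xlsx")                               # read the spreadsheet
sales = sales.dropna(subset=["revenue"])                                  # filter out incomplete rows
summary = sales.groupby("product_line")["revenue"].agg(["sum", "mean"])  # summarize per group
summary.to_excel("sales_summary.xlsx")                                    # write the summary
print(summary)
```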

This way, they can add new expertise in incremental steps and dive deeper into the field of data science without ever having to leave the visual workflow environment. The KNIME user base is filled with upskilled marketers, supply chain analysts, chemists, production engineers, HR analysts, and even machine learning experts – who are regularly building complex analytical workflows without ever having to learn to code.

KNIME in the context of low-code

If you’re convinced that visual workflows make sense for data science, your next step is evaluating which low-code provider makes the most sense.

KNIME is distinct from other providers in three ways:


  1. The workflow is the program. Low-code is a broad term, and many tools that look like KNIME aren’t necessarily doing the same thing under the hood. Many low-code providers simply add a UX on top of a coding language like Python. You visually drag and drop nodes, and underneath the interface that action generates code. In some cases, you’d be expected to fiddle with or adjust the generated code slightly to get it to work.

    With KNIME, the visual workflow is the program. And the corresponding programming language is the network of connected nodes. The advantage here is that everything can be done without code, and KNIME does not rely on any single language or library to stay relevant.
    
  2. Open source (and free) to future-proof your data function. KNIME Analytics Platform is completely free and open source. Free means anyone can build any number of workflows of any complexity at no cost. You only pay when you need to automate your workflows or deploy them as REST APIs or data apps using KNIME Hub. Open means that any new development in the market – and nowadays, there are very many – gets integrated into the platform quickly. Open also means you can still add in your own code if you want.
    
  3. Analytic complexity. KNIME Analytics Platform provides the broadest set of functionality and analytic techniques in the low-code space, in part thanks to its open source approach.

    With KNIME, you’re never forced to drop down to code to set parameters for the underlying libraries that get executed. Every aspect that can be modified is available to the visual programmer, which, in many cases, means extremely advanced control. In KNIME, this works for everything from databases through neural networks to all other categories of data science, such as text, image, and process mining.

The visual paradigm is used for everything in KNIME – from prep and blend, to analysis and visualization, to creating packages for deployment, to calling external applications, to building interactive data apps, to capturing and storing metadata.

At KNIME, we believe visual workflows just make sense for data science. We’ve built a real (visual) programming language from scratch to help data workers from any background or expertise build data science solutions, end to end.