Today, many data analysts and data scientists use programming languages to work with data, applying the tools of software engineering to the ends of data work. Historically, this happened primarily because data scientists came out of data engineering (with fluency in SQL), computer science (with fluency in Python), or statistics (with fluency in R).
And for a long time, this worked just fine. Code can be, and still is, successfully used for accessing, cleaning, transforming, and analyzing data, and even for distributing data science solutions. But it’s no longer the best or the fastest way to empower people to work with data.
Software engineering focuses heavily on a concept called control flow: the set of instructions that makes software do what it does. In a smart home program, for example, the lights may turn on when you open the door, or the heat may turn on when the temperature drops below a set threshold, unless the window is open. The value of a program lies in its instructions and embedded logic. The result of the programming team’s work is the program they put together.
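To make that concrete, here is a minimal, hypothetical sketch of the smart home logic above; the function name, the 18° threshold, and the print statements are our own illustrative stand-ins for real device calls:

```python
# A minimal, hypothetical sketch of the smart home logic described above.
# The value of this program lies entirely in its instructions and branching,
# not in any data it produces: that is control flow.

def smart_home_step(door_opened: bool, temperature: float, window_open: bool) -> None:
    if door_opened:
        print("Lights on")        # an event triggers an instruction
    if temperature < 18.0 and not window_open:
        print("Heat on")          # logic embedded in the conditions

smart_home_step(door_opened=True, temperature=16.5, window_open=False)
```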
With data science, however, you’re more interested in generating insights from data by creating data summaries or models. You are much more concerned with what happens to the data at each stage of the process that produces those insights and models. This simple shift in interest, from the instructions (the control flow) to the data and how the process shapes it, already limits the utility of a code-based approach.
When you’re so concerned with what happens to data at each stage, it makes much more sense to want to see a data flow, with the added benefit of seeing the output after each discrete step of the process. This is precisely what KNIME visual workflows provide.
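For contrast, here is roughly what a small data flow looks like when written as code (an illustrative pandas sketch with invented data, not a KNIME API). Notice that inspecting each intermediate result takes an explicit extra statement, whereas a visual workflow shows the output of every node by default:

```python
# An illustrative data flow in pandas: each step transforms a table, and we
# inspect the intermediate output after every step. (Invented example data.)
import pandas as pd

orders = pd.DataFrame({
    "customer": ["a", "b", "a", "c", None],
    "amount":   [10.0, 5.0, 7.5, 3.0, 2.0],
})

cleaned = orders.dropna(subset=["customer"])    # step 1: drop incomplete rows
print(cleaned)                                  # inspect the output of step 1

totals = cleaned.groupby("customer", as_index=False)["amount"].sum()  # step 2: aggregate
print(totals)                                   # inspect the output of step 2
```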
Here’s our distinct approach to visual workflows, and the three reasons why we believe they offer the best way to work with data.
A data scientist needs to understand what a method or algorithm does, but not how it actually does it (i.e., how it is implemented).
Imagine a data scientist tasked with building a model to predict customer churn for a telecommunications company. One of their concerns is choosing the appropriate model, such as logistic regression, a decision tree, or a random forest, and setting an optimization goal for that model to achieve the best predictive performance. But they couldn’t care less about the code underlying a given model training method. Data scientists rarely look at the code inside XGBoost or any other ML algorithm or library they regularly rely on.
Generally, data scientists care about the following:

- What a method or algorithm does, and when it’s the right choice
- Which parameters, settings, and optimization goals to use
- How well the resulting model generates insights or predicts outcomes
They care much less about the intricate details of how specific techniques are implemented in code. Nor do they particularly enjoy figuring out how to interface with and between different tools and programming languages.
In other words, they care about the knobs and dials of a model that let them generate insights or accurately predict the future, not about the details of the underlying code.
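That “knobs and dials” view is easy to see in code, too. In this hedged scikit-learn sketch (synthetic data, illustrative parameter choices), the data scientist sets the number of trees and their depth and checks predictive performance; how the forest is actually built stays the library’s concern:

```python
# Illustrative scikit-learn sketch: pick a model, tune its knobs, measure
# performance. The implementation inside RandomForestClassifier is never read.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The knobs and dials: number of trees and maximum depth.
model = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=0)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```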
Using visual workflows creates a common language between data experts, as well as between data and domain experts.
When collaborating within a team, a data engineer doesn’t need to explain the details of her SQL code to the AI engineer who is using Python or to the visualization expert who prefers JavaScript. Each can add their expertise at just the right point in the data flow, which makes visual workflows an excellent collaboration tool between technical experts. And if the tool lets them drop in their own code when they want to, even better.
The visual workflow paradigm is also extremely useful for working between data and domain expert teams. One of the most common complaints we hear about data teams is that they’re too far away from the data to understand the peculiarities and anomalies that are “common sense” to domain experts. Folding in domain expertise and getting early feedback is imperative to make sure a data science project doesn’t veer off course and ultimately deliver a flawed solution. Here, visual workflows can be used to align, explain, and even hand over data science solutions from one team to the other.
Similarly, a visual workflow is a handy reference for anyone working in governance and compliance, creating a visual record of what is done to potentially sensitive data. It can also show them any safeguards your team has created to control how, and which, data is accessed by AI models.
One of the biggest benefits of visual workflows is the learning curve. In a world increasingly saturated with data and AI, the onus falls on data experts to train the future workforce to work fluently with large datasets.
Visual workflows let beginners start with simple data manipulation and automation while immediately becoming familiar with a tool that can also do advanced data science. Within hours they can build their first real workflow that summarizes spreadsheets, pulls data from a warehouse, or even builds an ML model. By then, they have grasped the visual workflow paradigm, and the next step is exploring more nodes and understanding what the underlying methods do.
This way, they can add new expertise in incremental steps and dive deeper into data science without ever having to leave the visual workflow environment. The KNIME user base is filled with upskilled marketers, supply chain analysts, chemists, production engineers, HR analysts, and even machine learning experts, all regularly building complex analytical workflows without ever having to learn to code.
If you’re convinced that visual workflows make sense for data science, your next step is evaluating which low-code provider makes the most sense.
KNIME is distinct from other providers in three ways:
The visual paradigm is used for everything in KNIME: from prep and blend, to analysis and visualization, to creating packages for deployment, to calling external applications, to building interactive data apps, to capturing and storing metadata.
At KNIME, we believe visual workflows just make sense for data science. We’ve built a real (visual) programming language from scratch to help data workers from any background or expertise build data science solutions, end to end.