9 data science superstars share tips & tricks

January 17, 2024

The data science superstars from the KNIME community are a special kind of superstar. They share their work, experience, and knowledge to support others.

The newest Just KNIME It! challenge winners are this month’s celebrated Contributors of the Month. The nine winners posted solutions to all 30 challenges from the last season and contributed hugely to the community by sharing efficient, well-documented, and creative workflows on the KNIME Forum.

The nine winners now share their data science tips, from managing projects and favorite data science techniques to joining a community, go-to visualization hacks, and more.

1. Get involved in a data science community to learn faster

Heather Lambert, Cheminformatician: My tip for people who are learning data science is to be as active as possible with the new skill, language or software that they are trying to learn. Incorporate your learning into your daily routine, even if it’s just one hour a day.

Look for challenges that get you involved in a community, like the Just KNIME It! challenges did for me and the other KNinjas. It definitely helps to be part of a community that can offer advice and support when learning something new!

2. Break down complex processes and unfamiliar techniques into manageable bites

Hiroki Yoshida, Senior scientist (medicinal chemistry & chemoinformatics): I believe that it’s important to attempt even complex processes or unfamiliar techniques and break them down into what I can and cannot do.

This approach helps me pinpoint exactly what I’m unable to accomplish or comprehend. Besides, I always make sure to pay attention to input and output formats, ensuring their accuracy and consistency.

3. Explore solution starters to jump-start your projects

Ángel Molina, Consultant: My only advice, which I always emphasize, is community. What do I mean by community? Well, being a part of the KNIME community is the greatest value of KNIME, and it is something you should leverage if you want to excel in data science.

  • Go to the KNIME Forum: Ask anything you need, and some selfless KNIMErs will help you in any way they can.
  • Explore KNIME Community Hub: Find workflows related to what you are working on; some may be perfect for you or serve as examples.

4. Take the time to understand technical terms for smooth project execution

Naoyoshi Yamamoto, Pharmacological researcher: Be aware that the same terminology may have different meanings in different fields. When working on a data science project in collaboration with others in different fields of expertise, even if it seems tedious, taking time to confirm the meanings of technical terms with each other as much as possible can reduce the probability of unexpected problems occurring and maximize the time available for creative work.

5. Learn visualization fast with the Generic ECharts View node & engage with AI for queries

Luo Yuxi, Data Analyst: Do you want to learn visualization quickly? Visit the Examples - Apache ECharts page, choose some preferred diagrams, gather the data, employ the KNIME 5.2 ‘Generic ECharts View’ node, engage with the AI for queries, comprehend the configuration, and you’re done.
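To make "comprehend the configuration" a little more concrete, here is a minimal sketch of the kind of option object that the Apache ECharts examples are built around, written as a plain Python dictionary and printed as JSON. The sample data, the chart title, and the helper name bar_chart_option are hypothetical. How the configuration is entered in the Generic ECharts View node is not shown here, so treat this only as an illustration of the structure you will meet on the examples page.

```python
import json

# Hypothetical sample data; in practice it would come from your own table.
categories = ["Mon", "Tue", "Wed"]
values = [120, 200, 150]


def bar_chart_option(categories, values, title):
    """Build a basic ECharts bar chart 'option' object as a plain dict."""
    return {
        "title": {"text": title},
        "tooltip": {},
        "xAxis": {"type": "category", "data": list(categories)},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": list(values)}],
    }


option = bar_chart_option(categories, values, "Orders per day")
print(json.dumps(option, indent=2))
```

Swapping the series type (for example "line" instead of "bar") is usually the quickest way to see how one option key changes the resulting diagram.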

6. Learn web scraping to access untapped resources & enable more informed decision-making

Bertold Balázs, Managing Consultant: Web scraping is indispensable in data science. It extracts unstructured data from various sources, enriching your data, enabling better model training, and unlocking valuable insights. By gathering public information, data scientists gain access to a wealth of untapped resources and a deeper understanding of the patterns and trends that are crucial for informed decision-making.

The good news is that you don’t need to leave KNIME for precise web scraping. KNIME offers nodes such as GET Request and Webpage Retriever to extract necessary information. Additionally, for more advanced web use cases, KNIME provides Selenium nodes or, if you’re comfortable with Python, the Python Script node to execute Selenium code seamlessly.
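For readers who take the Python route, the parsing step of a scraper can be sketched with the standard library alone. The example below is an illustration, not KNIME's or Selenium's API: it assumes the page HTML has already been retrieved (for instance by a GET Request node or a browser session) and uses a hypothetical sample page and base URL to show how links can be pulled out of the markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect (link text, absolute URL) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # URL of the anchor currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's base URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content standing in for a downloaded page.
sample_html = """
<ul>
  <li><a href="/blog/post-1">First post</a></li>
  <li><a href="https://example.com/about">About</a></li>
  <li><a>No link here</a></li>
</ul>
"""
links = extract_links(sample_html, "https://example.com")
print(links)
```

Anchors without an href are skipped, and relative paths are turned into absolute URLs so the results can be fed straight into a follow-up request.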

7. Try integrated deployment to save time & nerves

Artem Ryasik, Advanced Analytics Engineer: One thing I recommend to everyone is to learn about the integrated deployment extension. No matter what you do in a KNIME workflow, this extension helps you make it reproducible and split it into logical parts, which can save you a lot of time and nerves in the future when debugging, testing, and deploying the workflow. Another trick you can use with this extension is to chain the workflows one after another and call them within a master workflow. And if you are using KNIME Server or Business Hub, deploying workflows with integrated deployment nodes makes creating a REST API as simple as possible.

8. Ensure your analytics tool enables easy integration of multi-source data

Anil Kumar Sharma, Purchase CPPD: In the intricate world of data analytics, the seamless integration of multi-source data is pivotal. A data analyst spends 70-80% of their time on data wrangling and preparation, and the task becomes more complex when the data is unstructured, bulky, and originates from various sources and types.

KNIME, with its versatile nodes like Joiner, Concatenate, and Value Lookup, offers a robust solution to this challenge. These nodes act as the backbone, enabling analysts to merge diverse datasets with precision.

The real magic unfolds with the implementation of the IF Switch node. This node, akin to a skilled conductor, orchestrates the data flow, deciding which path the data takes based on predefined conditions. This approach not only ensures efficient handling of various data types but also enhances the overall workflow, making it adaptable and error-resistant.

By leveraging these nodes in harmony, KNIME transforms complex data joining tasks into a streamlined, efficient process, unlocking insights that were once obscured by the confines of disparate data sources.
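The ideas behind these nodes can be sketched in a few lines of plain Python. This is only a conceptual illustration, not how the KNIME nodes are implemented: the records, the function names, and the threshold are hypothetical, and the join assumes that each key appears at most once in the right-hand table.

```python
# Hypothetical records coming from three different sources.
orders_eu = [{"customer_id": 1, "amount": 40.0}, {"customer_id": 2, "amount": 15.5}]
orders_us = [{"customer_id": 3, "amount": 99.0}]
customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 3, "name": "Grace"}]


def concatenate(*tables):
    """Stack the rows of several tables (the idea behind Concatenate)."""
    return [row for table in tables for row in table]


def inner_join(left, right, key):
    """Keep left rows whose key also exists in right and merge the columns
    (the idea behind Joiner). Assumes keys are unique in the right table."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def route(rows, threshold):
    """Choose one branch from a condition, loosely like an IF Switch."""
    total = sum(row["amount"] for row in rows)
    return "large_batch" if total >= threshold else "small_batch"


all_orders = concatenate(orders_eu, orders_us)
joined = inner_join(all_orders, customers, "customer_id")
branch = route(joined, threshold=100)
print(joined, branch)
```

The order that has no matching customer drops out of the inner join, which is exactly the kind of behavior worth checking when data from different sources is merged.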

9. Clear visualizations and well-documented data analysis results are key

Ryushi Seo, Product Engineering Manager: What I would like to recommend to everyone is the reporting feature of KNIME. Visualizing and documenting the data analysis results from the KNIME workflow is extremely important for sharing analysis results and for future reference. With the new KNIME Reporting Extensions introduced in KNIME 5.1, you can easily output the composite views of the component as a PDF. In addition, if you need to create more advanced reports, you can use BIRT, which allows you to create not only PDFs but also PPTX files.

Learning from fellow data scientists

As a data scientist, you're constantly updating your skill set: learning about the latest algorithms, getting advice on techniques, and picking up hints about best practices. Help and advice from the community are valuable as you build and improve data science projects.

Learn more about the data science superstars from the KNIME community in the KNIME Hall of Fame.