A few weeks ago, I found myself in a conversation that led to an interesting question. It went something like this:
“Say, Hans, you’re actually quite active in the field of data science. But when you were studying, data science hadn’t even been invented yet. Still, over the past few years, you’ve gained knowledge and skills in data analytics and data science. So, you must know a thing or two about it. That’s why I have a question for you. What would you recommend if someone wants to learn data science themselves and they have to start from scratch?”
Whew, I didn’t have an immediate answer to this question. But it did make me think. What would my advice be? It wasn’t a complex question, but one that triggered many different thoughts and ideas in my mind. How do you go about “learning data science from scratch?” As I pondered, I came up with a list of six tips. And I’m happy to share those tips with you here.
Tip 1: Don’t start with programming; opt for a low-code solution
It might look super cool, typing all sorts of complex Python statements in dark mode, but that’s not what makes me happy. My preference lies with a low-code solution like KNIME. With the help of KNIME, I’ve been able to accelerate my data science career tremendously. In KNIME, the process is central, not the code. And that comes with several advantages. A KNIME canvas is organized, allowing you to zoom in and out. You can encapsulate nodes into so-called metanodes, and annotations let you document each step of the (data science) process, turning your workflow into a story that you can share and communicate.
Especially if you’re new to data science and don’t have a background or training in programming, it’s important to focus on the problem you want to solve or the insight you want to gain. How do you translate the business problem into a data science problem? That, to me, is the challenge in data science. As a data scientist, you need to be able to focus on the choices you need to make to arrive at a good solution. Which algorithm fits best, which records do I include, which variables should be considered, what metrics do I use to assess the quality of my solution? Those kinds of things. And as a beginner, you don’t want to get stuck every time because you misplaced a comma in your code or forgot a parenthesis, etc.
An additional advantage of KNIME is that during node configuration, all options are presented. Many choices are configurable but also have a default value. This allows you to configure each node thoughtfully or simply see what happens with the default values.
The added value of a data scientist doesn’t lie in writing good code (that’s what an LLM will do for you later) but in conceptualizing, implementing, and making choices to arrive at a process that turns input data into valuable output data. To make a good plan and the right choices, however, you do need some knowledge. So how do you acquire that knowledge?
Tip 2: Get started, just do it
You can read books, watch YouTube videos, browse blogs, take an online course, but if you only consume that content, your skills won’t improve, and your knowledge will only increase to a limited extent. To truly make progress in data science, it’s best to just start and build up knowledge around the data science activities you are doing. Get to work.
Suppose you want to learn how to create a predictive model, and you realize that you need to split your dataset into a training, testing (and validation) set. Dive into this topic and try to figure out the best way to split your dataset for your specific use case. Once you’ve set up partitioning to your satisfaction, move on to the next step in the process. You don’t need to know all the options, but it’s important to understand what you’re doing (and why). Build your workflow or code in small and manageable steps. Try to create a minimal viable product with as few nodes or lines as possible.
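If you do decide to try this step in code rather than in a workflow, the partitioning idea can be sketched in a few lines of plain Python. This is only a minimal illustration, not a recommendation for your use case: the function name `split_dataset`, the 70/15/15 proportions, and the fixed seed are all assumptions made for the example, and a simple random shuffle is not appropriate for every dataset (time-ordered data, for instance, usually needs a different approach).

```python
import random


def split_dataset(rows, train=0.7, validation=0.15, seed=42):
    """Shuffle rows and split them into train, validation, and test lists.

    The proportions and the seed are illustrative defaults, not advice.
    Whatever is left after train and validation becomes the test set.
    """
    if not (0 < train < 1 and 0 <= validation < 1 and train + validation < 1):
        raise ValueError("train and validation must leave room for a test set")

    shuffled = list(rows)                  # copy, so the input stays untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable

    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)

    train_rows = shuffled[:n_train]
    validation_rows = shuffled[n_train:n_train + n_validation]
    test_rows = shuffled[n_train + n_validation:]
    return train_rows, validation_rows, test_rows


# Stand-in data: 100 row identifiers instead of a real dataset.
rows = list(range(100))
train_rows, validation_rows, test_rows = split_dataset(rows)
print(len(train_rows), len(validation_rows), len(test_rows))
```

Even in a sketch this small, you already face the kind of choices this tip is about: which proportions to use, whether to shuffle at all, and whether you need a separate validation set.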
To be clear: I choose KNIME as the environment for my data science and analytics projects. But that choice is a personal preference, and it is not the key to success in learning data science. Regardless of the approach you choose, the most important factor is consistent practice and hands-on experience in solving real-world data science problems. And yes, in my opinion, KNIME facilitates that best.
Tip 3: Define a real-world use case with a familiar dataset
Overall, hands-on practice with real-world projects is a fundamental step in the learning journey of data science from scratch. It provides you with practical experience, fosters critical thinking and problem-solving skills, and builds a strong foundation for further exploration and growth in the field of analytics and data science.
If you want to gradually master data science skills, the choice of topic, use case, and datasets is important. It’s better to choose a use case and dataset you’re familiar with than standard datasets (like the Iris dataset) often seen in tutorials. If you don’t have a dataset at hand, check out the Kaggle Open Datasets.
Working on a topic you’re familiar with and a real dataset associated with it helps evaluate the outcomes of your steps accurately. For instance, if your predictive model for football match outcomes predicts a draw in 80% of the matches, you, as a football expert, know this is incorrect (on average, 25% of matches end in a draw). That means going back to the drawing board. Or if you encounter outliers, such as a team scoring more than 15 goals in a match, you can use your domain knowledge of football to decide if this might be incorrectly entered data or a valid value. Therefore, it’s recommended to work with a real dataset because these datasets confront you with deviations and noise that require attention. On the other hand, the advantage of working with “pre-existing datasets” like the wine dataset, the Iris dataset, or the Boston housing dataset is that they yield consistent results and sometimes seem too good to be true. You can use them effectively to get your workflow “working.” However, you’re not challenged to think about the outcomes.
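The two football checks described above can be written down as a small sanity test. The sketch below uses made-up predictions and match records purely for illustration; the helper names, the tolerance of 0.15, and the field names `home_goals` and `away_goals` are assumptions, while the 25% draw rate and the 15-goal limit are the figures mentioned in the text.

```python
def share_of(label, predictions):
    """Return the fraction of predictions equal to the given label."""
    return predictions.count(label) / len(predictions)


def is_plausible_share(observed, expected, tolerance=0.15):
    """Check whether an observed share is close to what domain knowledge expects.

    The tolerance is an arbitrary illustrative value, not a statistical test.
    """
    return abs(observed - expected) <= tolerance


def flag_outliers(matches, max_goals=15):
    """Return matches in which either team scored more than max_goals."""
    return [
        match for match in matches
        if match["home_goals"] > max_goals or match["away_goals"] > max_goals
    ]


# Made-up model output: 8 of 10 matches predicted as a draw.
predictions = ["draw"] * 8 + ["home", "away"]
draw_share = share_of("draw", predictions)
print(is_plausible_share(draw_share, expected=0.25))

# Made-up match records; the second one looks like a data entry error.
matches = [
    {"home": "Team A", "away": "Team B", "home_goals": 2, "away_goals": 1},
    {"home": "Team C", "away": "Team D", "home_goals": 21, "away_goals": 0},
]
suspicious = flag_outliers(matches)
print(suspicious)
```

The code only flags what looks odd. Deciding whether a flagged value is an error or a genuine result still takes your knowledge of the domain.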
Problem-solving skills are also part of data science. Therefore, you have to approach problems analytically, question assumptions, and think creatively to find innovative solutions. Doing so stimulates your critical thinking and decision-making abilities.
Tip 4: Take small, manageable steps
A data science use case like creating a predictive model can be accomplished with a limited number of nodes (see figure).
You probably won’t have the best model right away, but you’ll have a workflow that you can improve by adding functionalities (KNIME nodes) simply and step by step. Pause with each node addition to consider how to best configure it. Do I accept the default settings, or do I investigate the effect of deviating from the standard settings? Expanding the workflow provides opportunities to seek information by reading a blog on the topic, following a YouTube tutorial, or maybe taking a short training session, all specifically focused on the subject you’re currently working on in your workflow and want to learn more about. Reflection allows you to assess your growth, identify areas for improvement, and track your journey in mastering data science.
Tip 5: When stuck, don’t panic
One of the beautiful aspects of working on a data science use case is that it’s not a straight line to the finish. I often feel there’s always room for improvement or doing things differently. This means lots of testing and experimenting to arrive at a good, acceptable solution. However, reaching that good solution often involves overcoming various obstacles. It’s good to know that help is always nearby. If you search smartly on the internet, someone has likely found a solution to the problem you’re facing. And if you get stuck in KNIME, there’s the KNIME Forum, KNIME videos, and the KNIME Learning Centre.
But perhaps most importantly, don’t give up; keep trying. It won’t always be easy. It’s an illusion to think you can become a full-fledged data scientist in a week. Learning new things happens in steps, and it goes faster when you combine practice with theory. But do it in moderation. It’s better to spend one hour a day for 8 days learning something new than to cram it all into one 8-hour day.
Tip 6: Stay motivated and curious, keep on learning
Becoming more proficient in data science doesn’t happen overnight. It takes time. Additionally, data science is more than just programming. It requires knowledge of methods and techniques, as well as the domain in which the data science use case operates.
Seek collaboration and networking within the data science community. Participate in forums, attend meetups, and connect with peers and professionals in the field. Collaborating with others can provide you with valuable insights, feedback, and opportunities for growth.
Your learning journey never ends. Try to stay updated with the latest trends, tools, and technologies in data science. Explore new areas, take advanced courses, and participate in workshops or conferences to expand your knowledge and expertise.
I will never tell you that it is easy to learn data science from scratch. Sometimes it’s easy, sometimes you get stuck. And occasionally your project will fail completely. Therefore, embrace failure as part of the process and let it fuel your motivation to keep learning and growing.
Conclusion
Starting your journey to learn data science from scratch? Here are six tips to guide you along the way.
1. Start with a low-code solution like KNIME to ease into the field without getting bogged down in programming syntax.
2. Dive into hands-on projects, applying your knowledge to real-world datasets and problems.
3. Choose familiar use cases and datasets to better understand the outcomes of your analysis and hone your problem-solving skills.
4. Take small, manageable steps, building on your skills iteratively as you progress.
5. Don’t panic when you encounter obstacles; seek help, stay persistent, and keep trying.
6. Embrace failure and stay motivated, curious, and committed to lifelong learning, continuously expanding your knowledge and keeping up with the latest trends and technologies in data science.
Ready to dive into data science? Follow these six tips, take action, and unleash your potential in the field. Your data science adventure awaits!
This blog post was inspired by my contribution to the KNIME webinar “How to teach yourself data science from scratch”.