
Just KNIME It!



Challenge 4: Rick and Morty Catalog

Level: Medium

Description: You work for an entertainment company that wants to analyze various TV shows to gain insights into audience preferences. Your next project focuses on transforming data from the show “Rick and Morty” into a structured dataset that can be easily analyzed and queried. As a first step, you need to extract a complete list of characters featured in the series. However, the API that provides this information is paginated, returning a separate JSON response for each page. How can you use KNIME to retrieve and merge all these pages into one unified catalog of the show’s characters? To solve this challenge, use the https://rickandmortyapi.com/api URL.

Beginner-friendly objective: 1. Successfully send a GET Request to receive the character data from the API.

Intermediate-friendly objectives: 1. Implement a recursive loop to handle paginated API responses and ensure all data is collected. 2. Unify all the pages into a table with two string columns, id and character name.
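In KNIME this is typically a GET Request node inside a recursive loop that stops when the API's `info.next` field is null. The same logic can be sketched in Python; here `fetch_page` is a hypothetical injected callable (in a real run it could be `lambda url: requests.get(url).json()`), so the loop itself stays testable offline:

```python
def collect_characters(fetch_page, start_url="https://rickandmortyapi.com/api/character"):
    """Follow the API's 'info.next' links until exhausted, collecting
    (id, name) pairs from every page into one unified list of rows."""
    rows = []
    url = start_url
    while url:  # the "recursive loop": stop once 'next' is null
        page = fetch_page(url)
        for ch in page["results"]:
            # two string columns, as the objective asks: id and character name
            rows.append({"id": str(ch["id"]), "name": ch["name"]})
        url = page["info"].get("next")
    return rows
```

Injecting the fetch function keeps the pagination logic independent of the HTTP library, which mirrors how the KNIME loop body is independent of any single page.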

Author: Babak Khalilvandian

Remember to upload your solution with the tag JKISeason4-4 to your public space on the KNIME Community Hub. To increase the visibility of your solution, also post it to this challenge thread on the KNIME Forum.

We will post our solution to this challenge here next Tuesday.
Go to the KNIME Forum

Previous challenges

Level: Medium

Description: You are a data scientist working for a grocery store that focuses on wellness and health. One of your first tasks in your new job is to go over the store's inventory and find patterns in the items it sells, based on nutritional composition. This will help the store assess if it needs to tweak its offerings, and where, to match its ethos of wellness and health.

Beginner-friendly objectives: 1. Load and normalize the grocery data. 2. Cluster the data based on its numeric values using an unsupervised learning algorithm such as k-Means. 3. Denormalize the data after clustering it.

Intermediate-friendly objectives: 1. Visualize the clustering results using scatter plots and analyze the distribution of clusters. Use flow variables to dynamically control the scatterplot and enhance interactivity. 2. Perform dimensionality reduction using PCA to simplify the dataset while retaining essential information. 3. Visualize the results with scatterplots as well.

What patterns can you find? What recommendations and insights can you come up with based on these patterns?

Author: Aline Bessa

Dataset: Groceries Dataset in KNIME Community Hub

Solution Summary: The solution involves clustering the normalized data to find groupings based on nutritional attributes. We then create two components for visualizing the results: one uses PCA, a dimensionality reduction technique, to project the data onto two dimensions of high variance, and the other implements an interactive scatter plot for users to explore the clustered data using different nutritional attributes as axes.

Solution Details: The workflow starts with the CSV Reader node configured to read grocery data from a file named "food.csv". The Normalizer (PMML) node is used to apply Min-Max normalization to all numeric columns, scaling them between 0.0 and 1.0. Next, the k-Means node clusters the data into three groups using nutritional attributes, with centroids initialized from the first rows. The data is then denormalized to facilitate visualization and interpretation. In one component, the PCA node reduces the data to the two dimensions of highest variance, retaining the original columns in the output. The Column Filter node retains only the PCA dimensions and cluster information for visualization, and an interactive scatter plot is created using the Scatter Plot (JavaScript) node, configured to display PCA results and clustering outcomes. In a second component, Single Selection Widget nodes allow users to pick two different nutrients to serve as axes in a scatter plot of the data points, which are plotted in their assigned cluster color. The final steps of both components involve sorting and sampling the data to provide insights into the grocery items, with results displayed in Table View nodes for easy exploration.
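The numeric core of this workflow (Min-Max normalization, k-Means with first-rows initialization, denormalization, and a 2-D PCA projection) can be sketched outside KNIME with NumPy. This is a minimal stand-in for the nodes described above, not the workflow itself; all function names are ours:

```python
import numpy as np

def minmax_normalize(X):
    """Min-max scale each column to [0, 1] (like the Normalizer (PMML)
    node); also return the mins and ranges needed to denormalize later."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    rng = np.where(maxs > mins, maxs - mins, 1.0)
    return (X - mins) / rng, mins, rng

def kmeans(X, k=3, iters=100):
    """Plain Lloyd's algorithm with centroids seeded from the first k rows,
    mirroring the k-Means node's 'first rows' initialization."""
    centroids = X[:k].copy()
    for _ in range(iters):
        # assign every row to its nearest centroid (squared distance)
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster goes empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def pca_2d(X):
    """Project centered data onto its two top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T
```

Denormalizing after clustering is just the inverse transform, `Xn * rng + mins`, which restores the original units for interpretation.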

See our Solution in KNIME Community Hub

Level: Easy
Description: You are a linguist studying linguistic diversity around the world. You have found a dataset that includes information about countries, such as the number of languages spoken, area, and population. The dataset also contains a column called MGS, which refers to the mean growing season in each country (i.e., for how many months per year crops can be grown on average). What are the top 5 countries by the number of languages spoken? What are the top 5 countries by the ratio of languages spoken to population? What are the top 5 countries by the ratio of languages spoken to land area? Finally, do you notice any patterns between the numbers of languages spoken and the MGS values?

Objective 1 (Easy): Learn how to import a CSV file into KNIME.
Objective 2 (Easy): Perform ratio calculations between columns (e.g., number of languages spoken and population size ratio).
Objective 3 (Easy): Sort the resulting table using specific criteria to select top 5 countries.
Objective 4 (Easy): Filter the top rows based on your selected criteria.

Author: Michele Bassanelli

Dataset: Linguistic dataset in the KNIME Community Hub

Solution Summary: We solve this challenge by computing the ratios between the number of languages spoken in a country and its population and area, and then ranking the countries.

Solution Details: After reading the linguistic dataset with the CSV Reader node, we answer the first question using the Top K Row Filter node, sorting by the "Lang" column. For the second question, an Expression node is used to calculate the ratio of languages to population, followed by another Top K Row Filter node to sort by the newly calculated ratio.
The third question is addressed with a similar approach, but the ratio is calculated between the number of languages spoken and the country’s area.
This challenge was adapted from Statistics for Linguists and uses a modified version of the dataset from Nettle 1999. In this case, the columns that were initially log-transformed are restored to their original values.
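The ratio-and-rank steps map directly onto plain Python: derive a ratio per row (the Expression node's job), then sort and keep the first k rows (the Top K Row Filter node's job). The mini-table below is hypothetical, standing in for the linguistic dataset:

```python
def top_k_by(rows, key, k=5):
    """Equivalent of the Top K Row Filter node: sort descending on a
    derived value and keep the first k rows."""
    return sorted(rows, key=key, reverse=True)[:k]

# Hypothetical mini-table standing in for the linguistic dataset.
countries = [
    {"Country": "A", "Lang": 800, "Population": 200_000, "Area": 400},
    {"Country": "B", "Lang": 300, "Population": 10_000, "Area": 50},
    {"Country": "C", "Lang": 120, "Population": 5_000_000, "Area": 9_000},
]

most_languages = top_k_by(countries, key=lambda r: r["Lang"], k=2)
# Expression-node step: derive the ratio, then rank on it.
per_capita = top_k_by(countries, key=lambda r: r["Lang"] / r["Population"], k=2)
per_area = top_k_by(countries, key=lambda r: r["Lang"] / r["Area"], k=2)
```

Each question is the same pattern with a different key function, which is why the workflow reuses the same two nodes three times.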

See our Solution in KNIME Community Hub

Level: Medium

Description: You have an EV and want to live in a place that has many available charging stations, and where it is also cheap to charge your vehicle. Given a dataset on chargers around the world, you need to find the top ten cities with the most EV chargers. You also want to consider which of those ten cities offer, on average, the cheapest cost per kWh. You should then narrow your choice down to five cities after taking costs into account.

Objective 1 (Easy): Clean the data by removing addresses without real city names and extract the city from each address.
Objective 2 (Easy): Count the total number of EV charging stations by city and find the top ten cities.
Objective 3 (Easy): Of the top ten cities, find out which cities have the cheapest average cost to charge per kWh and show the five cheapest cities.
Objective 4 (Medium): Create a bar chart that allows you to compare the top ten cities in terms of average cost to charge per kWh. Create a widget that lets you select the cities you want to see in this plot and control the plotting with flow variables.

Author: Thor Landstrom

Dataset: EV data in the KNIME Community Hub

Solution Summary: We solve this problem by grouping the data by city, so that we can count every city's unique EV stations and also calculate their average cost. We sort and filter the data and create visualizations that allow users to compare cities' average EV prices interactively.

Solution Details: After reading the dataset with the CSV Reader node, we preprocess the data. First, we remove rows without proper addresses (Row Filter node) and then extract the city from the address in each remaining row for grouping (Expression node). The next step is to group the data by city with the GroupBy node, and sort the resulting data in descending order by count (Sorter node). We then extract the top 10 cities with the most EV charging stations with the Row Filter node. We use the Sorter and the Row Filter nodes again to extract the top 5 cheapest cities out of these 10, and visualize their average costs with the Table View node. In parallel, we create a component that allows users to compare the top 10 cities with the most EV charging stations in terms of average charging cost. This component has a widget that lets users select the cities they want to see in this plot, which turns into a flow variable that controls the plotting.
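The group-count-sort-filter chain can be sketched with pandas; the `chargers` table below is a hypothetical stand-in for the cleaned dataset, and `head(2)`/`head(1)` replace the challenge's top-10/top-5 cut-offs to keep the example tiny:

```python
import pandas as pd

# Hypothetical stand-in for the cleaned EV charger table (one row per station).
chargers = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Oslo", "Berlin", "Berlin", "Austin"],
    "cost_per_kwh": [0.20, 0.22, 0.18, 0.30, 0.34, 0.25],
})

# GroupBy node: count stations and average the cost per city.
by_city = chargers.groupby("city").agg(
    stations=("cost_per_kwh", "size"),
    avg_cost=("cost_per_kwh", "mean"),
)

# Sorter + Row Filter nodes: most stations first, keep the top N,
# then re-sort those by average cost to find the cheapest among them.
top_cities = by_city.sort_values("stations", ascending=False).head(2)
cheapest = top_cities.sort_values("avg_cost").head(1)
```

Sorting twice on different columns, as here, is exactly why the workflow applies the Sorter and Row Filter nodes in two separate passes.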

See our Solution in KNIME Community Hub

Enjoying our challenges?

They are a great way of preparing for our certifications.

Explore Certification Program

Just KNIME It! Leaderboard

KNIME community members are working hard to solve the latest "Just KNIME It!" challenge - and some of you have solved dozens of them already! Who are the KNIME KNinjas who have completed the most challenges? Click over to the leaderboard on the KNIME Forum to find out! How many challenges have you solved?

Sign up for reminder emails


*KNIME uses the information you provide to share relevant content and product updates and to better understand our community. You may unsubscribe from these emails at any time.

Previous Just KNIME It! Challenges