Just KNIME It!

Image Resizer and Format Converter

Challenge 7

Level: Medium

Description: You work as a freelance photo reporter for wildlife magazines. In your daily work you take a lot of pictures, usually in .JPG format and in different sizes. To be able to sell your photographs to magazines, you need to accommodate their different sizing and formatting requests. To streamline this process, you decide to build a workflow that automates the following, sequentially: (1) Image resizing -- create a configurable component with three options: do nothing, reduce to fixed size (150x150), or reduce size keeping ratio; (2) image format conversion -- create a configurable component with two options: .PNG or .SVG; (3) save edited images on your machine.
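
The three resizing options can be sketched as plain arithmetic on image dimensions. This is only an illustrative sketch in Python, not part of the KNIME workflow: the function names and mode labels are hypothetical, and it assumes that "keeping ratio" means fitting the image inside a 150-pixel bounding box without ever enlarging it (the challenge only fixes 150x150 for the fixed-size option).

```python
def fit_within(width, height, max_side=150):
    """Scale (width, height) down to fit a max_side x max_side box, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    # Use the tighter of the two scale factors; cap at 1.0 so images are never enlarged.
    scale = min(max_side / width, max_side / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_dimensions(width, height, mode):
    """Return the target size for one of three hypothetical mode labels."""
    if mode == "none":
        return width, height
    if mode == "fixed":
        return 150, 150  # fixed size from the challenge description
    if mode == "keep_ratio":
        return fit_within(width, height)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("none", "fixed", "keep_ratio"):
        print(mode, resize_dimensions(600, 400, mode))
```

Note that the fixed-size option distorts any image that is not square, which is the reason for offering a ratio-preserving alternative.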

Author: Roberto Cadili

Datasets: Image Data in the KNIME Community Hub

Remember to upload your solution with tag JKISeason3-7 to your public space on KNIME Community Hub. To increase the visibility of your solution, also post it to this challenge thread on KNIME Forum.


We will post our solution to this challenge here next Tuesday.
Go to the KNIME Forum

Previous Challenges

Level: Medium

Description: 
As the 2024 UEFA European Football Championship unfolds, let's dive into football history with a data challenge. Today you are asked to create a data app that lets users check, for any timeframe, which three teams had the most football victories. Who are the top three teams of all time? And who were the top three teams in the 1980s?

Author:
Michele Bassa

Datasets: 
Football Data in the KNIME Community Hub

Solution Summary:
After reading the football data and determining wins, losses, and ties, we create a data app that allows users to pick a temporal interval and then check which three teams had the most victories.

Solution Details:
We start our solution by reading the football data with the CSV Reader node, transforming dates into Date format in the node's Transformation tab. Next, we use the Rule Engine node to determine wins, losses, and ties for home teams. This data is then sent to a component (data app) that allows for the temporal filtering of the data. Two instances of the Date&Time Widget node let users select the start and end dates of a temporal period, for which a team ranking will be calculated. The selected dates are passed to two instances of the Date&Time-based Row Filter node, reducing the data to a specific period. After that, two parallel branches use the Row Filter, Column Filter, and GroupBy nodes to select those matches in which the home team (top branch) or away team (bottom branch) wins. Both victory numbers are combined with the Joiner node, and then the Top k Row Filter node selects the top three best teams for the selected period. This information is then plotted with the Bar Chart node.
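
The same logic can be sketched outside KNIME in a few lines of Python. This is a minimal sketch, not the workflow itself: the tuple layout, the function name `top_teams`, and the sample matches are all hypothetical. It also counts home and away wins in a single pass, whereas the workflow counts them in two parallel branches and joins the results.

```python
from collections import Counter
from datetime import date


def top_teams(matches, start, end, k=3):
    """Return the k teams with the most wins between start and end (inclusive).

    Each match is a tuple: (played_on, home_team, away_team, home_goals, away_goals).
    """
    wins = Counter()
    for played_on, home, away, home_goals, away_goals in matches:
        if not (start <= played_on <= end):
            continue  # outside the selected period
        if home_goals > away_goals:
            wins[home] += 1
        elif away_goals > home_goals:
            wins[away] += 1
        # ties add no victory to either team
    return wins.most_common(k)


if __name__ == "__main__":
    # Made-up matches with placeholder team names.
    matches = [
        (date(1980, 6, 11), "A", "B", 2, 1),
        (date(1984, 6, 12), "C", "A", 0, 1),
        (date(1988, 6, 10), "B", "C", 3, 0),
        (date(1988, 6, 14), "A", "B", 1, 1),
        (date(1992, 6, 10), "C", "B", 2, 0),
    ]
    print(top_teams(matches, date(1980, 1, 1), date(1989, 12, 31)))
```

Teams with the same number of wins are returned in the order they were first counted, so a real ranking may need an explicit tie-breaking rule.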

See our Solution in KNIME Community Hub

Level: Medium

Description: 
As a member of a think tank, your task is to craft a report on LGBTQIA+ representation in political discourse. Given an EU dataset gathering responses from LGBTQIA+ individuals across all member states, you decide to start your work by investigating the answers to the following question: "In your opinion, how widespread is offensive language about lesbian, gay, bisexual, and/or transgender people by politicians in the country where you live?"

Use a map to present the results effectively.

Author:
Michele Bassa

Datasets: 
LGBTQIA+ Survey Data in the KNIME Community Hub

Solution Summary:
To tackle this challenge, we reduce the scope of the data to the question "In your opinion, how widespread is offensive language about lesbian, gay, bisexual and/or transgender people by politicians in the country where you live?". We then filter the answers and only keep the most common ones: "rare" and "widespread". This makes trends and patterns across countries easier to see. We compute the percentage of "widespread" answers for every country and also compute each country's map coordinates. Finally, we join the geospatial information with the computed percentages and plot them on a map.

Solution Details:
After reading the survey dataset with the CSV Reader node, we prepare the data by reducing it to the question "In your opinion, how widespread is offensive language about lesbian, gay, bisexual and/or transgender people by politicians in the country where you live?", and to its two most common answers, "rare" and "widespread". We also group the data by country, keeping the totals for both answers. We loop over this data (Group Loop Start and Loop End nodes) to compute the percentage of "widespread" answers for every country, using the Math Formula node (we compute the denominator for these percentages with the Moving Aggregator node). Next, we run another loop (Table Row to Variable Loop Start and Loop End nodes) to find the map coordinates of each country with the OSM Boundary Map node. We join the previously calculated percentages to the data with the map coordinates (Joiner node), use the Projection node to improve formatting for visualization, filter irrelevant data with the Row Filter node, and finally plot the computed information with the Geospatial View node.
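
The percentage step can be sketched in Python as follows. This is an illustrative sketch only: the function name and the sample responses are hypothetical, and it assumes the denominator is the number of "rare" plus "widespread" answers per country, which the description above implies but does not state outright.

```python
from collections import defaultdict


def widespread_share(responses):
    """Return {country: percentage of 'widespread' among 'rare' + 'widespread' answers}.

    Each response is a (country, answer) pair; other answers are ignored.
    """
    counts = defaultdict(lambda: {"rare": 0, "widespread": 0})
    for country, answer in responses:
        if answer in ("rare", "widespread"):
            counts[country][answer] += 1
    return {
        country: 100.0 * c["widespread"] / (c["rare"] + c["widespread"])
        for country, c in counts.items()
    }


if __name__ == "__main__":
    # Made-up responses with placeholder country names.
    responses = [
        ("X", "widespread"), ("X", "rare"), ("X", "widespread"), ("X", "widespread"),
        ("Y", "rare"), ("Y", "don't know"),
    ]
    print(widespread_share(responses))
```

A country only appears in the result once it has at least one kept answer, so the division never hits a zero denominator.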

See our Solution in KNIME Community Hub

Level: Medium

Description: You work for the United Nations and want to discuss how the causes of death vary across the European Union (EU). You know how to analyze data and generate insightful visualizations, but the data you have at hand is a bit challenging: the meaning of its different columns and codes is not clear. To complete your work, you will have to integrate this data with some metadata in XML format, making sense of the different death causes and data attributes. What patterns can you find in the different countries?

Author: Emilio Silvestri  

Datasets: Demographic Data from the EU in the KNIME Community Hub

Solution Summary:
Our solution to this challenge can be split into two steps. First, we identify the code for the top cause of death in each country, regardless of sex or age; next, we match these codes with metadata describing what they are and sort the countries based on these descriptions. For 27 (out of 35) countries, "diseases of the circulatory system" is the main cause of death; for 8 (out of 35) countries, the top cause of death is "neoplasms".

Solution Details: With the CSV Reader node, we ingest the dataset on EU death causes in 2021. Next, with a series of Column Filter and Row Filter nodes, we reduce the dataset to what is pertinent to the analysis: codes for causes of death per country, regardless of sex and age. We then use a loop (Group Loop Start and Loop End nodes) to identify the top cause-of-death code per country, employing the Top k Row Filter node. At the end of this branch, we have the codes that correspond to the top death causes all over the EU, but cannot make sense of them yet. To this end, in parallel, we ingest metadata on the death causes with the XML Reader node. Using a series of XPath nodes, we extract column names, descriptions, and other values from the metadata. The descriptions and values come in lists, and to make it easier to match them later with the death cause codes from the original dataset, we use the Ungroup node to break the lists into single tokens. We filter the resulting data to only keep rows that correspond to causes of death (Row Filter node), and then use the Value Lookup node to match these causes with their codes in the original dataset. Finally, we sort the data with the Sorter node and conclude that the top cause of death in most EU countries is diseases of the circulatory system.
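
The two branches (top code per country, then a lookup against the metadata) can be sketched in Python. This is a hedged sketch, not the workflow: the function name, the codes "C1" and "C2", and the sample counts are placeholders rather than values from the actual dataset or its XML metadata.

```python
def top_cause_per_country(death_counts, code_labels):
    """Return {country: description of the cause with the most deaths}.

    death_counts: iterable of (country, cause_code, deaths) rows.
    code_labels:  {cause_code: description}, as extracted from the metadata.
    """
    best = {}  # country -> (cause_code, deaths) with the highest count seen so far
    for country, code, deaths in death_counts:
        if country not in best or deaths > best[country][1]:
            best[country] = (code, deaths)
    # Fall back to the raw code when the metadata has no description for it.
    return {
        country: code_labels.get(code, code)
        for country, (code, _) in best.items()
    }


if __name__ == "__main__":
    # Placeholder codes and made-up counts.
    labels = {"C1": "diseases of the circulatory system", "C2": "neoplasms"}
    rows = [
        ("X", "C1", 500), ("X", "C2", 300),
        ("Y", "C1", 200), ("Y", "C2", 250),
    ]
    print(top_cause_per_country(rows, labels))
```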

See our Solution in KNIME Community Hub

Level: Easy

Description: You are a real estate agent working in a new city, and to perform well your first task involves understanding the houses in the region better. A colleague shares a dataset with you and now it’s time for you to explore it. What has been the average housing price, lot size (in acres), and living space (in sqft) in this city, according to her dataset? How are prices distributed and correlated with housing features? What other insights can you gather from this dataset?

Author: Thor Landstrom 

Dataset: Real Estate Data in the KNIME Community Hub

Solution Summary: To tackle this challenge, we compute some general statistics of the dataset such as average price, lot size, and living space. We also calculate the linear correlation for all pairs of numerical features, uncovering which housing attributes have the largest connection with their price. On average, central Seattle is the priciest area in the region, but there are a few other relevant clusters to the south and to the east.

Solution Details: After ingesting the housing data with the CSV Reader node, we compute Pearson's linear correlation for all pairs of numerical attributes with the Linear Correlation node. The results are plotted with the Heatmap (JavaScript) node, revealing which housing attributes relate the most to their price. In parallel, we use the Column Filter node to remove unnecessary columns, and convert the lot size information into acres with the Math Formula node. We use the Statistics View node to get important housing summaries, including their average lot size and price, and group the data with the GroupBy node by zipcode. In the aggregation, we calculate the average housing price per zipcode and their median latitude and longitude values. The Lat/Lon to Geometry node uses the median values per zipcode to generate geometries, which are then visualized with the Spatial Heatmap node.
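
For readers who want to see the arithmetic behind two of these steps, here is a small Python sketch of Pearson's correlation and the acre conversion. The function names and sample numbers are hypothetical, and the sketch assumes the original lot sizes are in square feet (1 acre = 43,560 sqft), which the description above does not state explicitly.

```python
from math import sqrt

SQFT_PER_ACRE = 43_560  # definition of the international acre in square feet


def sqft_to_acres(sqft):
    """Convert an area in square feet to acres (assumes the source unit is sqft)."""
    return sqft / SQFT_PER_ACRE


def pearson(xs, ys):
    """Pearson's linear correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up living space (sqft) and prices for three houses.
    living_space = [1000, 1500, 2200]
    prices = [300_000, 420_000, 610_000]
    print(round(pearson(living_space, prices), 3))
    print(sqft_to_acres(10_890))
```

Values close to 1 or -1 indicate a strong linear relationship with price, while values near 0 indicate a weak one.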

See our Solution in KNIME Community Hub

Level: Easy

Description: You are a climate scientist studying CO₂ emissions. To make your research insights more accessible to your colleagues, and then write a paper about them, you decide to build a report-enabled component in KNIME that allows users to check how emissions vary for different regions and sources. What are the most alarming insights illustrated in such a report?

Authors: Armin Ghassemi Rudd and Marina Kobzeva  

Dataset: CO₂ Emissions Data in the KNIME Community Hub

Solution Summary: To tackle this challenge, we manually select the country that ranks highest in terms of CO₂ emissions and create a PDF report showing its historical emissions, how they vary per capita throughout the years, and what sources they are mostly tied to. Different countries can be selected based on their ranking, leading to different visualizations and reports.

Solution Details: After reading the dataset with the Table Reader node, we use the Row Filter node to select a country based on its CO₂ emissions' ranking. Next, we finish our preprocessing by using the Number Format Manager node, selecting how many decimals we want to use in the CO₂ and CO₂ per capita numbers of our report. We create a component named "Report" that contains a few visualizations for our data: two line plots (Line Plot node) for the historical emissions of CO₂ and CO₂ per capita, and a bar chart (Bar Chart node) showing a breakdown of these emissions for different sources. To turn these visualizations into a PDF report, we feed this component with a report template (A4 Landscape) that is specified with the Report Template Creator node. After the component executes, its visualizations are saved as a PDF report with the Report PDF Writer node.

See our Solution in KNIME Community Hub

Level: Easy

Description: You work in finance and one of your clients wants to understand the value of different company stocks over time. Given a dataset of stock prices, you decide to use simple moving averages (window length = 20) to tackle this task. Which companies have an upward trend for the most recent data? And which companies have a downward trend?

Author: Thor Landstrom

Dataset: Stock Data in the KNIME Community Hub

Solution Summary: We propose two different solutions to this challenge. The simplest one involves manually filtering the data for a specific company, calculating its moving average, and then visualizing it with a line plot. The second one relies on a simple data app: a company is selected from a dropdown box, its stock prices are filtered, a moving average is computed, and the resulting points are shown in a line plot.

Solution Details: Both solutions have a core part in common. After the rows for a company are selected, we use the Column Filter node to isolate dates and close prices, do some typecasting with the String to Date&Time node, sort the data from oldest to most recent with the Sorter node, and then use the Moving Average node to compute simple moving averages (window length = 20). Next, we visualize the results with the Line Plot node. In the simplest solution, we use the configuration of the Row Filter node to select the data for a company. In the more complex solution, we get all company names with the "Get company names" metanode, and then pass them, along with the original data, to the "Visualize company stock prices" component. Inside this component, a Single Selection Widget node allows the selection of one of the company names, which in turn is used to control an instance of the Row Filter node. After that, this solution is basically equivalent to the simplest one.
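
The moving-average computation itself can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the output starts only once a full window is available (the Moving Average node may handle the first rows differently), and the `trend` rule, which compares the last two averages, is just one simple way to read a trend rather than the method used in the solution.

```python
def simple_moving_average(values, window=20):
    """Return the simple moving averages of values, one per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            averages.append(running / window)
    return averages


def trend(averages):
    """Classify the most recent movement by comparing the last two averages (assumed rule)."""
    if len(averages) < 2:
        return "unknown"
    if averages[-1] > averages[-2]:
        return "up"
    if averages[-1] < averages[-2]:
        return "down"
    return "flat"


if __name__ == "__main__":
    # Made-up close prices, already sorted from oldest to most recent.
    closes = [10.0, 11.0, 12.0, 13.0, 14.0]
    sma = simple_moving_average(closes, window=3)
    print(sma, trend(sma))
```

As in the workflow, the prices must be sorted from oldest to most recent before the averages are computed.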

See our Solution in KNIME Community Hub

Here is how the challenges work:

     We post a challenge on Wednesday.
     You create a solution with KNIME Analytics Platform.
     Upload it to your public KNIME Community Hub Space.
     Check your rank on the Just KNIME It Leaderboard.

Our solution to the challenge comes out on the following Tuesday.

Enjoying our challenges?

They are a great way of preparing for our certifications.

Explore Certification Program

Just KNIME It! Leaderboard

KNIME community members are working hard to solve the latest "Just KNIME It!" challenge - and some of you have solved dozens of them already! Who are the KNIME KNinjas who have completed the most challenges? Click over to the leaderboard on the KNIME Forum to find out! How many challenges have you solved?

Previous Just KNIME It! Challenges

Check out previous seasons of Just KNIME It!