Organizations typically work with multiple suppliers and implement procurement-to-pay (‘P2P’) processes to ensure accurate payments and auditable invoice records. However, as these processes grow more complex, they can become difficult to manage, leading to problems.
One common issue is that the same invoice is paid more than once because it is duplicated . This can cause accounting discrepancies and introduce financial risk to the businesses.
Manually identifying and resolving duplicate invoices is both time-consuming and error-prone, as humans struggle to efficiently find duplicates in large datasets.This inefficiency may result in delayed or missed recovery of overpaid amounts.
In this article we’ll show you a scalable solution using the open source KNIME Analytics Platform to:
- Detect duplicate invoices quickly
- Reduce time-spend and error
- Lower financial risk.
Watch the video below for a summary of the solution, followed by a detailed write-up of how KNIME addresses this issue.
What are duplicate invoices?
A duplicate or double invoice is when the same invoice is paid (or processed) more than once. Invoices can end up being duplicated for several reasons. One common issue occurs when paper invoices are scanned using Optical Character Recognition (OCR) software, leading to errors in invoice numbers. For example, an invoice number with an ‘8’ may be misread and scanned as a ‘5’, resulting in two separate records for the same invoice.
In other cases, suppliers may experience payment delays and reissue the same invoice in hopes of expediting the process. Which would mistakenly be logged as a brand new invoice. If an organization’s quality assurance systems fail to catch these nuances, it may result in double payments, increased expenses, and negative effects on budgets and financial reports.
Additionally, recovering these payments can be challenging, particularly if a supplier goes out of business or if the claim recovery window is missed, leading to potential write-offs.
Given how common duplicate invoices are, a precise and accurate approach must be embedded into business operations to prevent double payments before they occur.
The task: Identify duplicate invoices based on the invoice number
In this example Day5 Analytics demonstrates how KNIME workflows can be used to perform a duplicate invoice audit for the accounts payable department of a large outsourcing client. By using the strong data processing capabilities in KNIME, we can quickly identify and flag duplicate invoices, helping the business avoid overpayments, recover double-payments, and streamline the accounts payable process.
Data description
The dataset provided is a simplified version of real-world financial data, containing only two fields: invoice number (INV_NO) and invoice amount (INV_AMT). In practice, accounts payable (‘AP’) datasets typically include a broader range of details such as invoice numbers, vendor details, invoice dates, amounts, work information, and so on. Although this is simplified, the simulated data mirrors the complexity and variety of financial records encountered in real-world business environments.
The workflow: Detect duplicate invoices
The KNIME workflow for duplicate invoice detection is designed to be both robust and user-friendly. Here is a high-level breakdown of how the workflow operates, and why it outperforms traditional spreadsheet methods:
- Simple Data Preparation: The workflow starts with basic steps that replicate what can be done in Excel. This includes importing data, performing basic cleaning, and identifying duplicates using simple criteria like exact invoice amount matches.
- Advanced Workflow: Next, the workflow is enhanced with a more sophisticated logic that goes beyond Excel capabilities. This includes fuzzy matching to detect near-duplicates invoices and using loops efficiently to handle large datasets.
- Reporting: After the analysis, invoices are ranked based on two factors: how closely they match with another invoice number and the invoice amount. This ranked list is then exported as a PDF report, allowing the company to prioritize actions based on the importance of each case.
The following figure shows a high-level overview of the workflow, which you can download for free from the KNIME Hub.
This workflow builds on traditional spreadsheet software, providing a more reliable and scalable solution for detecting duplicate invoices.
Step by step guide to detecting duplicate invoices
Step 1. Import the invoice data
The workflow starts with importing invoice data. KNIME's flexible data connectors allow you to integrate data from 300+ sources, including Excel files, databases, and ERP systems. In this example, we import two sets of invoice data spanning two months using the Excel Reader node.
Step 2. Preprocess the data
In this step, we consolidate the two data sets with a simple match to identify potential duplicates for further investigation. KNIME's data manipulation nodes can be used to clean and prepare the data for analysis. This includes tasks such as transforming data types, normalizing vendor names, and ensuring consistency in invoice formats.
To do this we use various nodes such as Column Filter, Column Appender, Column Renamer, Row Filter, Column Resorter, Sorter, and Joiner to prepare the data.
Step 3. Identify duplicate invoices
Here we use KNIME’s advanced analytics nodes to implement logic to identify duplicate invoices. In our case, we primarily focus on the similarity of invoice numbers. To increase our detection capabilities, we employ fuzzy matching techniques that allow us to identify near-duplicates. This is particularly useful to address minor discrepancies, which may have been caused by incorrect OCR scanning, or a duplicate invoice submitted by a supplier.
Step 4. Visualize the data
To present clear insights, we use KNIME's integrated reporting capability to visually display duplicate invoice numbers, their ranking, and potential financial impact. By using the Color Manager alongside the Table View node, we apply color coding to the data table making it easier to quickly spot key information and trends.
Alternatively, KNIME allows export of data to interactive browser-based data apps or to external BI tools like Tableau and Power BI to create dashboards.
Step 5. Document the workflow
To simplify the workflow we use metanodes and components, which group sections of the workflow into reusable modules. This makes the workflow easier to understand, maintain, and reuse for future tasks, enhancing overall efficiency.
Step 6. Create a PDF report
Once duplicate invoices are detected, we compile the results into a PDF report using KNIME's Report PDF Writer node. The duplicates are ranked from most-likely to least-likely. This helps teams prevent duplicate payments and recover double-paid amounts quickly to easily prioritize corrections.
By leveraging KNIME's low-code environment, we have demonstrated a streamlined process for duplicate invoice detection. With minimal manual effort and improved accuracy of duplicate invoice detection, this process allows businesses to swiftly identify financial discrepancies and optimize their accounts payable processes.
Next steps: Integrating with operations
This workflow sets the foundation for further automation and prevention of double payments.
- Enhancements: Businesses can further refine the workflow by identifying additional matching criteria, and enhance the workflow to capture more duplicates and improve detection accuracy.
- Automation: The workflow can then be scheduled to run at specific intervals, ensuring that new invoices entering the payment system are continuously monitored for duplicates without manual intervention.
- Integration with Other Systems: The workflow can be seamlessly integrated with various data sources and systems, such as ERP platforms and financial management software. This ensures real-time analysis of the most up-to-date invoice data without manual uploads. By leveraging databases and REST API connectivity, the workflow can become part of a larger enterprise system for real-time monitoring and financial management.
The results: Comprehensive insights and actionable data
The duplicate invoice audit conducted using KNIME provided significant insights into the organization’s accounts payable processes. By scoring and ranking invoice duplicates, the workflow streamlined the auditing process, revealing potential overpayments and financial leakage.
About Day5 Analytics
Day5 Analytics has extensive experience in helping clients address Finance challenges . In one case, we assisted the analysis of over 300,000 invoices and identified duplicates in 0.3% of the cases. This resulted in the prevention of recovery of over $100,000 in double payments. Our approach uses KNIME workflows to collect, transform, analyze and act on duplicate invoice detection efficiently. For more details on our client's success using this method, please visit the link here.