This article is part of a series showing solutions to common finance tasks in financial planning, accounting, tax calculation, and auditing, all implemented with the low-code KNIME Analytics Platform.
Contract fraud involves deceitful practices in the creation, execution, or enforcement of agreements, aimed at gaining unfair advantages or financial benefits. Common types of contract fraud include misrepresentation, where false or misleading information is provided; forgery, involving falsified signatures, investment sums or documents; and Ponzi schemes, where returns are paid from new investors' money rather than profits.
In this article, we explain how GenAI can help finance departments automate the creation of custom multilingual alerts once fraudulent contracts have been identified using visualizations and statistical measures.
How can GenAI help finance departments?
Large Language Models (LLMs), with their ability to understand human languages and generate coherent responses, can help boost productivity and enable new levels of automation and personalization.
In the context of contract frauds for auditing purposes, GenAI can help automate the creation of custom fraud alerts in different languages to effectively address a multilingual audience.
Before AI, a financial institution with a multilingual customer base would rely on human-translated templates. This process works, but it is slow, hard to customize for individual customers, and scales poorly each time a new language is requested.
With the help of GenAI, these limitations are significantly reduced. Once suspicious contracts are identified, a financial institution can parameterize a prompt with customer details (e.g., name, surname, language, etc.) and query an LLM to generate alerts in seconds. Not only is this process fast and scalable across many languages, but it can also be customized further to include contract details and definitions using Retrieval Augmented Generation (RAG).
Finance departments can easily implement GenAI-driven solutions in KNIME Analytics Platform using the nodes of the KNIME AI Extension.
However, before we can leverage GenAI effectively to signal suspicious activities, contract frauds need to be identified. One of the major challenges here lies in the availability of datasets containing fraud examples. When the dataset contains enough fraud examples, a machine learning-based approach, for example, enables us to conduct a multidimensional analysis to identify fraudulent contracts, leading to better accuracy.
In practice, however, datasets containing many fraud examples are rare. This forces the adoption of different strategies, such as identifying fraudulent contracts through outlier detection with visual or statistical techniques (e.g., quantiles or IQR).
Visual and statistical techniques for fraud detection
Data visualization simplifies identifying fraudulent contracts by transforming complex datasets into intuitive formats, making it easier to spot deviations from what constitutes "normal" data.
Visualizations are also useful when fraud examples are scarce and we need to present our findings in a way that is aesthetically pleasing and easy to interpret.
Common visualizations include pie/bar charts for comparing categories, scatter plots/bubble charts for relationships among numeric columns, histograms/violin plots for data distribution, and box plots for central tendency and variability.
To facilitate the task, the creation of a comparative dashboard using KNIME components can provide an interactive and customizable overview of complex data relationships.
However, using visualizations requires the finance department to manually check every plot, hindering automation efforts.
Statistical techniques, such as the interquartile range (IQR), can come to the rescue. The IQR method is a simple, nonparametric outlier detection method for a one-dimensional feature space. To calculate the IQR, the data set is divided into quartiles, i.e., Q1 (lower quartile), Q2 (median), and Q3 (upper quartile), where IQR = Q3 - Q1. In addition, we define k, the interquartile range multiplier. This parameter is usually set to 1.5, and defines how sensitive the outlier detection will be: a larger k flags fewer points.
An outlier is then a data point x that lies more than k * IQR above Q3 or below Q1:
x > Q3 + k * IQR or x < Q1 - k * IQR
This outlier treatment can easily be implemented in KNIME Analytics Platform using the Numeric Outliers node.
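For readers who want to see the rule outside KNIME, here is a minimal Python sketch of the IQR criterion. It is not the Numeric Outliers node's implementation: the quartile estimation method (`method="inclusive"`, i.e., linear interpolation), the function names, and the sample sums are assumptions made for illustration, and a different quartile convention can shift the fences slightly.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: quartiles by linear interpolation over the observed data
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_bounds(values, k)
    return [x for x in values if x < lower or x > upper]


# Invented investment sums, for illustration only
sums = [100, 110, 120, 130, 140, 150, 160, 170, 180, 5000]
print(iqr_bounds(sums))
print(find_outliers(sums))
```

Raising `k` widens the fences, so fewer sums are flagged.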
Note. We also experimented with using GenAI to detect fraudulent contracts. However, the experiment raised concerns about the correctness of results, the interpretability of the detection process, and the cost in human labor for prompting and for data pre- and post-processing.
The Task: Identify fraudulent investment sums and generate alerts
Among contract types, investment agreements are particularly prone to fraud. These contracts often involve complex terms, significant financial stakes, and promises of high returns, making them attractive targets for fraudulent activities. Typically, fraudsters try to forge signatures and personal details, alter contract terms, or fake transaction sums, exploiting investors' lack of expertise and trust.
In today’s task, we’ll act as a financial institution and concentrate on the identification of fraudulent investment agreements, focusing on the investment sum stated in the documents. The challenge lies in accurately flagging fraudulent amounts using data visualization and the IQR method, and leveraging GenAI to create customized, multilingual alerts. Lastly, the solution should be deployed as an on-demand web-based application.
The dataset contains 100 investment agreements in PDF format. Each agreement has a unique ID and reports the terms and object of the investment, including the product name, the investor’s name, email address, date and investment sum.
A second dataset, stored in an SQLite database, contains investors’ information, such as name, email address, and language, as well as details about the type of investment agreement.
The process involves four steps:
- Access, parse and join data sources
- Identify outliers using visualizations and the IQR
- Create custom alerts in different languages with GenAI
- Deploy the solution as a web-based application
The Workflows: Use visualizations and IQR to identify frauds and create alerts with GenAI
All workflows used in this blog post are publicly available and free to download from the KNIME Community Hub. You can find the workflows in the KNIME for Finance space under Fraud Detection in the “Visualizations and IQR” section.
The first workflow covers the detection of frauds and the generation of alerts. You can view and download the workflow Fraud detection with dataviz, IQR and GenAI for alerts from the KNIME Community Hub.
Step 1: Access, parse and join data sources
We start off by importing investment agreements using the PDF Parser node. Next, we parse the documents to extract key information, such as agreement IDs, the investors’ name, email, date, and invested sum.
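To make the parsing step concrete, the following Python sketch extracts the same fields from a block of text with regular expressions. The article does not show the layout of the agreements, so the sample text, the field labels, and the `parse_agreement` helper are all hypothetical; real documents would need patterns adapted to their wording.

```python
import re

# Hypothetical text as a PDF parser might return it (layout is an assumption)
SAMPLE_TEXT = """Investment Agreement ID: AG-0042
Investor: Jane Doe
Email: jane.doe@example.com
Date: 2024-03-15
Investment sum: 12,500.00 EUR
"""

# One capture group per field; the labels are invented for this example
FIELD_PATTERNS = {
    "agreement_id": r"Agreement ID:\s*(\S+)",
    "investor": r"Investor:\s*(.+)",
    "email": r"Email:\s*(\S+@\S+)",
    "date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
    "sum": r"Investment sum:\s*([\d,]+(?:\.\d+)?)",
}


def parse_agreement(text):
    """Return a dict of extracted fields; missing fields are set to None."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        record[field] = match.group(1).strip() if match else None
    if record["sum"] is not None:
        # Drop thousands separators before converting to a number
        record["sum"] = float(record["sum"].replace(",", ""))
    return record


record = parse_agreement(SAMPLE_TEXT)
print(record)
```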
In the lower workflow branch, we connect to an SQLite database, select the customer_table and import the data containing investors’ personal details, including language, email and agreement type.
Using a Value Lookup node on the email column, we join the two data sources.
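As a rough code equivalent of these two branches, the sketch below reads investor details from SQLite and attaches them to parsed agreements by email. Only the table name `customer_table` comes from the article; the column names, the sample rows, and the helper functions are assumptions, and an in-memory database stands in for the real file.

```python
import sqlite3

# In-memory stand-in for the investor database (schema is an assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_table "
    "(name TEXT, email TEXT PRIMARY KEY, language TEXT, agreement_type TEXT)"
)
conn.execute(
    "INSERT INTO customer_table VALUES (?, ?, ?, ?)",
    ("Jane Doe", "jane.doe@example.com", "German", "Equity fund"),
)


def load_customers(connection):
    """Build an email -> details dictionary from customer_table."""
    cursor = connection.execute(
        "SELECT email, language, agreement_type FROM customer_table"
    )
    return {
        email: {"language": language, "agreement_type": agreement_type}
        for email, language, agreement_type in cursor
    }


def lookup_join(agreements, customers):
    """Attach customer details to each agreement; unmatched emails get None."""
    missing = {"language": None, "agreement_type": None}
    return [{**a, **customers.get(a["email"], missing)} for a in agreements]


# Invented parsed agreements, for illustration only
agreements = [
    {"agreement_id": "AG-0042", "email": "jane.doe@example.com", "sum": 12500.0},
    {"agreement_id": "AG-0043", "email": "unknown@example.com", "sum": 900.0},
]
joined = lookup_join(agreements, load_customers(conn))
print(joined)
```

Like a lookup, this keeps every agreement row and leaves the added columns empty when no matching email is found.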
Step 2: Identify outliers using visualizations and the IQR
To detect fraudulent contracts with data visualizations, we rely on the creation of a comparative dashboard using KNIME components.
Components in KNIME are custom nodes that bundle specific functionalities and can have their own configuration dialogs and composite views. Composite views facilitate the creation of customizable, interactive dashboards with charts, tables, and widgets. Users can also design visual layouts, edit HTML content, and generate reports.
We assign color cues to investment sums, and visualize them using a bar chart, a scatter plot, a box plot and a histogram. When wrapped in a component, plots can propagate the selection of one data point across all plots. In this way, we can clearly see the data points that strongly deviate from the rest (marked in yellow), and inspect the details.
The second strategy to detect contract frauds uses the IQR method. The KNIME implementation is straightforward and requires only one node: the Numeric Outliers node.
In the node configuration, we select the payment column, define k, and remove all data points that are not outliers, so that only the outliers remain. This choice of treatment is useful to isolate those investors that need to be alerted. Lastly, we save the model for deployment.
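The split between fitting the outlier model and reusing it later can be illustrated in a few lines of Python. This is only a sketch of the idea, not KNIME's model format: the JSON representation, the function names, and the sample payments are assumptions, and the quartiles again use linear interpolation.

```python
import json
from statistics import quantiles


def fit_outlier_model(values, k=1.5):
    """Compute the IQR fences once so they can be stored and reused."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {"lower": q1 - k * iqr, "upper": q3 + k * iqr, "k": k}


def keep_outliers(rows, model, column="payment"):
    """Drop every row inside the fences, leaving only the outliers."""
    return [
        row for row in rows
        if row[column] < model["lower"] or row[column] > model["upper"]
    ]


# Invented payments, for illustration only
payments = [1000, 1200, 1100, 1300, 1250, 1150, 1050, 9000]
rows = [{"investor": f"Investor {i}", "payment": p} for i, p in enumerate(payments)]

model = fit_outlier_model(payments)
saved = json.dumps(model)      # stand-in for saving the model
restored = json.loads(saved)   # stand-in for reading it back at deployment
flagged = keep_outliers(rows, restored)
print(flagged)
```

Storing the fences rather than recomputing them means new agreements are judged against the same thresholds that were validated during development.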
Step 3: Create custom alerts in different languages with GenAI
Once fraudulent agreements are identified, we can leverage GenAI to streamline the creation of multilingual alerts without relying on time-consuming templates. Additionally, we want our alerts to include agreement details, as well as explanations of the agreement type as defined by the knowledge base of the financial institution.
To do that, we’ll rely on the nodes of KNIME AI Extension. The overall approach can be summarized in four steps: Authenticate – Connect – Customize – Prompt.
After authenticating with the AI provider, for example OpenAI, we connect to the LLM of choice, in our case gpt-3.5-turbo.
In the lower workflow branch, we use the PDF Parser to access the knowledge base of the financial institution, where agreement types are described. With this document, we define a RAG process, aimed at customizing the model responses and enriching the content of the alerts with context. To do that, we split the document into sentences, embed them and store the embeddings in a FAISS Vector Store, which we save for deployment.
Next, we use the Vector Store Retriever node to perform a similarity search between the agreement types in the investor database and the institution’s knowledge base containing the descriptions of each type of agreement. The goal is to retrieve the most similar descriptions from the vector store.
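The retrieval idea can be shown with a deliberately simplified Python sketch. It does not use FAISS or a real embedding model: word counts stand in for embeddings, a plain list stands in for the vector store, and the knowledge-base text and function names are invented for illustration. Only the overall flow (split into sentences, embed, rank by similarity) mirrors the description above.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(text):
    """Toy embedding: lowercase word counts (a real setup would call a model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, store, top_k=1):
    """Return the top_k stored sentences most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [sentence for sentence, _vector in ranked[:top_k]]


# Invented knowledge base, for illustration only
KNOWLEDGE_BASE = (
    "A fixed-term deposit agreement locks the invested sum for an agreed period. "
    "An equity fund agreement invests the sum in a diversified portfolio of shares. "
    "A bond agreement lends the sum to an issuer in exchange for coupon payments."
)
store = [(s, embed(s)) for s in split_sentences(KNOWLEDGE_BASE)]
print(retrieve("equity fund agreement", store))
```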
Once similar documents are retrieved, we engineer a prompt using the String Manipulation node. In the interest of automation, we parameterize the prompt using values of the detected outliers (i.e., language, investor name, agreement type, date and sum) and augment it with details about the type of agreement signed by the investors.
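A parameterized prompt of this kind might look like the following Python sketch. The wording of the template, the field names, and the sample record are assumptions rather than the prompt used in the workflow; the point is only that one template serves every flagged investor and language.

```python
from string import Template

# Hypothetical prompt wording; placeholders are filled per flagged investor
ALERT_PROMPT = Template(
    "Write a short, polite fraud alert email in $language addressed to $name. "
    "Explain that the $agreement_type agreement dated $date states an "
    "investment sum of $amount, which looks unusual, and ask the recipient "
    "to contact the institution. "
    "Use this description of the agreement type as context: $context"
)


def build_prompt(outlier, context):
    """Fill the template with one outlier's details and the retrieved context."""
    return ALERT_PROMPT.substitute(
        language=outlier["language"],
        name=outlier["name"],
        agreement_type=outlier["agreement_type"],
        date=outlier["date"],
        amount=f"{outlier['sum']:,.2f}",
        context=context,
    )


# Invented outlier record and retrieved description, for illustration only
outlier = {
    "language": "German",
    "name": "Jane Doe",
    "agreement_type": "equity fund",
    "date": "2024-03-15",
    "sum": 9000.0,
}
context = "An equity fund agreement invests the sum in a diversified portfolio of shares."
prompt = build_prompt(outlier, context)
print(prompt)
```

`Template.substitute` raises a `KeyError` if a placeholder is left unfilled, which helps catch incomplete records before any prompt is sent.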
Lastly, we feed the prompt to the LLM Prompter node to generate personalized multilingual alerts. We can inspect and download the generated text in the composite view of the “Email Preview” component.
Step 4: Deploy the solution as a web-based application
The second workflow, Fraud detection with IQR and GenAI for alerts_Deployment, deploys the application as a Data App that can be consumed on demand in a web browser.
The deployment workflow follows a very similar design.
The key difference in this workflow lies in removing the identification of outliers with visualizations, as this method requires manual inspection and cannot be automated in production. Additionally, we rely on the Model Reader nodes to re-use the outlier detection model and the vector store that were created in the previous workflow.
Lastly, we exploit another feature of KNIME components: their composite views define the interactive pages of a web application. The workflow, with its “Email Preview” component, can therefore be deployed as an on-demand data app on KNIME Business Hub for ease of consumption and interaction.
The results: Detected fraud and multilingual alert preview as a data app
The techniques illustrated above consistently identify two outliers, whose investment sums are considerably higher than the rest.
For these investors, custom email alerts in different languages are generated and displayed as a Data App. Here, we display only the English version. The alert warns the recipient of suspicious activities and asks them to contact the financial institution immediately.
With the help of GenAI, the generated response is enriched to also include an explanation of the agreement type (circled in red). The content of the alert can be downloaded and further edited before notifying the investors.
KNIME for Finance: Scale fraud detection with stats and GenAI
KNIME Analytics Platform provides a flexible, automated way to detect contract fraud with different techniques through a low-code, visual, and intuitive user interface.
The introduction of GenAI to enrich the analytical process helps scale the creation of custom alerts in multiple languages, ensuring timely, reliable and cost-effective notifications.
Explore more finance solutions in KNIME for Finance.