DATA MINING-BASED ASSESSMENT OF THE RISK OF USING FINANCIAL INTERMEDIARIES FOR MONEY LAUNDERING

O.V. Kuzmenko, Doctor of Economics, Associate Professor, Head of the Department of Economic Cybernetics, Sumy State University, ORCID: 0000-0001-8575-5725
A.O. Boiko, PhD, Associate Professor, Associate Professor of the Department of Economic Cybernetics, Sumy State University, ORCID: 0000-0002-1784-9364
H.M. Yarovenko, PhD, Associate Professor, Associate Professor of the Department of Economic Cybernetics, Sumy State University, ORCID: 0000-0002-8760-6835
T.V. Dotsenko, PhD Student of the Department of Economic Cybernetics, Sumy State University, ORCID: 0000-0001-5713-2205

the Analysis package, the Advanced Methods tab, the GLM General Linear Models tab for data mining. A data set covering 215 countries of the world for 2017 was generated to conduct the study. The predictors were ranked by the degree of their influence on the response: 1) Corruption Perceptions Index; 2) internally displaced persons, new displacement associated with conflict and violence (number of cases); 3) Happy Planet Index; 4) claims on the central government; 5) bank secrecy; 6) Global Terrorism Index; 7) gross domestic product per capita. The constructed neural network models are characterized by architecture (the number of layers and hidden neurons), performance and error (training, control, test), learning algorithm, as well as error functions, active hidden and active output neurons. The reliability of the presented models is confirmed by the criteria given in the columns "Training Performance", "Control Performance" and "Test Performance". The risk of using financial intermediaries for money laundering has been predicted for the period 2019-2023, showing gradual growth since 2020. It is shown that the predicted risk values of using financial intermediaries for money laundering, despite the rather low predicted level for 2019, tend to increase rapidly in the near term.
Keywords: money laundering risk; neural network; multilayer perceptron; network based on radial basis functions; prediction.
Statement of the problem in general terms and its relationship with important scientific and practical tasks. In the modern economic world, which is characterized by the rapid development of the global financial system and the rapid growth of information technologies for the implementation of financial transactions, the problems of global shadowing of economic processes, as well as money laundering, are becoming more acute. The solution to this problem requires the introduction of an effective system for assessing the risk of using financial intermediaries for money laundering. Data mining is of particular importance among modern developments against money laundering.
Such innovations include neural networks that are adapted to the study of the complex dependencies inherent in modern financial transactions in the context of information constraints. Consequently, the solution to the problem of assessing the risk of using financial intermediaries for money laundering with new, non-standard methods of analysis and modeling of economic processes is becoming more relevant.
Analysis of recent studies and publications, which initiated the search for the solution to this problem and on which the author relies, highlighting previously unresolved parts of the general problem, which are discussed in this paper. V. Bilous [17], S. Hurzhii, A. Kopylenko, Ya. Yanushevych [18], R. Marchuk, A. Popov, V. Onisiev [19], V. Zakharov [22] and others study the general theoretical, organizational and legal issues of combating money laundering and terrorist financing. Some aspects of the assessment and management of the risk of money laundering and terrorist financing are highlighted in the works of V. Kadnichanska, T. Romas [24], N. Moskalenko, N. Klimchuk [27] and others. International experience in using a risk-based approach against money laundering in foreign economic activity is analyzed by N. Vnukova, A. Kolodiziev, I. Chmutova [20] and O. Smahlo [30]. More specialized problems of assessing and managing the risks of money laundering through banks are addressed by A. Berezhnyi [28], M. Khudokormova [31] and I. Chmutova [32].
Such scientists as H. Setlak [29], M. Mozolevska, A. Stavytskyi [26], D. Ivanov [23], A. Matviichuk [25] work in the area of specific issues on the use of neural networks as a method of prediction in the financial sphere.
Research objective. The purpose of the paper is the mathematical economic modeling of the neural network describing the dependence of the risk of the use of financial intermediaries for money laundering and predicting the possible values of this risk in the short term. Achievement of this goal requires solving a number of tasks: identification of key risk factors; description of architecture, performance, error (training, control, test), learning algorithm, error functions, active hidden and active output neurons of a multilayer perceptron and a network based on radial basis functions; risk prediction; estimation of statistics of predicted values and sensitivity analysis of neural network models.
Statement of basic research materials with full justification for the scientific results. The study of the risk of using financial intermediaries for money laundering involved the selection of the most relevant indicators and formation of a certain sequence of its calculation. Thus, we will consider the steps of the proposed scientific and methodological approach in more detail.
Step 1. Formation of the statistical base of the study. A data set covering 215 countries of the world for 2017 was generated to conduct the study. The figures are statistical information obtained from the official websites of world organizations. The authors selected one regressand: the level of risk of using financial intermediaries for money laundering, taken from the results of previous studies [7].
The inclusion of the specified set of indicators is justified by the results of collinearity studies that apply sigma-limited parameterization (Figure 1) and by correlation analysis of the dependence of the regressand on each of the regressors, as well as of the factors among themselves (Figure 2). It is proposed to use the Statistica software (the Analysis package, the Advanced Methods tab, the GLM General Linear Models tab) for such data mining tasks as identifying key factors. An analysis of Figure 1 (beta coefficients, the Risk of money laundering graph) indicates the feasibility of ranking the predictors by the degree of their influence on the response as follows: 1) Corruption Perceptions Index; 2) internally displaced persons, new displacement associated with conflict and violence (number of cases); 3) Happy Planet Index; 4) claims on the central government; 5) bank secrecy; 6) Global Terrorism Index; 7) gross domestic product per capita. Only the first two have a strong influence, while the others have a moderate one.
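The ranking by beta coefficients described above can be illustrated with a minimal sketch. It assumes that beta coefficients are ordinary least-squares slopes computed after standardizing both the predictors and the response; the data are synthetic, and the function name `standardized_betas` is hypothetical and unrelated to the Statistica GLM module used in the study.

```python
import numpy as np

def standardized_betas(X, y):
    """Beta coefficients: least-squares slopes after z-scoring X and y."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta

# Synthetic stand-in data (assumption): 215 observations, 3 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(215, 3))
y = 0.8 * X[:, 0] - 0.3 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(scale=0.1, size=215)

betas = standardized_betas(X, y)
# Rank predictors by the absolute value of their beta coefficient, strongest first
ranking = np.argsort(-np.abs(betas))
print(ranking, betas)
```

Sorting by absolute value treats a strong negative influence (such as that of the Corruption Perceptions Index) as equally important as a strong positive one.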
In addition, the partial correlation coefficients (Risk of money laundering graph in Figure 1) show the degree of influence of one predictor on the response, assuming that other predictors are fixed at a constant level. The calculated values of this indicator confirm the above conclusion that there is a significant influence of only the Corruption Perceptions Index and the indicator of internally displaced persons, new displacement associated with conflict and violence (number of cases) on the risk of using financial intermediaries for money laundering, as well as moderate influence of other indicators.
As for the determination coefficient (column R in Figure 1), i.e. the square of the coefficient of multiple correlation between a given variable and the other variables, its values are moderate for all indicators, but the relationship between each of three predictors (bank secrecy, Corruption Perceptions Index, Happy Planet Index) and all the others is much stronger than for the four remaining predictors.
The correlations of the vectors in the design matrix X (Figure 1) suggest that there is a moderate inverse relationship between the level of risk studied and both the Corruption Perceptions Index and the Happy Planet Index, as evidenced by the corresponding correlation coefficients of -0.6466 and -0.5454. In addition, there is a weak inverse relationship between the effective attribute and the factor attribute of bank secrecy. For the other regressors, namely gross domestic product per capita (GDP), claims on the central government, internally displaced persons, new displacement associated with conflict and violence (number of cases), and the Global Terrorism Index, the relationship is not confirmed at the 95% confidence level.

In terms of the analysis of the multicollinearity of regressors, we observe only one case of a high degree of dependence between the Corruption Perceptions Index and the Happy Planet Index since the corresponding correlation coefficient is 0.71. Despite the need to remove one of these factors from the model to mitigate the collinearity problem of the corresponding vectors, we propose leaving both indicators, since, from an economic point of view, both indicators are of considerable interest in terms of the study of the risk of using financial intermediaries for money laundering.
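The multicollinearity check described above can be sketched as follows. The 0.7 threshold, the synthetic data and the helper name `collinear_pairs` are assumptions made for illustration only; the study itself reports the correlation matrix produced by Statistica.

```python
import numpy as np

def collinear_pairs(X, threshold=0.7):
    """Return (i, j, r) for every pair of columns whose |correlation| exceeds threshold."""
    R = np.corrcoef(X, rowvar=False)
    n = R.shape[0]
    return [(i, j, R[i, j])
            for i in range(n) for j in range(i + 1, n)
            if abs(R[i, j]) > threshold]

# Synthetic stand-in data (assumption): column 1 is built to depend on column 0
rng = np.random.default_rng(1)
x0 = rng.normal(size=215)
x1 = x0 + 0.5 * rng.normal(size=215)
x2 = rng.normal(size=215)
X = np.column_stack([x0, x1, x2])

pairs = collinear_pairs(X)
for i, j, r in pairs:
    print(f"regressors {i} and {j} are strongly correlated: r = {r:.2f}")
```

A flagged pair does not have to be dropped automatically; as in the paper, both regressors can be kept when each is of independent economic interest.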
Step 2. Formation of research methodology. Justification of the methods of mathematical formalization of the problem. It is proposed to assess the risk of using financial intermediaries for money laundering, following the principles of data mining, by building a neural network. The mathematical economic models of the neural network describing the dependence of this risk on the factors are presented in the form of a multilayer perceptron and a network based on radial basis functions.
Thus, the mathematical economic model of the neural network of the risk under study in the form of a multilayer perceptron takes the following form [2]:

$$
\begin{cases}
y_j^{(1)} = f\Big(\sum_i w_{ij}^{(1)} x_{ij}^{(1)} - \theta_j^{(1)}\Big), \\
y_j^{(2)} = f\Big(\sum_i w_{ij}^{(2)} y_i^{(1)} - \theta_j^{(2)}\Big), \\
\dots \\
y_j^{(N)} = f\Big(\sum_i w_{ij}^{(N)} y_i^{(N-1)} - \theta_j^{(N)}\Big),
\end{cases} \tag{1}
$$

where the first row describes layer 1, the second row layer 2, and the last row layer N; $i$ is the input number; $j$ is the number of the neuron in the layer; $x_{ij}^{(1)}$ is the i-th input signal of the j-th neuron in layer 1; $w_{ij}^{(N)}$ is the weighting factor of the i-th input signal of the j-th neuron in layer N; $\theta_j^{(N)}$ is the threshold level of the j-th neuron in layer N; $y_j^{(k)}$ is the output of the j-th neuron in layer k; $f$ is the activation function.

In turn, the mathematical economic model of the neural network of the risk of using financial intermediaries for money laundering in the form of a network based on radial basis functions takes the following form [8,9]:

$$
y = \sum_i w_i \, \varphi\big(\lVert x - c_i \rVert\big), \tag{2}
$$

where $w_i$ is the weighting coefficient of the i-th input signal; $c_i$ are the centers of the radial basis functions; $\varphi$ is the radial basis function; $x$ is the input vector.

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm is used to build a neural network such as the MLP multilayer perceptron. It is one of the most widely used quasi-Newton methods and consists in an iterative numerical optimization procedure that searches for a local extremum of a nonlinear function without constraints. The BFGS algorithm involves the following sequence of steps [15]:

1) initializing the weighting coefficients $w$ with small random values and setting the initial approximation of the inverse Hessian $V$, a matrix of size $n \times n$, where $n$ is the length of the gradient vector $g$;
2) calculating the gradient $g$;
3) calculating the correction of the weighting coefficients $s = -\eta V g$, where $\eta$ is the learning-rate parameter;
4) determining the new gradient value $g_k$ given the previous value $g_{k-1}$, and calculating the gradient change $r = g_k - g_{k-1}$;
5) calculating the inverse Hessian ($r$ is the gradient change, $s$ is the weight change):

$$
V_k = V_{k-1} + \left(1 + \frac{r^T V_{k-1} r}{s^T r}\right)\frac{s s^T}{s^T r} - \frac{s r^T V_{k-1} + V_{k-1} r s^T}{s^T r}; \tag{3}
$$

6) calculating the change of the weighting coefficients $s = -\eta V_k g_k$ and adjusting the parameters accordingly: $w \leftarrow w + s$;
7) determining the value of the error. If the error exceeds the specified accuracy, the algorithm is repeated starting from step 4. Otherwise, the algorithm stops.

The RBFT algorithm is used to construct a neural network based on radial basis functions (RBF). It is suggested to use the Statistica software (Analysis package, Neural Network tab, Regression tab) to implement this stage. It is feasible to determine the weighting coefficients using the least-squares method.
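The BFGS steps listed above can be sketched in a few lines. This is a minimal illustration rather than the Statistica implementation: it minimizes a small quadratic test function instead of a network error, chooses the learning rate by simple backtracking (an assumption, since the source does not say how the rate is set), and uses the standard BFGS inverse-Hessian update. The names `bfgs_minimize`, `f` and `grad` are hypothetical.

```python
import numpy as np

def bfgs_minimize(f, grad, w0, tol=1e-6, max_iter=200):
    w = np.asarray(w0, dtype=float)
    V = np.eye(w.size)            # initial approximation of the inverse Hessian
    g = grad(w)                   # initial gradient
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break                 # required accuracy reached
        d = -V @ g                # quasi-Newton search direction
        eta = 1.0                 # learning rate, halved until f decreases enough
        while f(w + eta * d) > f(w) + 1e-4 * eta * (g @ d) and eta > 1e-12:
            eta *= 0.5
        s = eta * d               # weight change
        w = w + s
        g_new = grad(w)
        r = g_new - g             # gradient change
        sr = s @ r
        if sr > 1e-12:            # update V only when the curvature condition holds
            Vr = V @ r
            V = (V + (1.0 + r @ Vr / sr) * np.outer(s, s) / sr
                   - (np.outer(s, Vr) + np.outer(Vr, s)) / sr)
        g = g_new
    return w

# Test problem (assumption): f(w) = 0.5 w^T A w - b^T w, minimized where A w = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda w: 0.5 * w @ A @ w - b @ w
grad = lambda w: A @ w - b

w_star = bfgs_minimize(f, grad, np.zeros(2))
print(w_star)
```

For network training, `f` would be the error function over the training sample and `w` the flattened vector of all weights and thresholds.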
Step 3. Practical testing of design calculations. We will carry out mathematical economic modeling of two types of neural networks (MLP multilayer perceptron and networks based on radial basis functions (RBF)) describing the regression dependence of the risk of using financial intermediaries for money laundering on relevant regressors and will systematize the results in a tabular form (Figure 3).

Figure 3 -The results of the modeling of neural networks of regression dependence of the risk of using financial intermediaries for money laundering on regressors
The analysis of Figure 3 shows that far more of the constructed neural networks are MLP multilayer perceptrons (80% of models) than networks based on radial basis functions, RBF (20% of models). All presented models are characterized by a high level of adequacy, as evidenced by the criteria given in the columns "Training Performance", "Control Performance" and "Test Performance". At the same time, the performance of the MLP models shows a much larger range of variation of the correlation coefficients: from 0.7890 to 0.8685 (training sample), from 0.7286 to 0.8505 (control sample) and from 0.8099 to 0.8448 (test sample), compared with, respectively, from 0.8274 to 0.8559 (training sample), from 0.6919 to 0.7169 (control sample) and from 0.8047 to 0.8089 (test sample) for the RBF models. The reliability of the 10 constructed neural network models is confirmed by the error indicator for the training, control and test samples, which takes values close to zero.
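The performance columns report correlation coefficients between observed and predicted values on the training, control and test samples. A minimal sketch of how such figures could be obtained for an RBF network whose output weights are fitted by least squares is given here. Everything specific in it is an assumption: the data are synthetic, the basis functions are Gaussian, the 20 centers are drawn at random from the training sample, the width is hypothetical, and the sample is split roughly 70/15/15.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Gaussian basis values for every sample/center pair, plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([Phi, np.ones((X.shape[0], 1))])

# Synthetic stand-in for 215 observations of 7 regressors (assumption)
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(215, 7))
y = np.tanh(X[:, 0] - 0.5 * X[:, 1]) + rng.normal(scale=0.05, size=215)

# Training / control / test subsamples (split proportions are an assumption)
idx = rng.permutation(215)
train, control, test = idx[:151], idx[151:183], idx[183:]

# 20 hidden units: centers sampled from the training data, hypothetical width
centers = X[train][rng.choice(train.size, size=20, replace=False)]
width = 3.0
w, *_ = np.linalg.lstsq(rbf_design(X[train], centers, width), y[train], rcond=None)

def performance(rows):
    """Correlation between network output and observed response on a subsample."""
    pred = rbf_design(X[rows], centers, width) @ w
    return np.corrcoef(pred, y[rows])[0, 1]

scores = {name: performance(rows)
          for name, rows in [("training", train), ("control", control), ("test", test)]}
print(scores)
```

Comparing the three values, as the paper does, shows whether a model that fits the training sample well also generalizes to data it has not seen.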
To further use the constructed models for predicting the level of risk of using financial intermediaries for money laundering, we choose the two MLP models and the two RBF networks with the best adequacy characteristics, namely: the first model with the MLP 7-4-1 architecture (7 input neurons, 4 hidden neurons, 1 output neuron), the third model with the MLP 7-6-1 architecture (7 input neurons, 6 hidden neurons, 1 output neuron; Figure 4), and the eighth and ninth models, both with the RBF 7-20-1 architecture (7 input neurons, 20 hidden neurons, 1 output neuron). The BFGS algorithm is used to build the multilayer perceptrons MLP 7-4-1 and MLP 7-6-1, and the RBFT algorithm is used to build the networks based on radial basis functions RBF 7-20-1. A scatter diagram of the theoretical values (obtained with the four selected neural networks) and the actual values of the risk of using financial intermediaries for money laundering is shown in Figure 5. The visual agreement between the outputs of the networks built to predict the risk under study and the observations points to the high reliability of the selected models, as evidenced by the dense arrangement of the actual values around the theoretical ones (the predictions obtained with the models).

Figure 5 -Ratio of actual and projected levels of risk of using financial intermediaries for money laundering
A deep analysis of the input predictors is important for formalizing the risk of using financial intermediaries for money laundering with a neural network. Thus, we construct the corresponding scatterplots (Figures 6-9). An analysis of the pairwise dependence of the effective attribute on gross domestic product per capita and bank secrecy (Figure 6) indicates the following: there is no clear dependence of the risk of using financial intermediaries for money laundering on gross domestic product per capita, because, despite the absence of significant variation in the factor attribute, the effective attribute changes from 0.4 to 1.0; the values of the bank secrecy indicator are clearly grouped into 3 clusters, with the third cluster being the largest, i.e. an increase in the value of this regressor leads to an increase in the investigated level of risk. As for the dependence of the risk of using financial intermediaries for money laundering on claims on the central government (Figure 7), we observe a chaotic distribution, i.e. the absence of a clear relationship between the variables under investigation. For the indicator of internally displaced persons, new displacement associated with conflict and violence (number of cases), as in the case of GDP per capita, there is no clear dependence of the studied risk on this factor, because, despite the absence of significant variation in the factor attribute, the effective attribute changes from 0.4 to 1.0. As for the influence of the Corruption Perceptions Index (Figure 8) and the Happy Planet Index (Figure 9) on the risk of using financial intermediaries for money laundering, we observe a moderate inverse relationship, i.e. an increase in the factor attribute lowers the value of the effective one and vice versa.
In the context of the study of the impact of the Global Terrorism Index, we observe a chaotic distribution.
Before the last, and one of the most important, stages of the presented methodology, predicting the level of the risk under study, a preliminary detailed analysis of the quality of the four neural networks constructed and described above is needed: the multilayer perceptrons MLP 7-4-1 and MLP 7-6-1 and the two networks based on radial basis functions RBF 7-20-1. For this purpose, we consider the statistics of the predicted values (Figure 10) and the sensitivity of the models of the selected neural networks in terms of input predictors (Figure 11).

Figure 11 -Sensitivity of the models of selected neural networks in terms of input predictors
An analysis of the statistical characteristics of the neural network models shown in Figures 10 and 11 indicates a high quality of the models (insignificant variation of the minimum and maximum levels across the training, control and test samples) and an insignificant level of sensitivity of the models to the scale of the input data.
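One common way to quantify the sensitivity of a fitted model to its input predictors is to replace each input in turn by its mean and to compare the resulting error with the baseline error. The sketch below assumes this definition, which may differ in detail from the one behind Figure 11; it uses synthetic data and a linear stand-in model in place of the neural networks, and the name `sensitivity` is hypothetical.

```python
import numpy as np

def sensitivity(predict, X, y):
    """Error with input i fixed at its mean, divided by the baseline error."""
    base = np.sum((predict(X) - y) ** 2)
    ratios = []
    for i in range(X.shape[1]):
        X_mod = X.copy()
        X_mod[:, i] = X[:, i].mean()   # remove the information carried by input i
        ratios.append(np.sum((predict(X_mod) - y) ** 2) / base)
    return np.array(ratios)

# Synthetic stand-in data and a linear stand-in model (both assumptions)
rng = np.random.default_rng(7)
X = rng.normal(size=(215, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=215)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predict = lambda A: A @ coef

sens = sensitivity(predict, X, y)
print(sens)
```

A ratio close to 1 means the model barely uses that input, while a large ratio marks an input whose removal degrades the predictions substantially.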
To predict the risk of using financial intermediaries for money laundering for the period 2019-2023, we form (based on an expert approach) the expected development paths of the 7 regressors presented in Table 1: gross domestic product per capita (GDP); claims on the central government; internally displaced persons, new displacement associated with conflict and violence (number of cases); bank secrecy; Corruption Perceptions Index; Global Terrorism Index; Happy Planet Index. Analysis of the predicted values of the risk of using financial intermediaries for money laundering (Figure 12, columns 2-5) for the period 2019-2023 indicates fairly similar values across the four neural networks used: the multilayer perceptrons MLP 7-4-1 and MLP 7-6-1 and the two radial basis function networks RBF 7-20-1. Therefore, it should be pointed out that the predicted risk values of using financial intermediaries for money laundering, despite the rather low predicted level for 2019, tend to increase rapidly in the near term.
Conclusions of this study and prospects for further research in this area. Thus, it should be noted that risk assessment of the use of financial intermediaries for money laundering based on neural networks is a relevant, powerful and flexible tool for ensuring an effective government control system, given the need to process large datasets. This method makes it possible to identify the complex dependencies of economic processes automatically, to predict possible results and to use them when making decisions in the field of public administration. The introduction of this approach will make it possible to predict and combat crimes related to money laundering and terrorism financing more effectively, will contribute to the positive economic, financial, social, political and cultural development of the country, and will improve the country's rating in the world.