
# Course on Machine Learning and Central Banking

Digital Course. November 16 – 20, 2020.

During the first edition of this Course, held virtually in collaboration with the Deutsche Bundesbank and the Banco Central de Costa Rica, participants learned the fundamental concepts of Machine Learning and how central bank problems can be addressed with this set of techniques. In addition to the conceptual sessions, the Course included hands-on exercises showing how to implement the concepts in the programming language R.

**Day 1**

The first day started with a presentation by Professor Stefan Bender giving an overview of Machine Learning (ML) and how it can be a reliable tool for central banks to solve problems across their different areas, including payment systems and financial stability, among others. He outlined the general process for developing an ML-based system: i) understand the business problem, ii) map the original problem to an ML problem, iii) understand the data being used, iv) explore and prepare (preprocess) the data, v) select the best method, vi) evaluate, and vii) deploy. He then reviewed the main types of ML (supervised, semi-supervised and unsupervised) and mentioned examples of each, such as clustering algorithms and classification methods. Finally, he mentioned the main factors to consider when implementing these procedures: complexity, overfitting, robustness, interpretability, and training and testing time.

The following session was devoted to a use case in which the Deutsche Bundesbank developed an ML-based methodology to find links between records referring to the same entity in databases from different sources, given that unique identifiers were not available for direct linkage. Such a system can provide substantial benefits for information analysis. The process comprised these steps: preprocessing of the databases, reduction of the search space, comparison of records, classification (deciding whether or not a link exists), and evaluation of the results. The development also posed challenges regarding privacy.

Next, the data-splitting process common in ML was reviewed. A golden rule in ML is to evaluate models on data that was not used to train them. To this end, the whole dataset is divided (split) into different sets, almost always at random, and each set has a different purpose. The training set is used to train the model, which learns the underlying data patterns; the validation set is used to try different model configurations and select the one with the best performance; finally, the test set is used to see how well the model performs on unseen data.
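The Course's exercises used R; as an illustrative sketch of the splitting procedure described above, the following uses Python's scikit-learn on synthetic data (the 60/20/20 proportions are an assumption for illustration, not part of the Course material):

```python
# Minimal sketch of a random train/validation/test split.
# The synthetic dataset and the 60/20/20 proportions are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                           # hypothetical features
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)    # hypothetical labels

# First set aside the test set (20%), then split the remainder
# into training (60% of the total) and validation (20% of the total).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```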

This session also reviewed the concept of cross-validation, an exhaustive procedure that tries different training and validation sets. It then introduced the confusion matrix, which tallies true positives, true negatives, false positives and false negatives, and which, along with measures such as accuracy, recall, precision and F1-score, serves for the validation and evaluation of ML models.
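To make the evaluation measures concrete, here is a small sketch in Python's scikit-learn (the Course itself worked in R); the labels and predictions are made up for illustration:

```python
# Sketch: confusion matrix and the evaluation measures derived from it.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # made-up ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # made-up model predictions

# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)                   # 3 1 1 3
print(accuracy_score(y_true, y_pred))   # 0.75 = (TP + TN) / total
print(precision_score(y_true, y_pred))  # 0.75 = TP / (TP + FP)
print(recall_score(y_true, y_pred))     # 0.75 = TP / (TP + FN)
print(f1_score(y_true, y_pred))         # 0.75 (harmonic mean of the two)
```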

The day ended with a series of hands-on exercises about the main R commands.

**Day 2**

The second day was devoted to deepening the study of shrinkage methods, which are variations of ordinary least squares (OLS) regression.

The presentation started with the main motivations that drive the use of these techniques: handling the curse of dimensionality, reducing over-parametrization and overfitting, and reducing computational resources. It was also mentioned that the core idea can also be found in econometric techniques such as partial least squares and principal components regression.

The session continued with a review of ridge regression. It was noted that the difference with respect to OLS regression lies in the addition of a new term to the objective function that imposes a penalty on the coefficients’ magnitude; this term makes use of the L2 norm. This subtle difference reduces the magnitude of the regression coefficients and improves the prediction of unseen values.
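The shrinking effect can be seen in a few lines; the Course's exercises were in R, so the following Python/scikit-learn sketch on synthetic data (penalty strength and data are illustrative assumptions) is only a stand-in:

```python
# Sketch: the L2 penalty in ridge regression shrinks the coefficient
# vector relative to OLS. Data and penalty strength are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
beta = np.zeros(10)
beta[:3] = [3.0, -2.0, 1.5]          # only three informative variables
y = X @ beta + rng.normal(size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha controls the L2 penalty

# The penalized coefficient vector has smaller overall magnitude:
print(np.linalg.norm(ridge.coef_) < np.linalg.norm(ols.coef_))  # True
```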

Then lasso regression was reviewed. As in ridge regression, a penalization term is added to the objective function, but in this case the term makes use of the L1 norm (Manhattan distance). The new term forces the least important regression coefficients to be exactly zero, which is equivalent to excluding those variables from the model. This technique improves prediction and has a computational advantage, since it forces some coefficients to be zero.
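The variable-exclusion behavior can be sketched as follows (again in Python rather than the Course's R; the data and penalty value are illustrative assumptions):

```python
# Sketch: the L1 penalty in the lasso sets the least important
# coefficients to exactly zero, excluding those variables.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 20))
beta = np.zeros(20)
beta[:3] = [4.0, -3.0, 2.0]          # only three relevant variables
y = X @ beta + rng.normal(size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
n_selected = int(np.sum(lasso.coef_ != 0))
print(n_selected)   # far fewer than 20: irrelevant variables are dropped
```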

Afterwards, the elastic net technique was explained. It was observed that the lasso can behave poorly with groups of strongly correlated variables, tending to select only one variable from each group, whereas ridge regression exhibits a grouping effect, that is, strongly correlated variables tend to be in or out of the model together. To combine both behaviors, elastic net uses a convex combination of the ridge and lasso penalization terms. Finally, an example of multiclass classification was presented, in which the performance of the shrinkage methods was compared.
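A sketch of the convex combination of penalties, using Python's scikit-learn rather than the Course's R (the mixing weight, penalty strength and the tiny synthetic dataset are all illustrative assumptions):

```python
# Sketch of elastic net on a pair of strongly correlated predictors.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(3)
x1 = rng.normal(size=300)
x2 = x1 + 0.01 * rng.normal(size=300)   # nearly identical to x1
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + 2 * x2 + rng.normal(size=300)

# l1_ratio mixes the lasso (L1) and ridge (L2) penalty terms:
# 1.0 would be pure lasso, 0.0 pure ridge.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_[0] != 0 and enet.coef_[1] != 0)  # True: the correlated pair stays together
```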

The day concluded with a series of hands-on exercises in R.

**Day 3**

The Course’s third day was devoted to studying decision tree methods, how they are implemented, and some examples. The first part of the ensemble methods topic was also introduced.

The day started with an introduction to decision trees, a family of greedy algorithms that make recursive partitions of the data in order to generate subsets that predominantly belong to one value of the dependent variable. The focus of the session was on classification trees, i.e., trees in which the dependent variable is categorical.

The session continued with an example of what a decision tree might look like, followed by the fundamental decisions involved in building one: the split criterion, how to select the independent variables for the splits, the depth the tree should have, and which value should be predicted in each part of the tree. Each of these was then discussed in depth: first, the main split criteria (CART, minimum value and maximum value) were mentioned; next, how the method finds the independent variable and the corresponding cut-off for each split; then, stopping and pre-pruning criteria to control the depth of a tree and avoid overfitting were presented; and finally, two alternatives to generate predictions at each leaf node were discussed: majority voting and predicted probabilities.
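The building blocks above can be sketched briefly; the Course used R, so this Python/scikit-learn example is only an illustration (the Iris dataset and the depth limit are assumed for the sketch):

```python
# Sketch of a classification tree with pre-pruning via a depth limit,
# showing the two leaf-prediction modes: majority vote and probabilities.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

print(tree.get_depth())           # 3: growth stopped at the pre-pruning limit
print(tree.predict(X[:1]))        # majority vote at the leaf
print(tree.predict_proba(X[:1]))  # class proportions at the same leaf
```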

It was then explained that one of the main goals in Machine Learning is to find models that minimize both bias and variance. One option to achieve this is to use cross-validation to estimate the validation error and test the model on new observations. Another possibility is to combine many models to generate averaged predictions, an approach called ensemble learning. In the last part of the day, an introduction to ensemble methods and bagging was given. Ensemble methods combine multiple weak models to generate a stronger one. In bagging, the idea is to generate multiple datasets from the original one, estimate a new model on each, and then aggregate the predictions of all the models.
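The bagging idea can be sketched as follows (Python/scikit-learn as a stand-in for the Course's R; the dataset and number of estimators are illustrative assumptions):

```python
# Sketch of bagging: many trees are fit on bootstrap re-samples of the
# same data, and their predictions are aggregated by voting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
bagged = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                           n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Averaging over bootstrap re-samples typically stabilizes the prediction:
print(single.score(X_te, y_te), bagged.score(X_te, y_te))
```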

The day concluded with hands-on exercises implementing tree techniques in R.

**Day 4**

The fourth day was devoted to a deeper look at two ensemble methods that use decision trees as their base model: random forest and gradient boosting.

After a recapitulation of ensemble methods and bagging, the session focused on random forests. The main idea is first to generate a number of new datasets (through re-sampling), then to train a tree on each dataset while considering only a random subset of the independent variables, which introduces additional randomness into the methodology, and finally to aggregate the predictions of the individual trees into the final prediction. It was then noted that although random forests reduce variance thanks to their ensemble nature, attention must be paid to the search for the optimal set of parameters for the model in order to avoid high variance.
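The two sources of randomness, bootstrap re-sampling and variable subsetting, appear directly in a typical API; this Python/scikit-learn sketch stands in for the Course's R implementation (the dataset and hyper-parameter values are illustrative):

```python
# Sketch of a random forest: bootstrap re-sampling of the data plus a
# random subset of variables considered at each split (max_features).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200,
                            max_features="sqrt",  # variables tried per split
                            random_state=0).fit(X_tr, y_tr)
print(rf.score(X_te, y_te))
```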

The session continued with an implementation of random forests in R.

Then gradient boosting was reviewed. The main idea behind this technique is to sequentially train new models, giving more importance to the observations that are difficult to predict; the level of difficulty is reflected in weights or residuals linked to each observation. In this case, unlike random forest, the dataset used for each model is the same, and only the weights change at each iteration. Gradient boosting is often applied in the context of trees, but any weak model can be used. It was also observed that, on the one hand, this technique can solve any type of problem as long as the gradient of the loss function involved can be calculated; on the other hand, the implementation must be accompanied by careful hyper-parameter tuning in order to control bias, and it requires more computation time than the other techniques covered.
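The sequential fitting and the hyper-parameters mentioned above can be sketched as follows (Python/scikit-learn as a stand-in for the Course's R; the dataset and the parameter values shown are illustrative, and in practice would be tuned, for example by cross-validation):

```python
# Sketch of gradient boosting: weak trees are fit one after another,
# each stage correcting the errors of the ensemble built so far.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gb = GradientBoostingClassifier(n_estimators=100,   # sequential stages
                                learning_rate=0.1,  # shrinkage per stage
                                max_depth=3,        # weak base trees
                                random_state=0).fit(X_tr, y_tr)
print(gb.score(X_te, y_te))
```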

The day ended with a hands-on exercise on gradient boosting in R.

**Day 5**

The last day of the Course was devoted to presenting three Machine Learning projects developed by CEMLA together with regional central banks and University College London (UCL), followed by an overview of the topics covered during the Course and the participants’ final comments.

##### Monday, November 9

**Opening speech**

Dr. Serafín Martínez Jaramillo, Advisor to General Director, CEMLA

Introduction

**Gabriela Alves Werb and Dr. Jens Mehrhoff, Deutsche Bundesbank**

- Structure and Organization of the Course

**Prof. Stefan Bender - Deutsche Bundesbank**

- Machine Learning and Central Banking

**Gabriela Alves Werb and Dr. Jens Mehrhoff, Deutsche Bundesbank**

- Train, Test and Validation Samples

Introduction II

**Gabriela Alves Werb and Dr. Jens Mehrhoff, Deutsche Bundesbank**

- Cross-Validation
- Confusion Matrix
- Evaluation Measures (Precision, Recall, F1-Score, etc.)
- PR Curve

##### Tuesday, November 10

Shrinkage

**Gabriela Alves Werb and Dr. Jens Mehrhoff, Deutsche Bundesbank**

- Introduction to Shrinkage
- Lasso
- Ridge

Shrinkage II

**Gabriela Alves Werb and Dr. Jens Mehrhoff, Deutsche Bundesbank**

- Elastic Net
- Extension: Multiclass Problems

##### Wednesday, November 11

Decision Trees

**Gabriela Alves Werb and Dr. Jens Mehrhoff, Deutsche Bundesbank**

- Decision Trees (CART)
- Bootstrapping

Decision Trees II

**Gabriela Alves Werb and Dr. Jens Mehrhoff, Deutsche Bundesbank**

- Introduction to Ensemble Methods

##### Thursday, November 12

Ensemble Methods (Random Forest)

**Gabriela Alves Werb and Dr. Jens Mehrhoff, Deutsche Bundesbank**

- Random Forest

Ensemble Methods (Gradient Boosting)

**Gabriela Alves Werb and Dr. Jens Mehrhoff, Deutsche Bundesbank**

- Gradient Boosting

##### Friday, November 13

Machine Learning in Practice

Serafín Martínez-Jaramillo, CEMLA

**Wrap Up and Q&A**

Serafín Martínez-Jaramillo, CEMLA; Gabriela Alves Werb, Dr. Jens Mehrhoff, and Prof. Stefan Bender - Deutsche Bundesbank

**Dr. Serafín Martínez-Jaramillo**
Advisor, CEMLA

Serafin Martinez-Jaramillo is a senior financial researcher at the Financial Stability General Directorate at Banco de México and is currently an adviser at CEMLA. His research interests include financial stability, systemic risk, financial networks, bankruptcy prediction, genetic programming, multiplex networks and machine learning. Serafin has published book chapters, encyclopedia entries and papers in journals such as IEEE Transactions on Evolutionary Computation, the Journal of Financial Stability, Neurocomputing, the Journal of Economic Dynamics and Control, Computational Management Science and the Journal of Network Theory in Finance, among others. Additionally, he has co-edited two books and two special issues of the Journal of Financial Stability. Serafin holds a PhD in Computational Finance from the University of Essex, UK, and is a member of the editorial boards of the Journal of Financial Stability, the Journal of Network Theory in Finance and the Latin American Journal of Central Banking.

**Prof. Stefan Bender**
Head of the Research Data and Service Center of the Deutsche Bundesbank

Stefan Bender is Head of the Research Data and Service Center of the Deutsche Bundesbank and, since 2018, Honorary Professor at the School of Social Sciences, University of Mannheim. In his position at the Deutsche Bundesbank he was chair of INEXDA (the Granular Data Network) and vice-chair of the German Data Forum (www.ratswd.de). Before joining the Deutsche Bundesbank, Bender was head of the Research Data Center (RDC) of the Federal Employment Agency at the Institute for Employment Research (IAB), where he established an internationally oriented research data centre, including access to IAB data in the US (for example at Berkeley and Harvard). His research interests are data access, data quality, merging administrative, survey and/or big data, record linkage, management quality, and the mobility of inventors. He has published over 100 articles in journals including the American Economic Review and the Quarterly Journal of Economics.

**Dr Jens Mehrhoff**

Head of the Sustainable Finance Data Hub in the Directorate General Statistics of the Deutsche Bundesbank.

Prior to his current role, Jens was Head of the Section for Business Cycle, Price and Property Market Statistics for many years, and was recently on secondment to the statistical office of the European Union (Eurostat). Jens is a member of the UN Global Working Group on Big Data and conducts research on classification from a central bank's perspective. He has given talks at several high-profile international conferences as well as to major international organizations, and he is a lecturer in machine learning at Goethe University Frankfurt.

**Gabriela Alves Werb**
Deutsche Bundesbank

Gabriela Alves Werb is a Ph.D. candidate at the Goethe University Frankfurt within the structured doctoral program of the Graduate School of Economics, Finance, and Management (GSEFM). She holds an M.Sc. degree in Quantitative Management from GSEFM (Germany) and a degree in Production Engineering from PUC-Rio (Brazil). She started her career at IBM (2007-2011), where she worked in financial sales, financial analysis, controlling, pricing and process improvement. In 2011, she joined the consulting firm Hays, where she worked for more than three years; she initially led projects in the Engineering & Manufacturing sector and later led and restructured the Oil & Gas business division in Brazil.

During her Ph.D. studies, she worked at the Chair of Electronic Commerce at the Goethe University Frankfurt. Her teaching focused on machine learning methods and their application to solve substantive problems in several disciplines, including marketing. In August 2020, she joined the Research Data and Service Centre (RDSC) at the Deutsche Bundesbank as a Data Scientist.