If this indicator is higher for a certain client, the decision to grant the loan is made; otherwise the client is denied.

Advanced analytics, including methods of applied statistics, machine learning and prediction, makes it possible to obtain a deeper understanding of the characteristics of the borrower and the related risks, and to assess which characteristics can lead to future arrears, defaults and «bad» debts. Risk assessment is carried out for the following cases:

1. Lack of credit history (Application credit scoring)
2. Presence of credit history (Behavior credit scoring)
3. Periodic risk assessment of the loan portfolio
4. Assessment of the probability of collecting overdue accounts receivable (Collection scoring)

There are a number of further research directions, such as the question of applicability of formal concept analysis (the FCA method) to credit risk management tasks in banks, as well as to bank marketing tasks.

2.4 Loan Default Prediction in Banking: Scorecards

Banks and credit institutions face a classification problem each time they consider a loan application. In the most general case, a bank aims to have a tool to discriminate between solvent and potentially delinquent borrowers, i.e.
a tool to predict whether the applicant is going to meet his or her obligations or not. Before the 1950s such a decision-making process was expert-driven and involved no explicit statistical modeling. The decision whether to grant a loan was made upon an interview and after retrieving information about the spouse and close relatives [54]. From the 1960s, banks started to adopt statistical scoring systems trained on datasets of applicants, consisting of their socio-demographic factors and loan application features.

A typical scorecard is built in several steps.
The first step is the so-called WOE transformation [67], which transforms all numerical and categorical variables into discrete numerical variables. For continuous variables the procedure, in effect, breaks the initial variable into several ranges; for categorical ones it regroups the initial categories. The second step is single-factor analysis, in which significant attributes are selected.
The commonly used feature selection method is based on either the information value or the Gini coefficient [67]. The most predictive factors included in the model are further checked for pairwise correlations and multicollinearity, and features with high observed correlation are excluded; a sketch of this selection step is given below.
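For example, the selection step could look like the following Python sketch; the feature names, information values and thresholds are made-up illustrations, not values from the thesis.

```python
import numpy as np
import pandas as pd

# Sketch of the single-factor selection step on already WOE-transformed
# candidate features: keep features with a sufficient information value (IV),
# then drop one feature out of every highly correlated pair.
rng = np.random.default_rng(0)
woe_features = pd.DataFrame({"age_woe": rng.normal(size=200)})
woe_features["income_woe"] = 0.9 * woe_features["age_woe"] + rng.normal(scale=0.3, size=200)
woe_features["bureau_woe"] = rng.normal(size=200)

# Illustrative information values computed at the binning step
iv = pd.Series({"age_woe": 0.25, "income_woe": 0.22, "bureau_woe": 0.04})

IV_THRESHOLD = 0.05     # discard weak single-factor predictors
CORR_THRESHOLD = 0.7    # discard one feature of every highly correlated pair

candidates = sorted((f for f in woe_features if iv[f] >= IV_THRESHOLD),
                    key=lambda f: -iv[f])          # keep the stronger of a correlated pair

selected = []
for f in candidates:
    if all(abs(woe_features[f].corr(woe_features[g])) < CORR_THRESHOLD for g in selected):
        selected.append(f)

print(selected)   # income_woe is dropped: it is highly correlated with age_woe
```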
As soon as single-factor analysis is over, a logistic regression is fit taking the selected transformed attributes as input. The product of the beta coefficient and the WOE value of a particular category produces the score for that category, and the sum of the variable scores produces the final score of the loan application. Finally, the cutoff score is selected based on the revenue and loss in the historical portfolio. When the scorecard is launched into production, each loan application immediately receives a score, which is compared to the cutoff point: if the score is lower than the cutoff value, the application is rejected, otherwise it is approved.
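A minimal sketch of the scoring and cutoff arithmetic just described; the coefficients, WOE values and the cutoff below are invented for illustration only.

```python
# Hypothetical fitted coefficients and the WOE values of the categories a
# particular applicant falls into (all numbers are made up).
betas = {"age": 0.8, "income": 1.1, "months_on_job": 0.6}
applicant_woe = {"age": 0.35, "income": -0.20, "months_on_job": 0.10}

# score of each attribute = beta coefficient x WOE of the category
attribute_scores = {var: betas[var] * applicant_woe[var] for var in betas}

# final score of the application = sum of the attribute scores
final_score = sum(attribute_scores.values())

CUTOFF = 0.10   # chosen from the revenue/loss analysis of the historical portfolio

decision = "reject" if final_score < CUTOFF else "approve"
print(attribute_scores, round(final_score, 3), decision)
```

The per-attribute scores are what makes the rejection reason retrievable: one can see exactly which attributes pulled the final score below the cutoff.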
It has to be mentioned that, despite its simple mathematical approach, the scorecard has been incredibly attractive for lending institutions for several reasons. First of all, a new loan application receives a score for each of its attributes, which provides clarity: in case of rejection, the reason why the final score was lower than the cutoff can be retrieved. Moreover, the computation itself is not sophisticated and allows developers to produce models of high out-of-time stability. The discriminative power of the models, however, remains at a moderate level: the Gini coefficient of application scorecards varies from 45% to 55%, and for behavioral scorecards the range is from 60% to 70% [71].
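For reference, the Gini coefficient quoted here is conventionally obtained from the ROC AUC of the model as Gini = 2·AUC − 1; a toy computation with scikit-learn (made-up labels and scores) could look as follows.

```python
from sklearn.metrics import roc_auc_score

# Toy labels and predicted scores; 1 = default ("bad")
y_true  = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.10, 0.30, 0.60, 0.20, 0.80, 0.15, 0.20, 0.90]

gini = 2 * roc_auc_score(y_true, y_score) - 1   # Gini = 2*AUC - 1
print(f"Gini = {gini:.2f}")
```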
The event of default in retail banking is defined as more than 90 days of delinquency within the first 12 months after the loan origination. Defaults are divided into fraudulent cases and ordinary defaults. A default is said to be a fraudulent case when the delinquency starts in one of the first three months, meaning that, when submitting the credit application, the borrower did not even intend to pay back. Otherwise, the default is ordinary: the delinquency starts after the first three months on book. That is why scorecards are usually divided into fraud and application scorecards; in fact, the only difference is the target variable definition, while the sets of predictors and the data mining techniques remain the same. The default cases are said to be “bad”, and the non-default cases are said to be “good”. Banks and credit organizations have traditionally been using scorecards to predict whether a loan applicant is going to be bad or good.
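Under these definitions, the target flags could be derived, for instance, as in the following sketch; the column names and records are assumptions made for illustration, not a real data dictionary.

```python
import pandas as pd

# Toy loan-performance records
loans = pd.DataFrame({
    "loan_id": [1, 2, 3, 4],
    "reached_90dpd_in_12m": [True, True, False, True],  # 90+ days past due within first 12 months
    "delinquency_start_month": [1, 6, None, 10],        # month on book when delinquency began
})

# Default: more than 90 days of delinquency within the first 12 months on book
loans["default"] = loans["reached_90dpd_in_12m"]

# Fraudulent default: the delinquency starts in one of the first three months;
# every other default is an ordinary default
loans["fraud_case"] = loans["default"] & (loans["delinquency_start_month"] <= 3)
loans["ordinary_default"] = loans["default"] & ~loans["fraud_case"]

print(loans)
```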
The mathematical architecture of a scorecard is based on a logistic regression, which takes the transformed variables as input. The transformation of the initial variables is known as the WOE transformation [67]. It is widespread in credit scoring to apply such a transformation to the input variables, since it accounts for non-linear dependencies and provides a certain robustness to potential outliers. The aim of the transformation is to divide each variable into no more than k categories.
At step 0, all the continuous variables are binned into 20 quantiles, while the nominal and ordinal variables are either left untouched or one-hot encoded. Now, when all the variables are categorized, the odds ratio is computed for each category:

\[ odds_{ij} = \frac{\%goods_{ij}}{\%bads_{ij}} \]

Then, for each predictor variable $X_i$ ($i = 1 \dots n$), non-significant categories are merged. Significance is measured by the standard chi-square test for differences in odds with a p-value threshold of up to 10%. So, for each feature the following steps are done (a simplified sketch of this merging procedure is given after the list):

1. If $X_i$ has only one category, stop and set the adjusted p-value to 1.
2. If $X_i$ has k categories, go to step 7.
3. Else, find the allowable pair of categories (an allowable pair of categories for an ordinal predictor is two adjacent categories, and for a nominal predictor any two categories) that is least significantly different (i.e. most similar) in terms of odds. The most similar pair is the pair whose test statistic gives the largest p-value with respect to the dependent variable Y.
4. For the pair with the largest p-value, check whether its p-value is larger than the user-specified alpha level for merging. If it is, this pair is merged into a single compound category.
Then a new set of categories is formed. If it is not, then: if the number of categories is less than or equal to the user-specified minimum segment size, go to step 6; else merge the two categories with the highest p-value.
5. Go to step 2.
6. (Optional) Any category having too few observations (as compared with the user-specified minimum segment size) is merged with the most similar other category, as measured by the largest of the p-values.
7. The adjusted p-value is computed for the merged categories by applying Bonferroni adjustments [68].
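The sketch below illustrates a simplified version of this merging procedure for an ordinal predictor: it considers only adjacent pairs, uses SciPy's chi-square test, and omits the Bonferroni adjustment and the minimum-segment-size option described above.

```python
from scipy.stats import chi2_contingency

def merge_ordinal_categories(goods, bads, alpha_merge=0.10, max_categories=5):
    """Greedy merging of adjacent categories of an ordinal predictor.

    goods[i] / bads[i] are the counts of good and bad borrowers falling into
    category i (assumed non-empty). Adjacent categories whose good/bad odds do
    not differ significantly (chi-square p-value above alpha_merge) are merged;
    merging is also forced while there are more than max_categories categories
    (max_categories plays the role of k in the text above).
    """
    goods, bads = list(goods), list(bads)
    while len(goods) > 1:
        # chi-square p-value for every adjacent pair of categories
        p_values = []
        for i in range(len(goods) - 1):
            table = [[goods[i], bads[i]], [goods[i + 1], bads[i + 1]]]
            _, p, _, _ = chi2_contingency(table)
            p_values.append(p)
        i_best = max(range(len(p_values)), key=lambda i: p_values[i])
        # stop when even the most similar pair differs significantly
        # and the number of categories is already small enough
        if p_values[i_best] <= alpha_merge and len(goods) <= max_categories:
            break
        # merge the most similar pair into a single compound category
        goods[i_best] += goods.pop(i_best + 1)
        bads[i_best] += bads.pop(i_best + 1)
    return goods, bads

# toy example: six quantile bins of a continuous variable
print(merge_ordinal_categories([90, 88, 120, 118, 60, 30],
                               [10, 12, 15, 16, 25, 40]))
```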
Having accomplished the merging steps, we acquire categorized variables instead of the continuous ones. When each variable $X_i$ ($i = 1 \dots n$) is binned into a certain number of categories $k_i$, one is able to calculate the odds for each category $j$ ($j = 1 \dots k_i$) and, hence, the weight of evidence for each category:

\[ WOE_{ij} = \ln(odds_{ij}) \]

The role of the WOE transformation is that, instead of the initial variables, the logistic regression receives the WOE features as input. So, each input variable is a discrete transformed variable which takes WOE values.
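The computation of the WOE values (together with the information value used in the single-factor step) and the subsequent logistic regression on the WOE features can be illustrated with the following sketch; all counts and observations are toy numbers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Counts of good and bad borrowers in the (already merged) categories of one
# variable; the numbers are purely illustrative.
goods = np.array([400, 300, 200, 100])
bads  = np.array([ 20,  30,  45,  55])

# %goods_ij and %bads_ij: shares of all goods / all bads in each category
pct_goods = goods / goods.sum()
pct_bads  = bads / bads.sum()

# WOE_ij = ln(odds_ij) = ln(%goods_ij / %bads_ij)
woe = np.log(pct_goods / pct_bads)

# Information value of the variable, used in the single-factor selection step
iv = np.sum((pct_goods - pct_bads) * woe)

# The logistic regression receives the WOE value of the category that each
# observation falls into, instead of the raw variable value.
category = np.array([0, 0, 1, 2, 3, 3, 1, 2])      # bin index of each toy observation
y        = np.array([0, 0, 0, 1, 1, 1, 0, 1])      # 1 = "bad" (default)

X = woe[category].reshape(-1, 1)                   # WOE-transformed feature
model = LogisticRegression().fit(X, y)             # maximum-likelihood fit
print(woe.round(3), round(float(iv), 3), model.coef_)
```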
When estimating the logistic regression, the usual maximum likelihood method is applied. A logistic regression based on WOE-transformed variables is called a scorecard and has deservedly become the most widespread tool for binary classification problems in banking due to its simplicity, interpretability and sound accuracy. Throughout this thesis we will use it as the key benchmark along with common machine learning algorithms.

3 Formal Concept Analysis in Classification Problem

The methods of PD estimation can either produce so-called “black box” models with limited interpretability of model results or, on the contrary, provide interpretable results and a clear model structure.
The key feature of risk management practice is that, regardless of its accuracy, the model must not be a black box. That is why methods such as neural networks and SVM classifiers did not earn much trust within the banking community [83]. The separating hyperplane in an artificial high-dimensional space (dependent on the chosen kernel) cannot be easily interpreted in order to state the rejection reason for the client.
As far as neural networks are concerned, they also do not provide the user with a set of reasons why a particular loan application has been approved or rejected. In other words, these algorithms do not provide the decision maker with knowledge: the predicted class is generated, but no knowledge is retrieved from the data. On the contrary, alternative methods such as association rules and decision trees provide the user with easily interpretable rules which can be applied to the loan application.

The topic of rule mining was studied and developed in the works of B. Liu et al. [9], [10], [11].
Classification rule mining aims to discover a small set of rules in the database that produces an accurate classification. Association rule mining finds all the rules existing in the database that satisfy some minimum support and minimum confidence constraints. For association rule mining the target of discovery is not pre-defined, while for classification rule mining there is one and only one predetermined target variable. In [9] it is proposed to integrate the two above-mentioned mining techniques; the integration is done by focusing on mining a special subset of association rules, called class association rules (CARs).
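As a toy illustration of the support and confidence constraints applied to a single class association rule of the form {conditions} → class (this is not the CBA algorithm of [9]; the items and data below are invented):

```python
# Toy credit applications: (set of attribute-items, class label)
applications = [
    ({"employed", "has_credit_history"}, "good"),
    ({"employed"},                       "good"),
    ({"employed", "has_credit_history"}, "good"),
    ({"unemployed"},                     "bad"),
    ({"employed", "has_credit_history"}, "bad"),
]

def support_confidence(condset, target, data):
    """Support = share of all cases matching condset AND labelled target;
    confidence = share of cases matching condset that are labelled target."""
    matching = [label for items, label in data if condset <= items]
    hits = sum(1 for label in matching if label == target)
    support = hits / len(data)
    confidence = hits / len(matching) if matching else 0.0
    return support, confidence

rule = ({"employed", "has_credit_history"}, "good")
print(support_confidence(*rule, applications))
# the rule is kept only if support >= minsup and confidence >= minconf
```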
It was shown that the classifier built this way was, in general, more accurate than the one produced by decision trees. The above-mentioned group of algorithms is interpretable and similar in its aim to the FCA-based approach. However, there is a difference in how the numerical data is treated: in [9] the numerical data is discretized before rule mining, whereas in our approach, due to interval pattern structures, the numerical data is kept in its initial state and rule induction is done with the full information preserved. Also, the experimental results in [9] were obtained on general-purpose UCI datasets, while we are going to focus on credit scoring data. As we are interested in the algorithms proposed by B. Liu et.