Summary (Randomized Algorithms Based on Interval Pattern Structures), page 2

The file "Summary" is part of the archive "Randomized Algorithms Based on Interval Pattern Structures", a dissertation submitted for the degree of Candidate of Technical Sciences, hosted in the НИУ ВШЭ file archive under "технические науки" (postgraduate and doctoral studies).
It covers the key role that modeling plays in risk management and reviews widespread statistical algorithms used for classification and regression tasks. In the context of credit risk assessment, two parameters are emphasized: probability of default (PD) and loss given default (LGD). From a data science standpoint, PD estimation is a binary classification problem, while LGD estimation is a regression problem.
The tradeoff between prediction accuracy and model interpretability is emphasized, since some regulators require banks to be able to provide reject reasons for borrowers, and since central banks examining bank models want to understand the economic intuition behind them in order to verify that the models will show expected and stable performance. Loan default prediction with the use of scorecards is discussed, since this method is widely adopted in the banking industry and is used thereafter as a benchmark for "white-box" models.
The weight-of-evidence (WOE) transformation of raw factors is designed to adequately account for outliers and non-monotonic dependencies before feeding the data into a logistic classifier. "Black-box" models are discussed using the example of neural networks, which are contrasted with transparent models that let the user understand why the algorithm predicts a particular probability of default for a client. The third section contains the first novelty: application of formal concept analysis (FCA) to classification problems on datasets with a large number of observations.
Basic FCA definitions are provided (pattern structure, meet operator, derivation operator, pattern intent and extent), and new definitions of α-weak premises³ are given. Suppose we have a set of positive examples G+ (objects of the positive class) and a set of negative examples G− (objects of the negative class), with G+ ∩ G− = ∅ and G+ ∪ G− = G. Let the description set be denoted by D; it consists of tuples with intervals as elements, i.e. D = {([a_1; b_1], …, [a_K; b_K]) | ∀i: a_i, b_i ∈ ℝ}, where K is the dimensionality of the attribute space. For example, for K = 3 one element of D is d = ([1; 2], [−0.5; 0.3], [150; 340]).

³ Also known as classifiers or hypotheses.

Let us define a mapping δ: G → D such that for g ∈ G: δ(g) = ([x_1; x_1], …, [x_K; x_K]), i.e. each object has as its description a point in K-dimensional real space. For two descriptions d_1, d_2 ∈ D, d_1 = ([a_1; b_1], …, [a_K; b_K]) and d_2 = ([c_1; e_1], …, [c_K; e_K]), the meet operation ⊓ is defined as:

d_1 ⊓ d_2 = ([min(a_1, c_1); max(b_1, e_1)], …, [min(a_K, c_K); max(b_K, e_K)])

If d_1 ⊓ d_2 = d_1, this is denoted d_1 ⊑ d_2. An interval pattern structure is the triple (G, (D, ⊓), δ): a set of objects G, a set of possible descriptions D with the meet operation ⊓, and the mapping δ. We also define derivation operators, denoted ⋄, between the set of objects G and the description set D:

A⋄ = ⊓_{g∈A} δ(g) for A ⊆ G,  and  d⋄ = {g ∈ G | d ⊑ δ(g)} for d ∈ D.
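The definitions above can be sketched in code. This is a minimal illustration, not the dissertation's implementation: a description is modeled as a tuple of (low, high) pairs, and all names (meet, subsumed, delta, A_diamond, d_diamond) are illustrative.

```python
from functools import reduce

def meet(d1, d2):
    """Meet ⊓: component-wise convex hull of two interval tuples."""
    return tuple((min(a, c), max(b, e)) for (a, b), (c, e) in zip(d1, d2))

def subsumed(d1, d2):
    """d1 ⊑ d2  ⟺  d1 ⊓ d2 = d1."""
    return meet(d1, d2) == d1

def delta(point):
    """Mapping δ: a K-dimensional point becomes a tuple of degenerate intervals."""
    return tuple((v, v) for v in point)

def A_diamond(A, delta_map):
    """A⋄ = ⊓_{g∈A} δ(g): the tightest interval tuple covering all objects in A."""
    return reduce(meet, (delta_map[g] for g in A))

def d_diamond(d, G, delta_map):
    """d⋄ = {g ∈ G | d ⊑ δ(g)}: all objects whose points fall inside d."""
    return {g for g in G if subsumed(d, delta_map[g])}
```

Note that ⊑ runs "the wider box is the smaller description": d ⊑ δ(g) holds exactly when the point of g lies inside the box d, so d⋄ collects the objects covered by d.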
A description d+ ∈ D is called an α-weak positive premise if:

|d+⋄ ∩ G−| / |G−| ≤ α, and ∃A ⊆ G+: d+ ⊑ A⋄.

A description d− ∈ D is called an α-weak negative premise if:

|d−⋄ ∩ G+| / |G+| ≤ α, and ∃B ⊆ G−: d− ⊑ B⋄.

A query-based classification algorithm ("lazy classification") is introduced. The algorithm takes as input the sets of positive and negative examples (G+ and G−), a set of test objects G_test with their corresponding descriptions, and the mapping δ. The output of the algorithm is a real number Δ ∈ ℝ assigned to each test object g ∈ G_test. This number Δ serves as a credit score and allows one to build cutoff decision rules such as "if Δ > θ then g belongs to the positive class". The idea behind the algorithm is to check whether the test object is more similar to the set of positive or to the set of negative examples.
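A sketch of the α-weak positive premise test under these definitions (the negative case is symmetric, with G+ and G− swapped). One assumption is made here: the existential condition ∃A ⊆ G+: d ⊑ A⋄ is checked by taking A to be the positive extent of d, which is a nonempty witness exactly when one exists; all names are illustrative.

```python
def meet(d1, d2):
    return tuple((min(a, c), max(b, e)) for (a, b), (c, e) in zip(d1, d2))

def subsumed(d1, d2):                 # d1 ⊑ d2  ⟺  d1 ⊓ d2 = d1
    return meet(d1, d2) == d1

def extent(d, objects, delta_map):    # d⋄ restricted to a given object set
    return {g for g in objects if subsumed(d, delta_map[g])}

def is_alpha_weak_positive(d, G_pos, G_neg, delta_map, alpha):
    """|d⋄ ∩ G−|/|G−| ≤ α  and  ∃A ⊆ G+: d ⊑ A⋄."""
    false_cover = len(extent(d, G_neg, delta_map)) / len(G_neg)
    if false_cover > alpha:
        return False
    # Every g with d ⊑ δ(g) satisfies d ⊑ A⋄ for A = {such g}, and conversely
    # any witness A must consist of such objects, so a nonempty extent suffices.
    return bool(extent(d, G_pos, delta_map))
```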
The similarity is defined as the total support of the α-weak positive (negative) premises that contain the description of the test object. The support of an α-weak positive premise d+ is |d+⋄ ∩ G+|, i.e. the number of objects from the set of positive examples G+ satisfying the description d+. The support of an α-weak negative premise d− is |d−⋄ ∩ G−|, i.e. the number of objects from the set of negative examples G− satisfying the description d−. Let there be p α-weak positive premises and n α-weak negative premises, all of which contain the description of the test object g_test, i.e. ∀i = 1, …, p: d_i+ ⊑ δ(g_test) and ∀j = 1, …, n: d_j− ⊑ δ(g_test). The total support of the α-weak positive premises is S+ = Σ_{i=1}^{p} |d_i+⋄ ∩ G+|, and the total support of the α-weak negative premises is S− = Σ_{j=1}^{n} |d_j−⋄ ∩ G−|. Based on the value Δ = S+ − S−, it is estimated whether the test object is more similar to the objects from the set of positive or negative examples; this value serves as a credit score for assessing the borrower's creditworthiness. The dissertation also considers other similarity measures and voting schemes based on α-weak premises (see Section 3.4 of the dissertation).

The algorithm is an iterative procedure and uses three hyperparameters: the subsample size, the number of iterations, and the α-threshold.
The first hyperparameter is the percentage of objects in the set of positive (negative) examples that are randomly extracted at each iteration. At each iteration a subsample is extracted from G− and from G+, and the descriptions of the objects in the subsample are intersected (⊓) with the description of the test object g_test:

d = δ(g_1) ⊓ … ⊓ δ(g_s) ⊓ δ(g_test), where s/|G+| (respectively s/|G−|) equals the subsample size.

The number of times we randomly extract a subsample from the set of examples (the number of iterations) is the second hyperparameter of the algorithm; it is also tuned through grid search.
If d is not an α-weak premise, it is ignored; if d is an α-weak premise, it is saved to be used later in the classification of the test object. These steps are performed for each test object, for the positive and negative sets of examples separately, producing a set of positive and a set of negative α-weak premises. The final output of the algorithm is the difference between the total support of the α-weak positive premises and the total support of the α-weak negative premises for the test object. Based on this output, the Gini coefficient, a model quality metric widely used in credit scoring, is calculated.

The algorithm is tested both on internal data of a top-10 bank and on open Kaggle data. The positive set of examples is a set of loans where the target attribute is present. The target attribute in credit scoring is defined as more than 90 days of delinquency within the first 12 months after loan origination.
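The iterative procedure described above (random subsampling, intersection with the test description, the α-weak filter, and scoring by total support) could be sketched as follows. This is an illustrative reconstruction, not the dissertation's code; descriptions are tuples of (low, high) intervals, and the ∃A condition holds by construction because each d is built as a meet of examples from its own class.

```python
import random
from functools import reduce

def meet(d1, d2):
    return tuple((min(a, c), max(b, e)) for (a, b), (c, e) in zip(d1, d2))

def subsumed(d1, d2):                 # d1 ⊑ d2  ⟺  d1 ⊓ d2 = d1
    return meet(d1, d2) == d1

def support(d, objects, delta_map):   # |d⋄ ∩ objects|
    return sum(subsumed(d, delta_map[g]) for g in objects)

def total_support(g_test, own, other, delta_map, frac, n_iter, alpha, rng):
    """Total support of the α-weak premises generated from one class ('own')."""
    size = max(1, int(frac * len(own)))
    total = 0
    for _ in range(n_iter):
        sample = rng.sample(own, size)
        # d = δ(g_1) ⊓ … ⊓ δ(g_s) ⊓ δ(g_test)
        d = reduce(meet, (delta_map[g] for g in sample), delta_map[g_test])
        # α-weak check: d may cover at most a fraction α of the opposite class
        if support(d, other, delta_map) / len(other) <= alpha:
            total += support(d, own, delta_map)
    return total

def score(g_test, G_pos, G_neg, delta_map, frac, n_iter, alpha, seed=0):
    """Δ = S+ − S−; a cutoff rule such as Δ > θ assigns the positive class."""
    rng = random.Random(seed)
    s_pos = total_support(g_test, G_pos, G_neg, delta_map, frac, n_iter, alpha, rng)
    s_neg = total_support(g_test, G_neg, G_pos, delta_map, frac, n_iter, alpha, rng)
    return s_pos - s_neg
```

With a fixed seed the procedure is reproducible; in practice frac, n_iter and alpha are the three hyperparameters tuned by grid search, as described above.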
Each set of examples consists of 1000 objects so that the voting scheme considered in the second section is applicable. The test dataset consists of 300 objects and is extracted from the same population as the sets of positive and negative examples. The attributes represent various metrics such as loan amount, term, rate, payment-to-income ratio, age of the borrower, undocumented-to-documented income ratio, credit history metrics, etc. The set of attributes used for the lazy classification trials contained 28 numerical attributes. To evaluate the accuracy of the classification, the Gini coefficient is calculated for every combination of hyperparameters based on 300 predictions on the test set. The Gini coefficient is calculated from the margin between the number of objects within positive premises and within negative ones.
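For a score-based classifier the Gini coefficient equals 2·AUC − 1, where AUC is the probability that a randomly chosen positive outranks a randomly chosen negative. A minimal pairwise sketch (illustrative name, ties counted as half a win):

```python
def gini(scores, labels):
    """Gini = 2*AUC - 1 for a real-valued score against binary labels (1/0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return 2.0 * auc - 1.0
```

A perfect ranking gives Gini = 1, a random one gives Gini ≈ 0; here the scores would be the margins Δ computed on the 300 test objects.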
The margin is treated as a measure similar to the score value in credit scorecards. A hyperparameter grid search is performed:

[Table: Gini coefficients for the hyperparameter grid search (QBCA)]
[Table: Gini coefficients for the hyperparameter grid search on a specified area]
[Table: Query-based classification algorithm versus classical models adopted in banks and other benchmarks, top-10 bank data]

As far as open data is concerned, the algorithm was tested on the Kaggle data of the "Give Me Some Credit" contest held in 2012⁴. The data has a binary target variable (class label) indicating whether the borrower defaulted or not.

[Table: Query-based classification algorithm versus benchmarks, Kaggle credit scoring open dataset]

⁴ https://www.kaggle.com/c/GiveMeSomeCredit

Apart from accuracy measures, sensitivity analysis is performed and the properties of the algorithm are analyzed. Visualizations of collections of α-weak premises are also presented, which allow one to interpret the model outcome for the client.
In effect, when deciding the target class label (good or bad) for a test object, the algorithm builds portraits of good and bad clients from historical data in the multi-dimensional feature space. Below, several examples of such two-feature areas are shown for different numbers of iterations:

[Figures: two-feature premise areas for different numbers of iterations]

Positive premises are depicted in red and negative ones in blue. To construct each positive premise, two objects from the set of positive examples were randomly extracted; then the meet operator was applied and a set of intervals was obtained; after that, only the intervals for two features were kept. The same procedure was performed for negative premises. It is argued that this set of areas in the feature space shows why a particular borrower was considered high or low credit risk by the model.

The fourth section contains the second novelty: adaptation of FCA to the regression problem (i.e.