the target variable is distributed continuously). In order to make FCA techniques applicable to this case, a new definition of an augmented interval pattern structure is given.

An augmented interval pattern structure is a quadruple $(G, (D, \sqcap), \delta, h)$, where $G$ is a set of objects, $D$ is a set of possible object descriptions $d \in D$, and $\sqcap$ is a meet operator. A description $d$ in the credit scoring domain is a tuple consisting of two elements, $d_x$ and $d_y$: $d_y$ is an interval for the target attribute $y$, and $d_x$ is a tuple of intervals for the explanatory attributes $x$, which are supposed to predict the target attribute $y$.

Let there be a mapping $\delta: G \to D$ and, additionally, an empirical distribution function $h \in H$, where $H$ is a family of density functions for the target attribute. We will also use the notation $\delta_x$ and $\delta_y$ to distinguish between descriptions containing the explanatory attributes and the target attribute, respectively.
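As a rough illustration of these definitions (a hypothetical representation, not the author's appendix code), an object description could be stored in R as a list of interval-valued explanatory attributes plus a target interval; the function name `make_description` is an assumption, and the values are taken from the toy example given further below.

```r
# Sketch: one object description delta(g) with degenerate intervals [v; v]
# for the explanatory attributes (d_x) and the target attribute (d_y).
make_description <- function(x_values, y_value) {
  list(x = lapply(x_values, function(v) c(v, v)),   # explanatory intervals
       y = c(y_value, y_value))                     # target interval
}

g1 <- make_description(c(x1 = 30,   x2 = 10),   0.5)
g2 <- make_description(c(x1 = 35,   x2 = 12),   0.7)
g3 <- make_description(c(x1 = 31.5, x2 = 11.5), 0.8)
```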
The definition of the meet operator $\sqcap$ is left unchanged. Suppose we have an arbitrary set of objects $A_0 \subseteq G$, i.e.:

$$A_0 = \{g_1, g_2, \dots, g_k\},$$
$$\delta(g_i) = (\delta_x(g_i), \delta_y(g_i)) = ([a_1^i; b_1^i], \dots, [a_J^i; b_J^i], [c^i; e^i]) \quad \text{for } i = 1, \dots, k,$$

where $J$ is the number of explanatory attributes. Then we define the derivation operator $\diamond$ in the following way:

$$A_0^{\diamond} = (d_0, h_0),$$

where $d_0 = (d_{x0}, d_{y0})$, $d_{x0} = \delta_x(g_1) \sqcap \dots \sqcap \delta_x(g_k)$, and the target attribute description $d_{y0} = \delta_y(g_1) \sqcap \dots \sqcap \delta_y(g_k)$, which is in fact a single interval $[y_{min}; y_{max}]$, while $h_0$ is a mapping $d_{y0} \to [0; 1]$.
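A minimal R sketch of the interval meet and of the derivation operator under the representation above (illustrative only; function names are assumptions, not the appendix implementation):

```r
# Sketch: the meet of two intervals is their convex hull [min lower; max upper].
interval_meet <- function(a, b) c(min(a[1], b[1]), max(a[2], b[2]))

# Meet of two descriptions: componentwise meet of the explanatory intervals
# and meet of the target intervals.
description_meet <- function(d1, d2) {
  list(x = Map(interval_meet, d1$x, d2$x),
       y = interval_meet(d1$y, d2$y))
}

# Derivation operator on a set of objects A0: the common description
# (d_x0, d_y0) plus the observed target values that define h0.
derive <- function(objects) {
  d <- Reduce(description_meet, objects)
  list(d = d, y_values = sapply(objects, function(o) o$y[1]))
}

A0 <- list(g1, g2)
A0_diamond <- derive(A0)
# A0_diamond$d$x: [30; 35], [10; 12];  A0_diamond$d$y: [0.5; 0.7];  h0 from {0.5, 0.7}
```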
Here $h_0$ is the empirical density distribution function of the target attribute values in $A_0$:

$$h([t_{j-1}; t_j)) = \frac{\sum_{g \in A_0} \mathbb{1}_{[t_{j-1}; t_j) \sqsubseteq \delta_y(g)}}{|A_0|}, \quad \forall j = 1, \dots, N,$$

where $t_0 = y_{min}$, $t_N = y_{max}$, $\Delta = t_j - t_{j-1} = \frac{y_{max} - y_{min}}{N}$, and $\mathbb{1}$ is the indicator function.

We will use the composition of the derivation operator $\diamond$ in a similar way to how it is used with interval pattern structures; however, it returns the image of the description $d_{x0}$ whatever the target description $d_{y0}$ and the density function $h$ are:

$$A_0^{\diamond\diamond} = (d_0, h_0)^{\diamond} \stackrel{\text{def}}{=} d_{x0}^{\diamond} = A_1.$$

In order to approach the target attribute prediction problem it is useful to define an α-weak premise with ω-allowed dropout. A description $d = (d_x, d_y) \in D$ of an augmented interval pattern structure is called an α-weak premise with ω-allowed dropout iff

$$1 - \frac{\left|\{g \in A \mid m - \omega(m - y_{min}) \le y(g) \le m + \omega(y_{max} - m)\}\right|}{|A|} \le \alpha,$$

where $y(g)$ is the value of the target attribute for object $g$, $A = d_x^{\diamond}$, $d_y = [y_{min}; y_{max}]$ is the interval for the target attribute, and $m$ is the median of the empirical density distribution function $h$ that describes the target attribute values within the interval $d_y$ for objects from $A$.

Below we provide an example to show how the new definitions work. Let the object set be $G = \{g_1, g_2, g_3\}$ and let the description space consist of two explanatory attributes $x_1, x_2$ and one target attribute $y$:

Objects\Attributes    x1      x2      y
g1                    30      10      0.5
g2                    35      12      0.7
g3                    31.5    11.5    0.8

Let $A_0 = \{g_1, g_2\}$. Then:

$\delta_x(g_1) = ([30; 30], [10; 10])$, $\delta_y(g_1) = [0.5; 0.5]$,
$\delta_x(g_2) = ([35; 35], [12; 12])$, $\delta_y(g_2) = [0.7; 0.7]$,
$d_0 = (d_{x0}, d_{y0})$,
$d_{x0} = \delta_x(g_1) \sqcap \delta_x(g_2) = ([30; 35], [10; 12])$,
$d_{y0} = \delta_y(g_1) \sqcap \delta_y(g_2) = [0.5; 0.7]$,
$h_0 = \{0.5, 0.7\}$,
$A_0^{\diamond} = (d_0, h_0)$,
$A_0^{\diamond\diamond} = (d_0, h_0)^{\diamond} = d_{x0}^{\diamond} = A_1 = \{g_1, g_2, g_3\}$,
$d_1 = ([30; 35], [10; 12], [0.5; 0.8])$,
$h_1 = \{0.5, 0.7, 0.8\}$,
$A_0^{\diamond\diamond\diamond} = A_1^{\diamond} = (d_1, h_1)$.

The description $d_0 = ([30; 35], [10; 12], [0.5; 0.7])$ is a 1/3-weak description with 1-allowed dropout, since the median of 0.5 and 0.7 equals 0.6, so the window $[0.6 - (0.6 - 0.5);\ 0.6 + (0.7 - 0.6)] = [0.5; 0.7]$ covers two of the three objects in $A_1$.

The first stage of the Query Based Regression Algorithm (QBRA) is mining α-weak premises with ω-allowed dropout; the second stage is to perform a prediction for a test object $g_t$ based on the mined premises.
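The α-weak check used in the first stage can be sketched as follows, reproducing the toy example above (an illustrative sketch under the earlier assumptions, not the appendix code; the sample median of the premise's target values stands in for the median of $h$):

```r
# Sketch: test the alpha-weak premise condition with omega-allowed dropout.
#   premise_y : target values of the objects that generated the premise
#               (they define d_y = [y_min; y_max] and the density h)
#   extent_y  : target values y(g) of all objects g in the image A = d_x^diamond
is_weak_premise <- function(premise_y, extent_y, alpha, omega) {
  y_min <- min(premise_y)
  y_max <- max(premise_y)
  m     <- median(premise_y)                 # median of the empirical density h
  lower <- m - omega * (m - y_min)
  upper <- m + omega * (y_max - m)
  outside <- sum(extent_y < lower | extent_y > upper)
  outside / length(extent_y) <= alpha        # share of dropped-out objects
}

# Toy example from the text: d_y0 = [0.5; 0.7], image A1 = {g1, g2, g3}.
is_weak_premise(premise_y = c(0.5, 0.7),
                extent_y  = c(0.5, 0.7, 0.8),
                alpha = 1/3, omega = 1)      # TRUE: 1 - 2/3 = 1/3 <= alpha
```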
The subsample size is a hyperparameter that sets the number of objects randomly extracted from $G$. Then the α and ω hyperparameters are specified; they control the anti-support in terms of both frequency and magnitude. After objects $A_0 = \{g_1, \dots, g_k\}$ are randomly extracted, one calculates the pattern $d_0 = \delta_x(g_1) \sqcap \dots \sqcap \delta_x(g_k) \sqcap \delta_x(g_t)$ and the density distribution function $h_0$ for the target attribute values. If $d_0$ is an α-weak premise with ω-allowed dropout, it is added to the collection of premises that will be used for prediction. After premise mining, the next stage is to build up a prediction for the target attribute based on the mined premises.
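The mining stage just described could be sketched as the loop below (an illustrative reconstruction reusing the helper functions from the previous sketches; names, the representation of the test object, and other details are assumptions, not the appendix code):

```r
# Sketch: mine alpha-weak premises with omega-allowed dropout for a test
# object g_t by meeting its description with random subsamples of G.
mine_premises <- function(train, g_t, n_iter, subsample_size, alpha, omega) {
  premises <- list()
  for (it in seq_len(n_iter)) {
    A0 <- train[sample(seq_along(train), subsample_size)]
    # meet of the explanatory parts of the subsample and of the test object
    d_x <- Reduce(function(a, b) Map(interval_meet, a, b),
                  lapply(c(A0, list(g_t)), `[[`, "x"))
    premise_y <- sapply(A0, function(o) o$y[1])  # targets defining d_y0 and h0
    # image of d_x: training objects whose explanatory intervals lie inside d_x
    in_image <- sapply(train, function(o)
      all(mapply(function(box, v) v[1] >= box[1] && v[2] <= box[2], d_x, o$x)))
    extent_y <- sapply(train[in_image], function(o) o$y[1])
    if (is_weak_premise(premise_y, extent_y, alpha, omega))
      premises[[length(premises) + 1]] <- list(d_x = d_x, y = premise_y)
  }
  premises
}

# Hypothetical usage on the toy data, with g_t described by x-attributes only:
# g_t <- list(x = list(x1 = c(32, 32), x2 = c(11, 11)))
# ps  <- mine_premises(list(g1, g2, g3), g_t, n_iter = 100,
#                      subsample_size = 2, alpha = 1/3, omega = 1)
```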
The resulting prediction was defined as the median of the mixture of distributions from all premises.

To test the algorithm, we used financial data from the balance sheets and profit and loss statements of 612 corporate clients of a top-10 Russian bank. Among other factors we used the assets-to-liabilities ratio, the debt-to-equity ratio, earnings before taxes and interest payments, return on assets, etc. These clients were assessed at the time of early insolvency signals, and the resulting recovery rate was collected. The accuracy of predictions was evaluated in terms of the mean absolute deviation (MAD):

$$MAD = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,$$

where $y_i$ is the target attribute (recovery rate) for the $i$-th client in the test set and $\hat{y}_i$ is the predicted value.
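Under the same assumptions, the prediction step and the MAD metric could be sketched as follows; pooling the premises' target values and taking one median is a simplification of the mixture described above, valid for equal subsample sizes:

```r
# Sketch: predict y for the test object as the median of the mixture of the
# empirical target distributions carried by all mined premises (pooled here),
# and evaluate test-set accuracy with the mean absolute deviation.
predict_target <- function(premises) median(unlist(lapply(premises, `[[`, "y")))

mad_error <- function(y_true, y_pred) mean(abs(y_true - y_pred))
```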
The algorithm was benchmarked against a random forest model. The MAD distribution shows that the lazy algorithm allows one to obtain a prediction error lower than that of the tuned random forest. The distributions represent the accuracy achieved over a large number of algorithm runs, each with a unique combination of hyperparameter values. Other benchmarks are provided below.

The conclusion emphasizes that the key feature of risk management practice is that, regardless of the model accuracy, the model must remain interpretable.
Formal concept analysis offers attractive instruments for extracting knowledge from data, since the intents of concepts can be treated as association rules. FCA-based algorithms are suitable for predictive modeling in areas where clarity of model interpretation is a high priority.
Also, the results show that these randomized modifications for classification and regression tasks outperform classical methods used in banks, such as scorecards and decision trees, in terms of Gini and mean absolute deviation. Therefore, it is argued that the proposed FCA-based classification and regression algorithms can compete with the ordinary statistical instruments adopted in banks and still provide sets of rules relevant for loan applicants.

In the Appendix, programming code for both QBCA and MLRA is provided. Some key functions for the meet operator, intent and extent calculation, premise mining, and final predictions are given. The language used is R (https://www.r-project.org/), since its intuitive syntax and vectorized operations let the reader grasp the idea behind the algorithm implementations. However, for production implementations other languages are recommended, such as Java or Spark (for distributed systems).

Results Summary

1. A randomized FCA-based algorithm for classification rules mining is developed.
2. The concept of the α-weak premise and other parameters of the algorithm are introduced. Prediction accuracy analysis is performed with regard to the algorithm hyperparameter values tuned on credit scoring data.
3. An algorithm is developed that makes it possible to use the apparatus of interval pattern structures in the regression problem, with several hyperparameters: the number of iterations, the alpha-threshold, the subsample size, the ω-dropout, and the penalty for a high variance of the target variable on the right-hand side of the expanded pattern (penalty for high deviation).
4. The concept of an augmented interval pattern structure is introduced, as well as the concept of ω-dropout for α-weak descriptions. The new definitions help to solve the regression problem via formal concept analysis methods.
5. The interpretability of the proposed algorithms is analyzed from the standpoint of a credit decision maker. The accuracy of the algorithms is compared with credit scoring models and other benchmarks (both "white-box" and "black-box").
6. A query-based classification algorithm was developed with three hyperparameters: the number of iterations, the alpha-threshold, and the subsample size. Accuracy analysis of the algorithm's predictions depending on the hyperparameter values was performed, and an intuitive explanation is given for the results obtained.
7. The developed algorithms were implemented as program code in the R language.