Dissertation (1137066), page 11

File no. 1137066: dissertation "Randomized algorithms based on interval pattern structures" (Рандомизированные алгоритмы на основе интервальных узорных структур), page 11. Dated 2019-05-20. Source: StudIzba.


Also, we can elaborate that $p(\sqcap) \sim p(<) \cdot |M|$ and $p(\square) \sim p(<) \cdot |G| \cdot |M|$, where $G$ is the set of objects in a dataset, $M$ is its set of attributes (features), and $p(<)$ is the time needed to compare two real numbers. Note that $p(\sqcap)$ is small compared to $p(\square)$ whenever $|M| \ll |G|$. Finally, the complexity estimate is $O(n \cdot p(<) \cdot |M| \cdot (|G| + s - 1))$.

Figure 24. Average decision time required for 1 object

4 FCA in regression problem

4.1 Problem description

Pattern structures are known to provide a tool for predictive modeling, for example in classification problems.

However, in order to produce classification rules, i.e. predictions, a concept lattice has to be built. For non-binary data this procedure may take much time and many resources. In the previous section it was shown that the problem can be avoided with the so-called lazy associative classification algorithm. It does not require lattice construction and is applicable to classification problems such as credit scoring. Classifying credit applicants into good and potentially delinquent clients is the first part of credit risk assessment. The second is to estimate the recovery rate in case of default, i.e. the proportion of the loan that is going to be collected by the bank [75]. Recovery rate prediction implies a continuous target variable. In this section we adapt the Query-Based Classification Algorithm based on interval pattern structures to the case of a continuous target variable (a regression problem), i.e. we introduce the Query-Based Regression Algorithm (QBRA) and apply it to recovery rate forecasting. We tune the parameters, assess the accuracy of the algorithm on the bank data and compare it to the models adopted in the bank system and to other benchmarks.

4.2 Augmented interval pattern structures

For the case when the target attribute is not a class label but a continuous variable, we adjust the interval pattern structure definition by equipping it with an additional object $h$.

Let us define an augmented interval pattern structure as a quadruplet $(G, D, \delta, h)$, where a description $d$ consists of two elements $d_x$ and $d_y$ ($d_y$ is an interval for the target attribute $y \in \mathbb{R}$, and $d_x$ is a vector of intervals for the explanatory attributes $x$ which are supposed to predict the target attribute $y$), $\delta : G \to D$ and $h \in H$, where $H$ is a family of density distribution functions for the target attribute $y$, i.e. $\int_{-\infty}^{+\infty} h(s)\,ds = 1$. We will also use the notation $\delta_x$ and $\delta_y$ to distinguish between descriptions containing the explanatory attributes and the target attribute, respectively. The definition of the meet operation is left unchanged.

Suppose we have an arbitrary set of objects $A_0 \subseteq G$, i.e. $A_0 = \{g_1, g_2, \ldots, g_J\}$, $\delta(g_j) = \{\delta_x, \delta_y\} = \{[x_{1j}; x_{1j}], \ldots, [x_{Mj}; x_{Mj}], [y_j; y_j]\}$ for $j = 1, \ldots, J$, where $M$ is the number of explanatory attributes. Then we define the derivation operator in the following way:

$$A_0^{\square} = (d_0, h_0)$$

where $d_0 = \{d_{x0}, d_{y0}\}$, $d_{x0} = \delta_x(g_1) \sqcap \ldots \sqcap \delta_x(g_J)$, and the target attribute description $d_{y0} = \delta_y(g_1) \sqcap \ldots \sqcap \delta_y(g_J)$ is in fact a single interval $[y_{\min}, y_{\max}]$, and $h_0 : d_{y0} \to [0; 1]$. The function $h_0$ is in effect a density distribution function of the target attribute based on the observations in $A_0$, which we describe below. Let $\tau_0, \ldots, \tau_K$ be a partition of $d_{y0}$ with $\tau_0 = y_{\min}$, $\tau_K = y_{\max}$ and $\Delta\tau_i = \frac{y_{\max} - y_{\min}}{K} = \tau_i - \tau_{i-1}$, $i = 1, \ldots, K$. Then:

$$h([\tau_{i-1}, \tau_i)) = \frac{\sum_{g \in A} \mathbb{1}\left[[\tau_{i-1}, \tau_i) \sqsubseteq \delta_y(g)\right]}{|A|}, \quad \forall i = 1, \ldots, K$$

that is, the share of objects in $A$ whose target value falls into the bin $[\tau_{i-1}, \tau_i)$. Thus $h$ is a function of the target attribute values of the objects in $A$. We will use the second derivation operator in the same way as it was used with interval pattern structures; however, it will return the image of the description $d_{x0}$ regardless of the target description $d_{y0}$ and the density function $h$:

$$A_0^{\square\square} = (d_0, h_0)^{\square} = d_{x0}^{\square} = A_1$$

where $A_1 \stackrel{\text{def}}{=} \{g \in G \mid d_{x0} \sqsubseteq \delta_x(g)\}$.
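The two derivation operators above can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names `meet`, `derive` and `extent` are hypothetical, objects are represented as plain lists of floats, and $y_{\max}$ is assigned to the last bin so that the bin frequencies sum to one (an assumption, since the bins in the text are half-open).

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def meet(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Meet of two interval vectors: the componentwise convex hull."""
    return [(min(l1, l2), max(h1, h2)) for (l1, h1), (l2, h2) in zip(a, b)]


def derive(xs: List[List[float]], ys: List[float], k: int):
    """First derivation of an object set: returns (d_x, d_y, h).

    xs[j] holds the explanatory values of object j, ys[j] its target value.
    h is a list of k bin frequencies over [y_min, y_max].
    """
    d_x = [(v, v) for v in xs[0]]
    for row in xs[1:]:
        d_x = meet(d_x, [(v, v) for v in row])
    y_min, y_max = min(ys), max(ys)
    width = (y_max - y_min) / k
    h = [0.0] * k
    for y in ys:
        # Assumption of this sketch: y_max (and the degenerate case
        # y_min == y_max) goes to the last bin.
        if width == 0 or y >= y_max:
            i = k - 1
        else:
            i = min(k - 1, int((y - y_min) / width))
        h[i] += 1.0 / len(ys)
    return d_x, (y_min, y_max), h


def extent(d_x: List[Interval], xs: List[List[float]]) -> List[int]:
    """Second derivation: indices of all objects whose x-values lie inside d_x."""
    return [j for j, row in enumerate(xs)
            if all(lo <= v <= hi for v, (lo, hi) in zip(row, d_x))]


# Toy knowledge base: four objects, two explanatory attributes.
xs = [[1.0, 10.0], [2.0, 12.0], [5.0, 11.0], [1.5, 11.0]]
ys = [0.2, 0.4, 0.9, 0.3]
d_x, d_y, h = derive(xs[:2], ys[:2], k=2)   # A0 = {object 0, object 1}
a1 = extent(d_x, xs)                        # [0, 1, 3]
```

In this toy example the extent contains object 3 in addition to the two starting objects, because its explanatory values fall inside the hull of objects 0 and 1.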

Generally speaking, $A_0 \subseteq A_1$. Finally, $A_1^{\square} = (d_1, h_1)$. Note that $d_1 = \{d_{x0}, d_{y1}\}$, i.e. only the target attribute description $d_y$ is updated, and so is the density function $h$, while the description of the explanatory variables $d_{x0}$ stays the same.

In order to approach the target attribute prediction problem it is useful to define α-weak premises with allowed dropout. An $h$-augmented interval pattern $d \in D$ is called an α-weak premise with allowed ω-dropout iff:

$$\frac{\sum_{g \in A} \left(1 - \mathbb{1}\left[d_y^{\min} - \omega (d_y^{\min} - m) \le \delta_y(g) \le d_y^{\max} + \omega (m - d_y^{\max})\right]\right)}{|A|} \le \alpha$$

where $d = (d_x, d_y)$, $d_y$ is a single interval $[d_y^{\min}; d_y^{\max}]$ for the target attribute $y$, $A = d_x^{\square}$, and $m$ is the median of the density function $h$, which reflects the distribution of the target attribute within the interval $d_y$ based on the objects from $A$. This definition serves as a substitute for a hypothesis in the FCA context, but incorporates two degrees of freedom, α and ω.

The first parameter, α, controls the frequency of hypothesis falsifications, and the second, ω, controls the magnitude of falsification, i.e. how dramatically the hypothesis is falsified. In our case the magnitude is measured by how many times larger $\delta_y(g) - d_y^{\max}$ is than $d_y^{\max} - m$ if $\delta_y(g) > d_y^{\max}$, or how many times larger $d_y^{\min} - \delta_y(g)$ is than $d_y^{\min} - m$ if $\delta_y(g) < d_y^{\min}$. Note that when $\omega = 0$ we apply the strictest criterion for considering a hypothesis falsified:

$$\frac{\sum_{g \in A} \left(1 - \mathbb{1}\left[d_y^{\min} \le \delta_y(g) \le d_y^{\max}\right]\right)}{|A|} \le \alpha \;\Leftrightarrow\; \frac{\sum_{g \in A} \left(1 - \mathbb{1}\left[d_y \sqsubseteq \delta_y(g)\right]\right)}{|A|} \le \alpha \;\Leftrightarrow\; \frac{\sum_{g \in A} \mathbb{1}\left[d_y \not\sqsubseteq \delta_y(g)\right]}{|A|} \le \alpha$$

4.3 Query-based regression algorithm with continuous target attribute

Assume we have a set of objects $G$ and numerical data with a set of explanatory attributes $x_1, \ldots, x_M$ and a target attribute $y$. In contrast to the classification problem, the objects are not divided into sets of positive and negative examples, since $y$ is continuous.

Now suppose we receive a test object $g_t$ with observable attributes $x$ but with an unknown value of the target attribute $y$. Is there a way to predict $y$ using the interval pattern structures approach? Indeed there is, and we describe it below and compare its accuracy with some benchmarks.

The set of objects $G$ will be referred to as a knowledge base rather than a training set, because we want to avoid lattice construction, which can be NP-hard in the general case.

The first stage of the algorithm is mining α-weak premises with allowed ω-dropout; the second is to perform prediction for the test object $g_t$ based on the mined premises. Let us start by choosing the subsample size parameter, which is the number of objects randomly extracted from our knowledge base $G$.

Then we specify the parameters α and ω, which control anti-support in terms of both frequency and magnitude. After randomly extracting some objects $A_0 = \{g_1, \ldots, g_K\}$, we calculate the pattern $d_0 = \delta(g_1) \sqcap \ldots \sqcap \delta(g_K) \sqcap \delta(g_t)$ and the density distribution function $h_0$ for the target attribute values. If $d_0$ is an α-weak premise with allowed ω-dropout, it is added to the collection of premises that will be used for prediction later. Together with the pattern it is necessary to store the density function $h$.

But which one should we use: $h_0$, $h_1$ or some other?

Here we introduce another parameter of the algorithm, called "capped". Capped is a Boolean value. If it is true, the range for the target attribute $d_{y1}$ in $d_0$ is truncated to $d_{y0}$, and the corresponding density function is $h_1$ calculated on the truncated set of target values.

If the capped parameter is false, we add $d_{y1}$ and calculate the density function from all target values that fell into $d_{y1}$, based on the objects covered by $d_0$. The whole procedure is repeated many times; the number of iterations parameter controls how many.

Having finished with premise mining, we move on to the next stage, which is building a prediction for the target attribute based on the mined premises.
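The mining stage described above can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the author's code: the names `is_weak_premise` and `mine_premises` are hypothetical, each premise stores raw target values instead of a histogram $h$, and the tolerance interval is assumed to widen as ω grows, so that $\omega = 0$ is the strictest case as the text states (the sign convention used for this is a choice of the sketch).

```python
import random
from statistics import median
from typing import List, Tuple


def is_weak_premise(ys: List[float], d_y: Tuple[float, float],
                    alpha: float, omega: float) -> bool:
    """Check the alpha-weak premise condition with omega-dropout.

    ys are the target values of all objects in the extent A;
    at least one of them must lie inside d_y.
    """
    d_min, d_max = d_y
    m = median([y for y in ys if d_min <= y <= d_max])
    # Assumed convention: omega >= 0 widens the interval around d_y.
    lo = d_min - omega * (m - d_min)
    hi = d_max + omega * (d_max - m)
    violations = sum(1 for y in ys if not lo <= y <= hi)
    return violations / len(ys) <= alpha


def mine_premises(xs, ys, x_test, sample_size, n_iter,
                  alpha, omega, capped=True, seed=0):
    """Randomly mine premises for one test object; returns (d_x, targets) pairs."""
    rng = random.Random(seed)
    premises = []
    for _ in range(n_iter):
        sample = rng.sample(range(len(xs)), sample_size)
        rows = [xs[j] for j in sample] + [x_test]
        # d_x0: componentwise hull of the sampled objects and the test object.
        d_x = [(min(col), max(col)) for col in zip(*rows)]
        sample_ys = [ys[j] for j in sample]
        d_y0 = (min(sample_ys), max(sample_ys))
        # A1: all knowledge-base objects covered by d_x0.
        ext_ys = [ys[j] for j, row in enumerate(xs)
                  if all(lo <= v <= hi for v, (lo, hi) in zip(row, d_x))]
        if is_weak_premise(ext_ys, d_y0, alpha, omega):
            # capped=True keeps only the targets inside d_y0.
            kept = ([y for y in ext_ys if d_y0[0] <= y <= d_y0[1]]
                    if capped else ext_ys)
            premises.append((d_x, kept))
    return premises
```

With `alpha=1.0` every random subsample yields a premise; lowering `alpha` or `omega` makes the filter stricter and may leave the collection empty, which is the case where the naive fallback prediction applies.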

In our case, the resulting prediction was defined by a mixture of the distributions from all premises. In practice, all target attribute values stored within the premises were pooled to form a final distribution. We then tried both the average and the median of that distribution as the prediction for the target attribute. Such an approach takes into account the different support of the premises, since premises with a greater number of objects contribute more.

However, one can argue that premises differ in terms of anti-support and deviation in the target attribute values. Indeed, we would put more weight on predictions based on premises with a narrow range of target attribute values and on those with fewer contradicting examples from the knowledge base $G$.

Therefore, we added target values to the final distribution with different weights, so that both a weighted average and a weighted median were used as the forecast.

We introduced two Boolean parameters which controlled the weighting schemes. The first parameter is "account for anti-support" and the second is "penalty for high deviation". When the account for anti-support parameter is true, the target values $\delta_y(g)$ of the objects $g \in A$ of the premise $d$ are given a weight according to the anti-support of that premise:

$$w_a = 1 - \frac{\sum_{g \in A} \left(1 - \mathbb{1}\left[d_y^{\min} - \omega (d_y^{\min} - m) \le \delta_y(g) \le d_y^{\max} + \omega (m - d_y^{\max})\right]\right)}{|A|}$$

When penalty for high deviation is true, the weight decreases as the deviation in the target attribute values grows:

$$w_p = \frac{1}{\sigma(\delta_y(g))}$$

If a parameter value is false, the corresponding weight is equal to one.
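The weighting scheme and the two weighted forecasts can be sketched as follows. The sketch makes several assumptions that the text does not fix: the function names are hypothetical, the two weights are combined by multiplication into one per-premise weight, $\sigma$ is taken as the population standard deviation, a zero deviation falls back to a weight of one, and the weighted median is the smallest value whose cumulative weight reaches half of the total.

```python
from statistics import pstdev
from typing import List, Tuple


def premise_weight(ys_in: List[float], n_violations: int, n_extent: int,
                   use_anti_support: bool, use_deviation_penalty: bool) -> float:
    """Weight shared by all target values of one premise."""
    # w_a: one minus the share of extent objects that contradict the premise.
    w_a = 1.0 - n_violations / n_extent if use_anti_support else 1.0
    sigma = pstdev(ys_in)
    # Assumption: no penalty when all target values coincide (sigma == 0).
    w_p = 1.0 / sigma if use_deviation_penalty and sigma > 0 else 1.0
    return w_a * w_p


def weighted_mean(pairs: List[Tuple[float, float]]) -> float:
    total = sum(w for _, w in pairs)
    return sum(y * w for y, w in pairs) / total


def weighted_median(pairs: List[Tuple[float, float]]) -> float:
    ordered = sorted(pairs)
    half = sum(w for _, w in ordered) / 2.0
    acc = 0.0
    for y, w in ordered:
        acc += w
        if acc >= half:
            return y
    return ordered[-1][0]


def predict(premises: List[Tuple[List[float], float]],
            all_ys: List[float], use_median: bool = False) -> float:
    """premises: (target values, weight) per mined premise."""
    pairs = [(y, w) for ys, w in premises for y in ys]
    if not pairs:
        # Naive model: unweighted statistic over the whole knowledge base.
        pairs = [(y, 1.0) for y in all_ys]
    return weighted_median(pairs) if use_median else weighted_mean(pairs)


# Two premises: the second one carries twice the weight of the first.
forecast = predict([([0.2, 0.4], 1.0), ([0.9], 2.0)], all_ys=[0.1, 0.5])
```

Because every target value of a premise enters the pooled distribution, premises with more objects still contribute more, and the weights only rescale that contribution.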

The final weight for the target attribute value of an object $g$, with which it contributes to the aggregate distribution used for prediction, is defined as the product of the two weights:

$$w(g) = w_a \cdot w_p$$

Finally, suppose that $P$ is a set of mined α-weak premises with allowed ω-dropout. The prediction for the target attribute $y$ of a test object $g_t$ can be based on the weighted average:

$$\widehat{\delta_y}(g_t) = \frac{\sum_{p \in P} \sum_{g \in A_p} \delta_y(g) \cdot w(g)}{\sum_{p \in P} \sum_{g \in A_p} w(g)}$$

or on the weighted median:

$$\widehat{\delta_y}(g_t) = \operatorname*{median}_{g \in \cup_p A_p} \bigcup_{p \in P} \bigcup_{g \in A_p} \left(\delta_y(g), w(g)\right)$$

When $P$ is an empty set, the prediction is the average or the median of all target attribute values in the knowledge base, i.e. the prediction is based on a "naive" model.

4.4 Data and experiments

The data we used for the computation represent a pool of loans of delinquent corporate clients which were expected to be restructured.

The restructuring process starts at the early stage when the client shows the first signs of insolvency. At that very moment a bank chooses either to execute the default strategy, in which court proceedings are launched and any disposable collateral is put up for sale, or to execute the restructuring strategy, in which the funding conditions are revisited, usually resulting in a longer credit period. In the case of corporate clients, banks usually do not want to go to extremes right from the start, since court proceedings and collateral sales imply costs and take time.

Document details

Type of material: Candidate of Sciences dissertation
Subject: Technical sciences
Institution: HSE University (НИУ ВШЭ)
File: PDF, 3.66 MB
