A premise is falsified if the image of $\delta(g_1) \sqcap \ldots \sqcap \delta(g_k) \sqcap \delta(g)$ contains an object of the opposite class. In order to control this issue, we add a third hyperparameter, the α-threshold. If the share of opposite-class examples that falsify the premise $\delta(g_1) \sqcap \ldots \sqcap \delta(g_k) \sqcap \delta(g)$ exceeds the α-threshold, the premise is considered falsified; otherwise the premise is α-weak and is subsequently used in the classification of the test object (see the sketch below).

The idea behind the algorithm is to check whether the test object is more similar to the set of positive or to the set of negative examples. Similarity is defined as the total support of the α-weak positive (negative) premises that contain the description of the test object.
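To make the falsification test concrete, the following minimal Python sketch checks whether a candidate premise is α-weak over interval (numerical) descriptions. It assumes that a premise is stored as a list of per-attribute intervals; the function and parameter names (contains, is_alpha_weak, alpha) are illustrative rather than taken from an actual implementation.

def contains(premise, x):
    """True if the description x (a vector of numbers) falls inside
    every interval of the premise."""
    return all(a <= v <= b for (a, b), v in zip(premise, x))

def is_alpha_weak(premise, opposite_examples, alpha):
    """A premise is alpha-weak if at most an alpha share of the
    opposite-class examples falsifies it (i.e. satisfies its intervals)."""
    falsifiers = sum(contains(premise, x) for x in opposite_examples)
    return falsifiers <= alpha * len(opposite_examples)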
The support of an α-weak positive description $d^{+}$ is defined as $|d^{+} \cap G_{+}|$, that is, the number of objects from the set of positive examples $G_{+}$ satisfying the description $d^{+}$. The support of an α-weak negative description $d^{-}$ is defined as $|d^{-} \cap G_{-}|$, that is, the number of objects from the set of negative examples $G_{-}$ satisfying the description $d^{-}$. Let there be $p$ α-weak positive descriptions and $n$ α-weak negative descriptions, all of which contain the description of the test object $\delta(g_{test})$, i.e.
\[
\forall i = 1, \ldots, p: \; d_i^{+} \sqsubseteq \delta(g_{test}) \quad \text{and} \quad \forall j = 1, \ldots, n: \; d_j^{-} \sqsubseteq \delta(g_{test}).
\]
The total support of the α-weak positive descriptions is
\[
P = \sum_{i=1}^{p} |d_i^{+} \cap G_{+}|,
\]
and the total support of the α-weak negative descriptions is
\[
N = \sum_{j=1}^{n} |d_j^{-} \cap G_{-}|.
\]
Based on the value $\Delta = P - N$, an estimate is made of whether the test object is more similar to the objects from the set of positive or of negative examples; this value serves as a credit score in the borrower's creditworthiness assessment. The algorithm uses three hyperparameters: the subsample size, the number of iterations and the α-threshold.

The first hyperparameter is expressed as a percentage of the observations in the set of positive (negative) examples. At each step a subsample is extracted and the descriptions of the objects in the subsample are intersected with the description of the test object.

The number of times (i.e. the number of iterations) we randomly extract a subsample from the set of positive (negative) examples is the second hyperparameter of the algorithm; it is also tuned through grid search.
If the share of opposite-class examples that falsify the premise $\delta(g_1) \sqcap \ldots \sqcap \delta(g_k) \sqcap \delta(g)$ exceeds the α-threshold, the premise is considered falsified; otherwise the premise is α-weak and is subsequently used in the classification of the test object.

These steps are performed for each test object, for the sets of positive and negative examples separately, producing a set of positive and a set of negative α-weak premises. The final output for the test object is the difference between the total number of objects from the set of positive examples supporting the positive premises and the total number of objects from the set of negative examples supporting the negative premises; a sketch of the whole procedure is given below.
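The following Python sketch assembles the pieces under the same assumptions as above: random subsamples of positive (negative) examples are intersected with the test description, the candidate premise is kept if it is α-weak, and the test object receives the score $\Delta = P - N$. All names (meet, alpha_weak_premises, credit_score, share, n_iter) are illustrative, not the actual implementation used in this work.

import random

def meet(descriptions):
    # Interval similarity: the component-wise convex hull of the values,
    # i.e. delta(g1) ⊓ ... ⊓ delta(gk) ⊓ delta(g_test) for numerical data.
    columns = list(zip(*descriptions))
    return [(min(c), max(c)) for c in columns]

def contains(premise, x):
    return all(a <= v <= b for (a, b), v in zip(premise, x))

def alpha_weak_premises(examples, opposite, x_test, share, n_iter, alpha):
    premises = []
    k = max(1, int(share * len(examples)))           # subsample size
    for _ in range(n_iter):                          # number of iterations
        subsample = random.sample(examples, k)
        premise = meet(subsample + [x_test])
        falsifiers = sum(contains(premise, x) for x in opposite)
        if falsifiers <= alpha * len(opposite):      # alpha-threshold test
            premises.append(premise)
    return premises

def credit_score(x_test, G_pos, G_neg, share, n_iter, alpha):
    pos = alpha_weak_premises(G_pos, G_neg, x_test, share, n_iter, alpha)
    neg = alpha_weak_premises(G_neg, G_pos, x_test, share, n_iter, alpha)
    P = sum(sum(contains(h, x) for x in G_pos) for h in pos)   # total positive support
    N = sum(sum(contains(h, x) for x in G_neg) for h in neg)   # total negative support
    return P - N                                               # Delta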
It has to be mentioned that a similar idea of a fuzzy threshold was introduced in [79]. That paper extends the JSM-method to the case of fuzzy hypotheses. The author introduces a drop, a triple $\langle f, n, m \rangle$, which to some extent serves as an analogue of an α-weak description. The first element of a drop is $f$, a fragment in Grigoriev's terms, which is an analogue of what is called a description in this dissertation. The constants $n$ and $m$ define, respectively, the minimum required number of objects from the set of positive examples that must satisfy the fragment $f$, and the maximum allowed number of objects from the set of negative examples that may satisfy it.
If a fragment is falsified "rarely" (by fewer than $m$ counterexamples from the set of opposite examples), then the fragment $f$ is said to be a "cause" of a particular property of an object. Analogously, the α-threshold regulates the maximum acceptable percentage of objects in the set of opposite examples that may satisfy the description.
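For binary attributes, the drop acceptance rule can be read as the following check. This is a hypothetical rendering of the rule described above, not code from [79]; objects and fragments are represented as sets of attributes.

def is_cause(fragment, positives, negatives, n, m):
    # An object satisfies a fragment if the fragment is a subset of
    # the object's attribute set.
    n_pos = sum(fragment <= g for g in positives)
    n_neg = sum(fragment <= g for g in negatives)
    return n_pos >= n and n_neg < m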
To continue with this analogy, an α-weak description is considered to be a "cause" of the target property (a defaulting or non-defaulting client). Also, [79] introduces a Boolean function "suit", which is similar to our aggregating operator in that it provides the voting over α-weak descriptions.

To contrast this research with [79], it has to be mentioned that the previous work does not use any randomized approach and is therefore far more theoretical, since computing a drop is dramatically costly in memory and CPU time. For this reason, the previous work resorts to the assumption that the number of possible fragments is low. In our case, relatively large data samples are used and the dimension of the feature space is high, so this assumption does not hold.
Grigoriev uses traditional JSM-type reasoning, where all hypotheses are generated at the first stage, whereas in our work only query-based hypotheses are generated, which reduces the memory used and the time needed for a single classification. There is also a difference in the attribute space: Grigoriev operates exclusively with binary attributes, while in our case the classification problem is considered over interval descriptions (numerical data), since this is the typical case for credit risk assessment data.
In the next section, further steps of the proposed query-based classification algorithms are discussed.

3.4 Voting Schemes

In order to produce a classification for a test object, we construct a voting scheme over the α-weak premises. In the most general case, a voting scheme $F$ is a mapping
\[
F(g_{test}, h_1^{+}, \ldots, h_p^{+}, h_1^{-}, \ldots, h_n^{-}) \to \{-1, 1, \emptyset\},
\]
where $g_{test}$ is the test object with unknown class, $h_i^{+}$, $i = 1, \ldots, p$, are the positive α-weak premises, $h_j^{-}$, $j = 1, \ldots, n$, are the negative α-weak premises, $-1$ is the label of the negative class, and $1$ is the label of the positive class (i.e.
defaulters). In other words, $F$ is an aggregating rule that takes premises as input and gives a classification label as output. Note that we allow for an empty label: if the label is empty, the voting rule is said to abstain from classification. There may be different approaches to building aggregating rules. A voting scheme is built from a weighting function $\omega(\cdot)$, an aggregation operator $A(\cdot)$ and a comparison operator $\otimes$:
\[
F(\omega(\cdot), A(\cdot), \otimes) = \big( A_{i=1}^{p} [\omega(h_i^{+})] \big) \otimes \big( A_{j=1}^{n} [\omega(h_j^{-})] \big)
\]
In order to configure a new voting scheme, it is sufficient to define the two operators and the weighting function.
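This decomposition can be mirrored directly in code. In the sketch below (with illustrative names), a voting scheme is a triple of callables, so that each concrete scheme discussed next only has to supply its own weighting function, aggregator and comparator.

def vote(pos_premises, neg_premises, omega, aggregate, compare):
    # Generic voting scheme F(omega, A, compare): weight every alpha-weak
    # premise, aggregate the weights per class, compare the two aggregates.
    # Returns -1, 1 or None; None plays the role of the empty label.
    a = aggregate([omega(h) for h in pos_premises])
    b = aggregate([omega(h) for h in neg_premises])
    return compare(a, b)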
In this work we first use the number of positive versus the number of negative α-weak premises. In this case the rule allows the test object to satisfy both positive and negative premises, which decreases the number of rejections from classification. The weighting function, aggregation operator and comparison operator are defined as follows:
\[
A(h) = \sum h, \qquad
\omega(h) =
\begin{cases}
1, & \text{if } \delta(g_{test}) \sqsubseteq h \\
0, & \text{otherwise}
\end{cases}, \qquad
a \otimes b =
\begin{cases}
\operatorname{sign}(b - a), & \text{if } a \neq b \\
\emptyset, & \text{if } a = b
\end{cases}
\]
So the label for a test object $g_{test}$ is defined by the following mapping:
\[
F(g_{test}, h_1^{+}, \ldots, h_p^{+}, h_1^{-}, \ldots, h_n^{-}) = \Big( \sum_{i=1}^{p} [\delta(g_{test}) \sqsubseteq h_i^{+}] \Big) \otimes \Big( \sum_{j=1}^{n} [\delta(g_{test}) \sqsubseteq h_j^{-}] \Big)
\]
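Under the assumptions of the earlier sketches (premises stored as interval lists), this first scheme can be written as follows; the tie case returns None, i.e. the rule abstains.

def sign(x):
    return (x > 0) - (x < 0)

def contains(premise, x):
    return all(a <= v <= b for (a, b), v in zip(premise, x))

def count_scheme(x_test, pos_premises, neg_premises):
    a = sum(contains(h, x_test) for h in pos_premises)   # positive premises containing the test description
    b = sum(contains(h, x_test) for h in neg_premises)   # negative premises containing it
    return None if a == b else sign(b - a)               # sign(b - a), as defined above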
However, one can think of the margin $b - a$ as a measure of discrimination between the two classes and choose the decision boundary based on ROC analysis, for instance. This approach is good for decreasing the number of rejections from classification, but it does not account for the support of the premises. Naturally, one would give more weight to a premise with a larger image (with higher support). Also, if the numbers of positive and negative premises are equal, the rule abstains from classification.

Secondly, voting schemes can be varied in order to better account for the evidence in the data. As one possible variation, the total number of objects within the positive premises versus the total number of objects within the negative premises can be considered. The voting rule then takes the following form:
\[
F(g_{test}, h_1^{+}, \ldots, h_p^{+}, h_1^{-}, \ldots, h_n^{-}) = \Big( \sum_{i=1}^{p} |(h_i^{+})^{\square}| \Big) \otimes \Big( \sum_{j=1}^{n} |(h_j^{-})^{\square}| \Big)
\]
Here, only the weighting function is changed in comparison to the previous scheme:
\[
\omega(h) = |h^{\square}|,
\]
where $h^{\square}$ denotes the image of the premise $h$, i.e. the set of objects from the corresponding set of examples whose descriptions contain $h$. This approach accounts for the power of the premise image, but it can count the same objects supporting the positive (negative) class several times, since these objects may appear in the images of several premises.
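A minimal sketch of this support-weighted variant under the same assumptions; the image of a premise is computed by scanning the corresponding example set, so an object supporting several premises is deliberately counted several times.

def contains(premise, x):
    return all(a <= v <= b for (a, b), v in zip(premise, x))

def image_size(premise, examples):
    # |h^square|: the number of examples whose descriptions contain h
    return sum(contains(premise, x) for x in examples)

def support_scheme(pos_premises, neg_premises, G_pos, G_neg):
    a = sum(image_size(h, G_pos) for h in pos_premises)
    b = sum(image_size(h, G_neg) for h in neg_premises)
    return None if a == b else ((b > a) - (b < a))       # sign(b - a)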
Thirdly, the total number of unique objects within the positive premises versus the total number of unique objects within the negative premises can be considered. In this case the aggregation operator takes over some features of the weighting function, and the latter turns into a mapping:
\[
F(g_{test}, h_1^{+}, \ldots, h_p^{+}, h_1^{-}, \ldots, h_n^{-}) = \Big| \operatorname{unique} \Big[ \bigcup_{i=1}^{p} (h_i^{+})^{\square} \Big] \Big| \otimes \Big| \operatorname{unique} \Big[ \bigcup_{j=1}^{n} (h_j^{-})^{\square} \Big] \Big|
\]
The $A(\cdot)$ operator is redefined over the images of the premises, and the weighting function is redefined as the Galois mapping:
\[
A(h^{\square}) = |\operatorname{unique}[h^{\square}]|, \qquad \omega(h) = h^{\square}
\]
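This unique-objects variant differs from the previous sketch only in taking the union of the premise images before counting:

def contains(premise, x):
    return all(a <= v <= b for (a, b), v in zip(premise, x))

def unique_support(premises, examples):
    # |unique[union of images]|: each supporting example is counted once,
    # no matter how many premises it satisfies
    return sum(any(contains(h, x) for h in premises) for x in examples)

def unique_scheme(pos_premises, neg_premises, G_pos, G_neg):
    a = unique_support(pos_premises, G_pos)
    b = unique_support(neg_premises, G_neg)
    return None if a == b else ((b > a) - (b < a))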
Further, a geometrical interpretation in the feature space can be used. A premise (given numerical data) represents, in effect, a multidimensional parallelepiped whose edges are the intervals for the particular attributes. The more generic the premise, the wider its intervals and, thus, the greater its geometrical volume. We want to favor, and give more weight to, premises with smaller volume, since they are more specific to the test object. Here one can draw an analogy with k-nearest neighbors, where a test object is classified according to its closest (in some sense) neighbors: the closer the neighbor, the more weight it carries for the predicted class.
\[
F(g_{test}, h_1^{+}, \ldots, h_p^{+}, h_1^{-}, \ldots, h_n^{-}) = \Big( \sum_{i=1}^{p} \prod_{k=1}^{M} \big( b_k^{h_i^{+}} - a_k^{h_i^{+}} \big) \Big) \otimes \Big( \sum_{j=1}^{n} \prod_{k=1}^{M} \big( b_k^{h_j^{-}} - a_k^{h_j^{-}} \big) \Big)
\]
where $M$ is the number of attributes and $a_k^{h_i^{+}}, b_k^{h_i^{+}}$ are the left and right borders of the corresponding premise interval. Note that the comparison operator is redefined and negated in contrast to the previous cases:
\[
A(h) = \sum h, \qquad
\omega(h) = \prod_{k=1}^{M} (b_k^{h} - a_k^{h}), \qquad
a \otimes b =
\begin{cases}
-\operatorname{sign}(b - a), & \text{if } a \neq b \\
\emptyset, & \text{otherwise}
\end{cases}
\]
The weighting function is the geometrical volume of the premise. The comparison operator is negated because now premises with greater volume are less important in voting.
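A sketch of this volume-weighted scheme under the same interval representation; math.prod requires Python 3.8 or later, and the comparator follows the negated definition above.

from math import prod

def volume(premise):
    # Geometrical volume of the premise's parallelepiped: the product
    # of its M interval lengths.
    return prod(b - a for a, b in premise)

def volume_scheme(pos_premises, neg_premises):
    a = sum(volume(h) for h in pos_premises)
    b = sum(volume(h) for h in neg_premises)
    return None if a == b else -((b > a) - (b < a))      # -sign(b - a)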