
"Consider two classifiers C1 and C2 having the same action, where C2's condition is a generalization of C1's. That is, C2's condition can be generated by C1's by changing one or more of C1's specified (1 or 0) alleles to don't cares (#). Suppose C1 and C2 have the same epsilon, and are thus equally accurate.

Every time C1 and C2 occur in the same action set, their fitness values will be updated by the same amount. However, because C2 is a generalization of C1 it will tend to occur in more match sets than C1, and thus probably (depending on the action-selection regime) in more action sets. Because the GA occurs in action sets, C2 will have more reproductive opportunities and thus its number of exemplars will tend to grow with respect to C1's [...]. Consequently, when C1 and C2 next meet in the same action set, a larger fraction of the constant fitness update would be "steered" toward exemplars of C2, resulting via the GA in yet more exemplars of C2 relative to C1. Eventually, it was hypothesized, C2 would displace C1 from the population." (Wilson, 1995)
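The first step of Wilson's argument, that a generalization of C1 occurs in more match sets, can be checked directly on small ternary conditions. The Python sketch below uses hypothetical helper names (`matches`, `is_generalization`, `match_count`) and simply enumerates every binary input of the condition's length. It shows that C2 matches every state C1 matches, plus others, which under uniformly sampled inputs means proportionally more match sets; it does not model the GA or the fitness update.

```python
from itertools import product


def matches(condition: str, state: str) -> bool:
    """A ternary condition matches a binary state if every specified allele agrees."""
    return all(c == "#" or c == s for c, s in zip(condition, state))


def is_generalization(general: str, specific: str) -> bool:
    """True if `general` can be obtained from `specific` by turning one or more
    specified alleles (0 or 1) into don't cares (#)."""
    if len(general) != len(specific) or general == specific:
        return False
    return all(g == s or g == "#" for g, s in zip(general, specific))


def match_count(condition: str) -> int:
    """Number of binary states of the same length that the condition matches."""
    return sum(
        matches(condition, "".join(bits))
        for bits in product("01", repeat=len(condition))
    )


c1, c2 = "10#1", "1##1"          # hypothetical conditions with the same action
print(is_generalization(c2, c1))  # True
print(match_count(c1), match_count(c2))  # 2 4
```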

Wilson's hypothesis explains how XCS develops a tendency to evolve maximally general classifiers. But what happens when an overly general classifier appears in the population?

Overgeneral classifiers are those that, because of some don't care symbols, match different niches with different rewards and therefore become inaccurate. Since the GA in XCS bases fitness on classifier accuracy, overly general classifiers tend to reproduce less and are eventually deleted.
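The inaccuracy of an overgeneral classifier comes from how its parameters are updated. The sketch below is a simplified illustration, not XCS itself: it assumes a plain Widrow-Hoff update with learning rate beta = 0.2 (the MAM averaging used for young classifiers is omitted) and an accuracy function with threshold eps0 = 10, alpha = 0.1 and nu = 5, in the power-law form found in later algorithmic descriptions of XCS. All these values are assumptions chosen for illustration. A classifier that always receives the same payoff ends up with near-zero error, while one that matches two niches paying 1000 and 0 keeps a large error and a negligible accuracy.

```python
def update(prediction: float, error: float, payoff: float, beta: float = 0.2):
    """One Widrow-Hoff step: the error uses the old prediction, then the prediction moves."""
    error += beta * (abs(payoff - prediction) - error)
    prediction += beta * (payoff - prediction)
    return prediction, error


def accuracy(error: float, eps0: float = 10.0, alpha: float = 0.1, nu: float = 5.0) -> float:
    """Accuracy is 1 below the error threshold eps0 and falls off as a power law above it."""
    return 1.0 if error < eps0 else alpha * (error / eps0) ** -nu


def train(payoffs, prediction: float = 10.0, error: float = 0.0) -> float:
    """Apply a sequence of payoffs to one classifier and return its final error."""
    for p in payoffs:
        prediction, error = update(prediction, error, p)
    return error


# Accurate classifier: always applied in one niche that pays 1000.
err_accurate = train([1000.0] * 200)
# Overgeneral classifier: matches two niches paying 1000 and 0, applied alternately.
err_overgeneral = train([1000.0, 0.0] * 100)
print(accuracy(err_accurate), accuracy(err_overgeneral))
```

With alternating payoffs the prediction oscillates around 500 and the error settles near 555, so the overgeneral classifier's accuracy, and hence its fitness, stays tiny.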

In Section 6.2, we will analyze the generalization mechanism in detail, to show why it may sometimes work incorrectly.

6.2 Are Overgeneral Classifiers Inaccurate?

The generalization mechanism of XCS is sound, so it is not clear why it may fail in certain environments. Lanzi (1997) observes that generalization in XCS is achieved through evolution; therefore, there may be cases in which the generalization mechanism is too slow to delete overly general classifiers, which then have enough time to proliferate in the population.

We believe that Wilson's generalization hypothesis is correct; accordingly, we argue that XCS fails to learn a certain task when some of the assumptions of the hypothesis do not hold. First, we observe that:

For overly general classifiers to be "deleted", i.e., reproduce less and then be deleted, they must be observed by the system to be inaccurate. However, this happens only if overly general classifiers are applied in distinct environmental niches.

We argue that in XCS it is not always true that an overly general classifier will become inaccurate; in fact, due to the parameter update, a classifier becomes inaccurate only when it is applied to situations which have different payoff levels. However, this only happens when the classifier is applied in different situations, i.e., environmental niches. There are applications in which, due to the structure of the environment and to the exploration policy, the animat does not visit all the niches with the same frequency, but rather it stays in a certain area of the environment for a while and then moves to another one. In such situations, Wilson's generalization hypothesis may fail because overly general classifiers which should be inaccurate may be evaluated as accurate.

Consider for example an overly general classifier that matches two niches belonging to two different areas of the environment. As long as the system stays in the area containing the first niche, the classifier's parameters are updated according to the payoff level of the first niche. As long as the animat does not visit the second niche, the classifier appears accurate even though it is globally overly general.(n4) The overly general classifier is thus selected for reproduction, and the system allocates resources, i.e., copies, to it. When the animat moves to the area of the environment containing the second niche, the classifier starts becoming inaccurate because the payoff level it predicts is no longer correct. At this point, two things may happen. First, if the classifier did not reproduce sufficiently in the first niche, the (macro) classifier is deleted because it has become inaccurate: the animat thus "forgets" what it learned in the previous area. Second, if the overly general classifier reproduced sufficiently while in the initial niche, the (macro) classifier survives long enough to adjust its parameters and become accurate with respect to the current niche. The overly general classifier then continues to reproduce and mutate in the new niche, and can produce even more overly general offspring. This behavior can be summarized as follows:

XCS usually learns a global policy. However, if some areas of the environment are not, or cannot be, visited frequently, it tends to learn a local policy that can produce overly general classifiers, which by definition cause performance errors.
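The same simplified update (plain Widrow-Hoff with beta = 0.2, an assumption rather than the full XCS update) can illustrate the scenario above, where the visit schedule rather than the classifier itself decides whether it looks accurate. In this hypothetical run an overgeneral classifier first receives 200 payoffs of 1000 from the first niche; then the animat moves to the second niche, which pays 0. The error is close to zero at the end of the first phase, jumps within a few steps of the second, and shrinks again once the classifier has re-adapted, which corresponds to the second outcome described above.

```python
def update(prediction: float, error: float, payoff: float, beta: float = 0.2):
    """One Widrow-Hoff step: the error uses the old prediction, then the prediction moves."""
    error += beta * (abs(payoff - prediction) - error)
    prediction += beta * (payoff - prediction)
    return prediction, error


prediction, error = 10.0, 0.0

# Phase 1: the animat stays in the area of the first niche (payoff 1000).
for _ in range(200):
    prediction, error = update(prediction, error, 1000.0)
error_end_phase1 = error          # near zero: the classifier looks accurate

# Phase 2: the animat moves to the area of the second niche (payoff 0).
for _ in range(5):
    prediction, error = update(prediction, error, 0.0)
error_early_phase2 = error        # large: the classifier is now seen as inaccurate

# If it survives long enough, it re-adapts to the second niche alone.
for _ in range(200):
    prediction, error = update(prediction, error, 0.0)
error_end_phase2 = error          # near zero again, although it is still overgeneral

print(error_end_phase1, error_early_phase2, error_end_phase2)
```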

Note that the phenomenon we discuss is not the general problem of having incomplete information about the environment because of partial exploration. The environments we use are small enough that, after the first two hundred problems, the system has tried almost all the possible environmental niches. Instead, our statement concerns the capability of XCS to evolve a stable solution. Thus our hypothesis states that:

XCS fails to learn an optimal policy in environments where the system is not very likely to explore all the environmental niches frequently.

This hypothesis concerns the capability of the agent to explore the whole environment uniformly; it is therefore related both to the structure of the environment and to the exploration strategy employed. Since the exploration strategies previously employed with XCS in animat problems select actions randomly, our hypothesis is directly related to the average random walk to food. The shorter the average random walk, the more likely the animat is to visit all positions in the environment frequently; the longer it is, the more likely the animat is to visit certain areas of the environment more often than others. Our hypothesis can therefore explain why, in certain environments, XCS with biased exploration performs better than XCS with random exploration. With biased exploration, the animat performs a random action only with a certain probability and otherwise takes the best action. Accordingly, the animat is not likely to spend much time in one area of the environment; following the best policy it has learned, it moves on to another area. When the environmental niches are farther apart, as in Maze6 and Woods14, the animat is unable to visit all the niches as frequently as would be necessary to evolve an optimal policy.
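Biased exploration, as described above, amounts to a simple choice rule over the prediction array. A minimal sketch, assuming the prediction array is a dict from action to predicted payoff and calling the random-action probability `p_random` (a hypothetical name); the eight-action prediction array is invented for illustration:

```python
import random


def select_action(prediction_array: dict, p_random: float, rng: random.Random):
    """Biased exploration over a prediction array {action: predicted payoff}.

    With probability p_random a uniformly random action is taken; otherwise
    the action with the highest predicted payoff is chosen.
    """
    actions = sorted(prediction_array)
    if rng.random() < p_random:
        return rng.choice(actions)
    return max(actions, key=lambda a: prediction_array[a])


# Hypothetical prediction array for the eight moves of a grid-world animat.
P = {0: 120.0, 1: 310.0, 2: 95.0, 3: 400.0, 4: 80.0, 5: 150.0, 6: 60.0, 7: 200.0}
rng = random.Random(42)
choices = [select_action(P, 0.3, rng) for _ in range(10_000)]
print(choices.count(3) / len(choices))
```

With p_random = 0.3 and eight actions, the best action is expected to be chosen with probability 0.7 + 0.3/8, so the animat mostly follows its current policy and tends to leave an area rather than wander in it.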

6.3 Discussion

We proposed a hypothesis in order to characterize the situations in which XCS may not converge to an optimal policy. The hypothesis we formulated concerns the concept of environmental niche and suggests that XCS can fail to converge to a global optimum if the environmental niches are not explored frequently. We thus observe that the system should not explore one area of the environment for a long time; instead, it should frequently change environmental niche. Otherwise, XCS may start to learn locally, evolving classifiers which are correct with respect to a specific area but are inaccurate in some other area.

Notice that our hypothesis is not a matter of the environment or of XCS alone but depends upon the interaction between them. An environment in which the animat is likely to visit all the possible areas will be easily solved by XCS with the usual random exploration strategy.

We want to point out that, although the approach we followed to study the behavior of XCS concerns a specific kind of environment, i.e., grid-worlds, the conclusions we draw appear to be general and can therefore be extended to other environments.

7 Verification of the Hypothesis

According to the hypothesis presented in the previous section, XCS can fail to converge to the optimum in those environments where the system is not likely to explore all the environmental niches frequently. If our hypothesis is correct, the phenomena we have discussed should not appear when XCS employs an exploration strategy guaranteeing frequent exploration of all the environmental niches.

In this section we validate our hypothesis empirically. We introduce a meta-exploration strategy, teletransportation, that we use as a theoretical tool to verify our argument. The strategy can be applied to any exploration strategy previously employed with XCS. Accordingly, we refer to it as a meta-exploration strategy rather than an exploration strategy.

Teletransportation works as follows: when in exploration, the animat is placed randomly in a blank cell of the environment; then it moves following one of the exploration strategies proposed in the literature, random or biased. If the animat reaches a food cell within a maximum number M_es of steps, the exploration ends; otherwise, if the animat does not find food within M_es steps, it is moved, i.e., teletransported, to another blank cell and the exploration phase is restarted. For small values of M_es, teletransportation guarantees that the animat visits all the possible niches with the same frequency, while for large values of M_es the strategy becomes equivalent to the exploration strategy employed without teletransportation, e.g., random or biased.
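Teletransportation can be stated as a short loop around any underlying exploration strategy. The sketch below is a hypothetical toy, not the Maze5, Maze6 or Woods14 setup: the grid uses 'O' for obstacles, '.' for blank cells and 'F' for food, the animat has eight moves, bumping into an obstacle leaves it in place (an assumption), and `choose_move` stands for the underlying strategy (here a uniform random move). The function returns the total number of steps and the number of teletransports for one exploration problem.

```python
import random

# Hypothetical toy grid-world: 'O' obstacle, '.' blank cell, 'F' food.
GRID = [
    "OOOOOOO",
    "O....FO",
    "O.OOO.O",
    "O.....O",
    "OOOOOOO",
]
MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def blank_cells(grid):
    return [(r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch == "."]


def step(grid, pos, move):
    """Move to an adjacent cell; moving into an obstacle leaves the animat in place."""
    r, c = pos[0] + move[0], pos[1] + move[1]
    if 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] != "O":
        return (r, c)
    return pos


def random_move(pos, rng):
    """Underlying exploration strategy: a uniformly random move."""
    return rng.choice(MOVES)


def explore_with_teletransport(grid, m_es, choose_move, rng):
    """One exploration problem under teletransportation; returns (steps, teletransports)."""
    cells = blank_cells(grid)
    pos = rng.choice(cells)               # start in a random blank cell
    steps = teletransports = since_restart = 0
    while True:
        pos = step(grid, pos, choose_move(pos, rng))
        steps += 1
        since_restart += 1
        if grid[pos[0]][pos[1]] == "F":   # food reached: the problem ends
            return steps, teletransports
        if since_restart >= m_es:         # no food within m_es steps: teletransport
            pos = rng.choice([c for c in cells if c != pos])
            teletransports += 1
            since_restart = 0


print(explore_with_teletransport(GRID, 20, random_move, random.Random(0)))
```

Setting m_es very large reduces the loop to the plain underlying strategy with no teletransports, which is the limiting case described above.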

We apply XCS with teletransportation (XCST) to the environments previously discussed (Maze5, Maze6 and Woods14) using the same parameter settings employed in the original experiments. Figure 10 compares the performance of XCST and XCS with biased exploration in Maze5, when a population of 1600 classifiers is employed and the M_es parameter is set to 20 steps. Results show that in Maze5 XCST converges to the optimum. As Figure 11 shows, XCST's performance remains stable near the optimum even when only 800 classifiers are employed in the population. We obtain similar results when XCST is applied to Maze6 (see Figure 12). The comparison of the performance of XCST and XCS shows that XCST converges to an optimal solution, while XCS with biased exploration, for the same parameter settings, cannot reach the optimum.

Figure 13 compares a typical performance of XCS with biased exploration with a typical performance of XCST when both systems are applied to Woods14. The immediate impression is that XCST's performance is not very stable and is only near optimal. However, to fully understand Figure 13, we have to analyze how XCST learns. When in exploration, XCST continuously moves in the environment in order to visit all the niches frequently. Accordingly, the animat does not learn the optimal policy in the usual way, by "trajectories", i.e., starting in a position and exploring until a goal state is reached.

XCST's policy instead emerges from a set of experiences, each of a limited number of steps, that the animat has collected while learning in the environment. The system immediately learns an optimal policy for the positions near the food cells, then extends this policy during subsequent explorations of the other areas of the environment. We can think of the artificial animal, the animat, as a natural animal that first secures a good path to food and then extends its knowledge to other areas of the environment. In Maze6, the policy is extended very rapidly because the positions of the environment are close to the food position. In Woods14, the analysis of single runs shows that XCST almost immediately learns an optimal policy for the first eight positions; then the policy also converges for the subsequent eight positions. In the end, the performance is only near optimal because, for the last two positions of Woods14, the most difficult ones, the optimal policy is not completely determined.

The experiments with XCST in Woods14 highlight a limitation of teletransportation as an exploration strategy: since the environment is explored uniformly, the positions for which an optimal solution is harder to evolve, and which therefore require more experience, converge only slowly toward optimal performance.

8 Exploration, Generalization, Models and Animats

Teletransportation is the heuristic we used to validate our hypothesis concerning generalization in XCS. From this perspective, teletransportation should be considered a theoretical tool used in our experiments to support our hypothesis. Unfortunately, teletransportation cannot be applied to general problems, such as physical autonomous agents, because it would require the presence of a trainer that, every M_es steps, picks up the agent and takes it to another area of the environment. We can, however, develop from the teletransportation idea a technique, feasible for general problems, through which a wider exploration of the environment can be guaranteed.

8.1 Related Work

As we pointed out previously, XCS usually learns a global policy, but it may tend to evolve local policies in those environments where the agent is not able to visit all the areas with the same frequency. This problem is not novel in the area of reinforcement learning. Many reinforcement learning algorithms, in order to converge to the optimum, require the environment to be visited uniformly. For example, when neural networks are employed, all the areas of the environment have to be explored with the same frequency; otherwise the neural network may overfit locally.
