Math (562419), страница 13

Файл №562419 Math (Несколько текстов для зачёта) 13 страницаMath (562419) страница 132015-12-04СтудИзба
Просмтор этого файла доступен только зарегистрированным пользователям. Но у нас супер быстрая регистрация: достаточно только электронной почты!

Текст из файла (страница 13)

Solutions to this kind of problem for reinforcement algorithms have already been proposed. Sutton (1990) introduced the Dyna architecture which integrates the learning algorithm with a model of the environment that is built up by experience. The model is then employed to simulate exploration in other areas of the environment or for planning. Another solution is the one proposed by Lin (1993) where the idea of experience replay is introduced: past experienced trajectories to goal states are memorized and subsequently used to recall past experiences in order to avoid local overfitting.

8.2 Dyna Architecture for XCS

Teletransportation may be implemented for real problems by integrating XCS with a model of the environment built during exploration. The model can be subsequently employed as in Sutton (1990) to simulate exploration in other areas of the environment, while the agent explores one specific environmental niche. The model may also be used for planning. The simplest way to develop a model of the environment in a discrete state/action space, like grid-worlds, is to memorize the past experience as quadruples of the form (s, a, s', r), where: s is current sensory input; a is the action the agent selected when it perceived s; s' is the sensory input returned when the agent perceiving s has performed a; finally, r is the immediate reward the agent received for performing a when in s. This type of model, similar to Riolo's (1991) work on latent learning, is easily integrated into XCS. The overall system, which we call Dyna-XCS, works as follows.

When in exploration, the animat is placed randomly in a blank cell of the environment and then it moves under the control of XCS using one of the usual exploration strategies, i.e., random or biased. If the animat reaches a food cell by M[sub es] steps, the exploration ends. Otherwise, if the animat does not find food "in time" (in M[sub es] steps) the system stops exploring the environment and starts using the model of the environment in order to simulate an exploration experiment on the model. Accordingly, the current sensor configuration is memorized, and a new exploration starts in the model. Exploration within the model is very similar to the exploration the agent performs in the environment. First, the initial position is determined randomly among the states which appear in the first position of the quadruples that have been experienced. Then exploration continues on the model until S[sub es] steps have been performed, or the animat has reached a food cell in the model. At this point, the animat ends the simulated exploration in the model and restarts the exploration in the environment at the same position in which the exploration in the environment stopped.

8.3 Discussion

The Dyna-XCS system we implemented is still under experimentation. However, the initial results show that there is almost no difference in performance between XCST and Dyna-XCS using the three environments employed in this paper. We expected this result. As observed previously, these environments are quite small. Therefore, after the system has solved a few hundred problems, it has also tried almost every condition/action pair and has an almost a complete description of the environment. Accordingly, the exploration with the model becomes almost identical to the exploration in the environment.

These initial results highlight the major problem with the implementation based on experience quadruples: the memory required to store the model by quadruples dramatically grows as the exploration in the environment proceeds because a complete description of the environment is likely to be produced. Therefore, this solution is only feasible in small environments. More complex environments instead require algorithms to produce a compact representation of the model.

A possible solution may consist of introducing a hybrid architecture in which XCS is used for learning, and a different type of algorithm used for building the environment model, for instance, a neural network. However, this type of solution would introduce elements which are not related to the philosophy underlying XCS. A more elegant solution can be suggested.

XCS is a learning algorithm which may be used for learning the environment model. We propose a solution in which the XCS system that has to learn to reach food in the environment is coupled with an XCS system that is employed to learn the environment model. The second system should have classifiers whose: (1) conditions represent a state/action pair (8, a); (2) actions represent the prediction of the next sensory state (s') and the immediate reward (r) XCS expects to gain when getting to s'. This version of XCS learns a predictive model of the environment, an extension already proposed by Wilson (1995) in the original XCS paper. Recently, Stolzmann (1997) introduced an Anticipatory Classifier System that is designed to learn an environmental model.

9 Evolving a Compact Representation

Previously, we discussed how the generalization mechanism of XCS and the structure of the environment influence system performance. Another important aspect of generalization in XCS concerns the capability of XCS to evolve a compact representation of the learned task.

9.1 Generalization and Task Representation in XCS

Results reported in the literature show that XCS can evolve near minimal populations of accurate and maximally general classifiers (Wilson, 1997a). Recently, Kovacs (1997) proposed an optimality hypothesis which states that XCS tends to evolve the minimal population with respect to the Boolean multiplexer function.

With respect to animat problems, we now discuss how XCS develops a tendency to evolve near minimal populations. We show how, in certain environments, XCS may fail to evolve a compact representation and may produce redundant representations of certain tasks.

Consider again Woods14 (Figure 6). Every position in Woods14 is uniquely determined by the position of the two adjacent free cells. Therefore, in each classifier condition only two bits are sufficient to characterize a specific environmental niche. Since classifiers in Woods14 are 16 bits long, for each niche there are 2[sup 14] possible classifiers belonging to that niche only. According to Wilson's hypothesis, general classifiers should reproduce more than specific ones since general classifiers appear in more match sets. Unfortunately, the last statement is not always true.

For example, consider the two conditions 1010001010001010 and ####0#####0#####. Although the second condition has many more don't care symbols, both conditions match only the third free position in Wood14. We can say that the latter condition is formally more general than the former condition because it has more # symbols. However, the latter condition is not concretely more general than the former because it matches the same number of niches. Note that in XCS the pressure toward more general classifiers is effective only if the generality of the classifiers is concretely exploited in the environment (i.e., general classifiers match more niches). Accordingly, in environments that offer few chances of building concrete generalizations, like Woods14, the pressure toward concretely more general classifiers is lost because Wilson's generalization hypothesis does not apply.(n5) In such situations XCS rapidly evolves a set of classifiers that exploit the maximum generalization offered by the environmental states. Then, by recombination and mutation of these classifiers, the system can start producing classifiers that are formally more general (contain more # symbols) but that, in practice, do not match more niches (they are not concretely more general). As a consequence the representation of the task can become redundant.

As an example, we apply XCST to Maze6 with a population of 1600 classifiers. Figure 14 reports the number of macroclassifiers in the population, and the curve is averaged over ten runs. Notice that the number of macroclassifiers grows immediately and then reaches an equilibrium value which depends on the genetic pressure. The analysis of final populations shows that only a few of the macroclassifiers represent more than one microclassifier.

9.2 The Role of Subsumption Deletion

We now address how the representation of the task that the agent is learning can be compacted in environments, such as Maze6, where a pressure toward more general classifiers cannot be developed.

Subsumption deletion was introduced by Wilson (1997) to improve generalization with XCS. However, early experiments with Maze5, Maze6 and Woods 14 (not reported here) show that its introduction may decrease system performance and, in some cases, prevent the system from converging to a stable policy.

Next, we analyze an important aspect of subsumption deletion in order to explain why it may compromise XCS's performance and how the observed behavior is related to specify and teletransportation.

Subsumption deletion acts when new classifiers created by the GA must be inserted in the population and replaces offspring classifiers with clones of their parents if: (1) the offspring classifiers are more specific than their parents, and (2) the parameters of their parents have been updated sufficiently. As has been observed (Lanzi, 1997a), XCS with subsumption deletion evolves formal generalizations: #s are not inserted in the classifiers because they are necessary in order to match more niches; instead, the system converges to a population in which classifiers contain as many # symbols as possible without becoming inaccurate. XCS with subsumption deletion tends to produce classifiers which apply to many more conditions than those the agent experienced. They are also likely to be inaccurate if the environment is extended -for example, if a new area of the environment is discovered.

In such cases the specify operator can be useful in recovering from classifiers that are overly general in the new area of the environment. In fact, all the classifiers that are overly general in the new area will become inaccurate and specify will be activated.

This aspect of subsumption deletion is strictly related to the accuracy parameter and therefore to teletransportation. In fact, subsumption deletion relies upon the accuracy parameter and thus it may corrupt the population when overly general classifiers are evaluated as accurate. We developed a series of experiments in which XCS with biased exploration was applied to the environments previously presented. Results (not reported here) show that the performance of the system is highly decreased when subsumption deletion is used.

After we introduced teletransportation, we repeated the set of experiments to test whether the decrease of performance when subsumption was used depended on the presence of overly general classifiers that were evaluated as accurate.

As Figure 15 shows, XCST's performance is still optimal when subsumption deletion is employed. The comparison of the population size in macroclassifiers for the two systems in Figure 16 shows that it compacts the representation producing a smaller population. These results suggest that the decrease in XCS's performance when XCS employs subsumption deletion is related to the problem of local learning that was introduced in the previous sections.

9.3 Discussion

It is worth noting that the aspects of subsumption deletion we've just discussed are closely related to the phenomena discussed in the first part of the paper. Discovering a new area of the environment in which the evolved classifiers are overly general is similar to the case in which an animat learns locally because it cannot explore all the areas of the environment frequently. Therefore, we might observe that specify and teletransportation are complementary. However, the two problems are quite distinct in that we may have an environment where every area can be visited frequently in which, suddenly, a door opens and the animat faces an unexplored area. Most important, teletransportation acts through evolution and therefore it is slower than specify, which is a heuristic and thus acts much faster than genetic evolution.

10 Summary

We presented a study of the generalization mechanism of XCS to explain some of the previously reported results. We analyzed Wilson's generalization hypothesis which explains how generalization in XCS works. Then, we stated a hypothesis which suggests that XCS may not converge to the optimum when, due to the structure of the environment and to the exploration strategy, the system is not able to visit all the areas of the environment frequently. We verified our hypothesis by introducing a meta-exploration strategy, teletransportation, which was used as a theoretical tool during the validation phase. Subsequently, we suggested how the ideas underlying teletransportation might be implemented in a real application integrating XCS with a model of the environment in a Dyna architecture. A possible implementation of such an architecture was then discussed.

Finally, we analyzed the conditions under which XCS may fail to evolve a compact representation of the learned task. We showed this is likely to happen in environments where there is no direct relation between the number of don't care symbols a classifier condition has and the number of environmental configurations the condition matches. In such cases, there is no pressure for further generalization beyond what may be permitted by available states. The system rapidly reaches the maximum generalization that the environmental states permit. Then it can start evolving a redundant representation of the learned task. Accordingly, we showed how subsumption deletion can be effective in evolving a more compact solution.

Acknowledgments

I wish to thank all the people who helped me during the course of this work. Stewart W. Wilson, who was always available for discussing the generalization issue and for reviewing the early versions of this paper. Marco Dorigo, for the many discussions on learning classifier systems when I was in Brussels. Marco Colombetti, who supports my work and is always available for discussions. Trevor Collins, for his invaluable effort in trying to improve my English. Finally, Gabriella, for her never ending patience.

This research was partially funded by a grant to Marco Colombetti for the year 1996 from the Fondo di Ricerca d'Ateneo (University Research Fund) of the Politecnico di Milano.

(n1) General parameters for the four algorithms are set as follows: Beta=0.2, Gamma=0.71, Theta=25, Epsilon[sub 0]=0.01, Alpha=0.1, Chi=0.8, Mu=0.01, Delta=0.1, Phi=0.5, P[sub #]=0.3, P[sub I]=10.0, Epsilon[sub I]=0.0, F[sub I]=10.0.

Specific parameters are set as follows: for algorithm (ii), P[sub #]=0.0 and mutation does not insert any # symbol; in (iii) specify parameters are set as N[sub Sp]=20 and P[sub Sp]=0.5; finally, biased exploration chooses an action randomly with a probability P[sub s]=0.3. We refer the interested reader to Wilson (1995) for a detailed description of each of the parameters.

(n2) Due to the significant difference of the results it is not possible to use the same scale on both plots.

(n3) XCS's parameters are set as in the previous experiment, except for the specify probability parameter, P[sub Sp], which is set to 0.8.

(n4) This happens because the classifier parameters are always updated according to the payoff level of the first niche, but never according to the payoff level of the second niche.

(n5) This phenomenon was already noticed by Wilson (1995) where, discussing the generalization produced by XCS in Woods2, Wilson observed that XCS produced classifiers which matched the same niches but contained different numbers of # symbols.

Характеристики

Тип файла
Документ
Размер
281 Kb
Тип материала
Высшее учебное заведение

Список файлов учебной работы

Свежие статьи
Популярно сейчас
Зачем заказывать выполнение своего задания, если оно уже было выполнено много много раз? Его можно просто купить или даже скачать бесплатно на СтудИзбе. Найдите нужный учебный материал у нас!
Ответы на популярные вопросы
Да! Наши авторы собирают и выкладывают те работы, которые сдаются в Вашем учебном заведении ежегодно и уже проверены преподавателями.
Да! У нас любой человек может выложить любую учебную работу и зарабатывать на её продажах! Но каждый учебный материал публикуется только после тщательной проверки администрацией.
Вернём деньги! А если быть более точными, то автору даётся немного времени на исправление, а если не исправит или выйдет время, то вернём деньги в полном объёме!
Да! На равне с готовыми студенческими работами у нас продаются услуги. Цены на услуги видны сразу, то есть Вам нужно только указать параметры и сразу можно оплачивать.
Отзывы студентов
Ставлю 10/10
Все нравится, очень удобный сайт, помогает в учебе. Кроме этого, можно заработать самому, выставляя готовые учебные материалы на продажу здесь. Рейтинги и отзывы на преподавателей очень помогают сориентироваться в начале нового семестра. Спасибо за такую функцию. Ставлю максимальную оценку.
Лучшая платформа для успешной сдачи сессии
Познакомился со СтудИзбой благодаря своему другу, очень нравится интерфейс, количество доступных файлов, цена, в общем, все прекрасно. Даже сам продаю какие-то свои работы.
Студизба ван лав ❤
Очень офигенный сайт для студентов. Много полезных учебных материалов. Пользуюсь студизбой с октября 2021 года. Серьёзных нареканий нет. Хотелось бы, что бы ввели подписочную модель и сделали материалы дешевле 300 рублей в рамках подписки бесплатными.
Отличный сайт
Лично меня всё устраивает - и покупка, и продажа; и цены, и возможность предпросмотра куска файла, и обилие бесплатных файлов (в подборках по авторам, читай, ВУЗам и факультетам). Есть определённые баги, но всё решаемо, да и администраторы реагируют в течение суток.
Маленький отзыв о большом помощнике!
Студизба спасает в те моменты, когда сроки горят, а работ накопилось достаточно. Довольно удобный сайт с простой навигацией и огромным количеством материалов.
Студ. Изба как крупнейший сборник работ для студентов
Тут дофига бывает всего полезного. Печально, что бывают предметы по которым даже одного бесплатного решения нет, но это скорее вопрос к студентам. В остальном всё здорово.
Спасательный островок
Если уже не успеваешь разобраться или застрял на каком-то задание поможет тебе быстро и недорого решить твою проблему.
Всё и так отлично
Всё очень удобно. Особенно круто, что есть система бонусов и можно выводить остатки денег. Очень много качественных бесплатных файлов.
Отзыв о системе "Студизба"
Отличная платформа для распространения работ, востребованных студентами. Хорошо налаженная и качественная работа сайта, огромная база заданий и аудитория.
Отличный помощник
Отличный сайт с кучей полезных файлов, позволяющий найти много методичек / учебников / отзывов о вузах и преподователях.
Отлично помогает студентам в любой момент для решения трудных и незамедлительных задач
Хотелось бы больше конкретной информации о преподавателях. А так в принципе хороший сайт, всегда им пользуюсь и ни разу не было желания прекратить. Хороший сайт для помощи студентам, удобный и приятный интерфейс. Из недостатков можно выделить только отсутствия небольшого количества файлов.
Спасибо за шикарный сайт
Великолепный сайт на котором студент за не большие деньги может найти помощь с дз, проектами курсовыми, лабораторными, а также узнать отзывы на преподавателей и бесплатно скачать пособия.
Популярные преподаватели
Добавляйте материалы
и зарабатывайте!
Продажи идут автоматически
6486
Авторов
на СтудИзбе
303
Средний доход
с одного платного файла
Обучение Подробнее