

The manager is static, and every processor thread knows how to find the manager. Elements are assigned managers by a simple cyclic distribution. The algorithm for implementing the function Get_Element(i) is as follows:

1. Let p be the processor thread that is executing Get_Element(i) for an element i of some collection. Let P be the number of processors. The index m of the manager processor thread is given by m = i mod P. Thread p sends a message to m requesting the identity of the owner.
2. The manager m is "interrupted", looks in a table for the owner o of element i, and sends this value to p.
3. The requester p then sends a message to o asking for element i.
4. The owner o is "interrupted" and sends a copy of element i to the requester p.

Hence, the primary implementation issues for a given machine reduce to:

- How is a processor interrupted when a request for an element or owner is received?
- How is the table of owner identifiers stored in the manager?
- How is the barrier operation implemented?

The current pC++ compiler assumes no mechanism exists for interrupting a processor thread.
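The four-step lookup above can be illustrated with a small single-process model. This is only a sketch under stated assumptions: the names Machine, Processor, place, and get_element are hypothetical and are not part of the pC++ runtime, and each "message" is modeled as a direct table access plus a counter increment rather than a real network send.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one processor thread. Real pC++ exchanges
// messages between processors; here each "message" is a direct access.
struct Processor {
    std::unordered_map<int, int> owner_of;     // manager role: element index -> owner
    std::unordered_map<int, double> elements;  // owner role: element index -> value
};

// Single-process model of the machine (illustrative only).
struct Machine {
    std::vector<Processor> procs;
    int messages_sent = 0;

    explicit Machine(int P) : procs(static_cast<std::size_t>(P)) {}

    int num_procs() const { return static_cast<int>(procs.size()); }

    // Managers are assigned by a cyclic distribution: m = i mod P.
    int manager_of(int i) const {
        assert(i >= 0);
        return i % num_procs();
    }

    // Store element i on `owner` and register the owner with i's manager.
    void place(int i, int owner, double value) {
        assert(owner >= 0 && owner < num_procs());
        procs[owner].elements[i] = value;
        procs[manager_of(i)].owner_of[i] = owner;
    }

    // The four steps: ask the manager for the owner, then ask the owner.
    double get_element(int i) {
        int m = manager_of(i);
        ++messages_sent;                        // step 1: p -> m (who owns i?)
        int o = procs[m].owner_of.at(i);
        ++messages_sent;                        // step 2: m -> p (the owner is o)
        ++messages_sent;                        // step 3: p -> o (send me i)
        double copy = procs[o].elements.at(i);
        ++messages_sent;                        // step 4: o -> p (copy of i)
        return copy;
    }
};
```

For example, with P = 4 and element 10 stored on processor 3, the manager is processor 2 (10 mod 4), and a single lookup costs four messages in this model.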

Instead of relying on interrupts, the compiler generates calls to a function called Poll() in the element and collection methods. By calling Poll(), a thread can check a queue of incoming messages and reply to requests in a reasonably timely manner. Unfortunately, calling Poll() periodically is not sufficient to prevent starvation. If no interruption mechanism exists on the target, it is necessary to make sure that the Barrier() function also calls Poll() while waiting for the barrier to complete.

The final issue to consider in the runtime environment is that of allocating the collection elements. In the current version, a large table is created in each processor object that stores pointers to the local collection. A second table in each processor object stores the index of the owner of each element that is managed by that processor. Because all of the elements and both tables can be allocated and created based on the distribution and alignment data, it is straightforward to parallelize this task in the distributed memory environment. (This is in contrast to the shared memory situation, where synchronization is necessary to ensure that each processor has access to pointers to all the elements.)

3.1.1 The Thinking Machines CM-5

The CM-5 is a distributed memory machine based on a fat tree network. Each processor is a Sparc CPU together with four vector units.

(In the experiments described here the vector units are not used.) The basic communication library is CMMD 3.0 Beta, which is based on a re-implementation of the Berkeley CMAM Active Message layer [5, 6]. The active message layer provides very good support for short messages that consist of a pointer to a local function to execute and three words of argument. In addition, the system can be made to interrupt a processor upon the receipt of a message, or message handling can be done by polling. One can switch between these two modes at run time. Our experience indicates that using a combination of polling and interrupts works best. During barrier synchronization, the interrupt mechanism is used.
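The short-message format described above (a function pointer plus three argument words) and the polling mode can be sketched as follows. This is an illustrative model only: ActiveMessage, Node, deliver, poll, and add_words are hypothetical names, not the CMMD or CMAM API, and the "network" is just an in-memory queue.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical message layout: a handler to run on the receiver plus
// three argument words. This is not the CMMD or CMAM API.
using Word = std::intptr_t;
using Handler = void (*)(Word, Word, Word);

struct ActiveMessage {
    Handler handler;
    Word a0, a1, a2;
};

// Stand-in for one node's incoming-message queue.
struct Node {
    std::deque<ActiveMessage> inbox;

    // "Send" by appending to the receiver's queue.
    void deliver(Handler h, Word a0, Word a1, Word a2) {
        assert(h != nullptr);
        inbox.push_back(ActiveMessage{h, a0, a1, a2});
    }

    // Polling mode: run the handler of every pending message.
    // Returns the number of messages handled.
    int poll() {
        int handled = 0;
        while (!inbox.empty()) {
            ActiveMessage msg = inbox.front();
            inbox.pop_front();
            msg.handler(msg.a0, msg.a1, msg.a2);
            ++handled;
        }
        return handled;
    }
};

// Example handler: add the three argument words into a global counter.
static Word g_sum = 0;
void add_words(Word a, Word b, Word c) { g_sum += a + b + c; }
```

In this model nothing runs until the receiver calls poll(); an interrupt-driven mode would instead invoke the handler as soon as the message arrives.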

The CMMD barrier operation is very fast (4 microseconds).

3.1.2 The Intel Paragon

The Intel Paragon is a distributed memory machine based on a grid network. Each processor contains two i860s. One i860 is used to run the user code, and one handles the message traffic and talks to the special mesh router. (Unfortunately, our testbed Paragon system is running "pre-release" software which only uses one of the i860s.) The basic communication library is the NX system that has been used for many years on the Intel machines. NX only provides a very primitive interrupt-driven message handler mechanism; consequently, only the polling strategy can be used. Furthermore, NX is not well optimized for very short messages, such as those used to locate the owner of an element. In addition, implementing a barrier function that must also poll for messages is non-trivial and results in slow operation. Barrier execution takes approximately 3 milliseconds. However, the native NX barrier, which does not do polling, is not much faster (about 2 milliseconds). Combined with the effect of pre-release software, the performance of the pC++ runtime system on the Intel Paragon is non-optimal.

3.2 Shared Memory Systems

There are three main implementation differences in the pC++ runtime system on a shared memory versus a distributed memory machine.

The most obvious difference is that message communication is not required for accessing remote collection elements. All collection elements can be accessed using address pointers into the shared memory space. A related difference is that collection element tables need only be allocated once, since all processors can directly reference tables using their base address. However, it may be beneficial to have multiple copies of the tables to improve memory locality during Get_Element operations. In contrast, it is necessary to have a separate collection element table on each processor node in a distributed memory machine. The third difference is in how collections are allocated. In a distributed memory machine, the owner of elements of a collection allocates the space for those elements in the memory of the processor where it (the owner process) will execute. In a shared memory machine, the space for an entire collection is allocated out of shared memory space. Care must be taken in memory allocation to minimize the contention between local processor data (i.e., the data "owned" by a processor) and remote data. Achieving good memory locality in a shared memory system, using processor cache or local memory, will be important for good performance.

3.2.1 General Strategy

The current pC++ runtime system that we have implemented for shared memory machines has the following general, default properties:

- Collection element tables: Each processor has its own copy of the element table for each collection.
- Collection allocation: Each processor object allocates all the space for its local elements. The processor objects then exchange local element addresses to build the full collection element table.
- Barrier synchronization: The barrier implementation is chosen from optimized hardware/software mechanisms on the target system.

3.2.2 The BBN TC2000

The BBN TC2000 [1] is a scalable multiprocessor architecture which can support up to 512 computational nodes.

The nodes are interconnected by a variant of a multistage cube network referred to as the butterfly switch. Each node contains a 20 MHz Motorola 88100 microprocessor and memory which can be configured for local and shared access. The contribution of each node to the interleaved shared memory pool is set at boot time.

The parallel processes are forked one at a time via the nX system routine fork_and_bind. This routine creates a child process via a UNIX fork mechanism and attaches the child to the specified processor node.

The collection element tables and local collection elements are allocated in the local memory space on each node of the TC2000. There are several choices under nX for allocating collection elements in shared memory: across node memories (e.g., interleaved or random) or on a particular node's memory with different caching policies (e.g., uncached or cached with copy-back or write-through cache coherency). Currently, the TC2000 pC++ runtime system allocates collection elements in the "owner's" node memory with a write-through caching strategy. The TC2000 does not have special barrier synchronization hardware. Instead, we implemented the logarithmic barrier algorithm described in [2]. Our implementation requires approximately 70 microseconds to synchronize 32 nodes. This time scales as the log of the number of processors.

3.2.3 The Sequent Symmetry

The Sequent Symmetry [3] is a bus-based, shared memory multiprocessor that can be configured with up to 30 processors. The Symmetry architecture provides hardware cache consistency through a copy-back policy and user-accessible hardware locking mechanisms for synchronization. For our prototype implementation, we used a Symmetry S81 machine with 24 processors (16 MHz Intel 80386 with a Weitek 1167 floating point coprocessor) and 256 MBytes of memory across four memory modules interleaved in 32 byte blocks.

Using Sequent's parallel programming library, the implementation of the pC++ runtime system was straightforward.

Because all memory in the Sequent machine is physically shared in the hardware, the local allocation of the collection element tables on each processor is only meaningful relative to the virtual memory space of the process. All collection element tables are allocated in the local data segment of each process, making them readable only by the process that created them. In contrast, collection elements must be allocated in a shared segment of the virtual address space of each process; a shared memory allocation routine is used for this purpose. Unfortunately, there is no way to control the caching policy in software; copy-back is the hardware default. Barrier synchronization is implemented using a system-supplied barrier routine which takes advantage of the hardware locking facilities of the Sequent machine. It is very efficient: the barrier performance on 8, 12, 16, and 20 processors is 34, 47, 58, and 70 microseconds, respectively.

3.2.4 The Kendall Square Research KSR-1

The KSR-1 is a shared virtual memory, massively parallel computer.
