Keldysh Institute of Applied Mathematics
Russian Academy of Sciences
Lib-DVM library
Interface description
March 2000
CONTENTS
1 Introduction 6
2 Run-Time System initialization and completion 6
3 Creating abstract machine representations 7
3.1 Requesting current abstract machine 8
3.2 Creating abstract machine representation 8
3.3 Requesting reference to an element of abstract machine representation 9
3.4 Deleting abstract machine representation 9
4 Processor systems 9
4.1 Requesting reference to the processor system 9
4.2 Creating subsystem of specified processor system 10
4.3 Reconfiguring (changing shape of) processor system 11
4.4 Deleting processor system 11
4.5 Weights of processor system elements 11
4.6 Specifying coordinate weights of processors using their loading weights 16
4.7 Virtual multiprocessor systems (multiprocessor systems of the user program) 18
5 Mapping abstract machine 19
5.1 Mapping abstract machine representation onto processor system (resource distribution) 19
5.2 Remapping abstract machine representation onto processor system (resource redistribution) 23
5.3 Requesting map 24
5.4 Specifying abstract machine representation mapping according to map 25
5.5 Remapping abstract machine representation according to the map 25
5.6 Deleting map 26
5.7 Imbalanced block distribution 27
6 Distributed array creating and deleting 28
6.1 Creating distributed array 28
6.2 Deleting distributed array 29
6.3 Creating additional header of distributed array 29
6.4 Deleting distributed array header 30
7 Mapping distributed array 30
7.1 Aligning distributed array 30
7.2 Alignments superposition 33
7.3 Realigning distributed array 35
7.4 Requesting map 35
7.5 Specifying distributed array mapping according to map 36
7.6 Realigning distributed array according to map 36
7.7 Deleting map 37
7.8 Requesting reference to abstract machine representation, specified distributed array is mapped on 38
8 Program block definition 38
8.1 Block beginning 38
8.2 Block end 38
9 Parallel loop defining 39
9.1 Creating parallel loop 39
9.2 Mapping parallel loop 39
9.3 Reordering parallel loop execution 43
9.4 Inquiry of continuation of parallel loop execution 45
9.5 Terminating parallel loop 46
9.6 Specifying information about data dependence between parallel loop iterations 46
10 Representation of the program as a set of subtasks executed in parallel 47
10.1 Mapping abstract machine (subtask creation) 48
10.2 Starting subtask (activation) 48
10.3 Completing (stopping) current subtask 48
11 Reduction 49
11.1 Creating reduction variable 49
11.2 Specifying the type of index variables whose values define the coordinates of a local maximum or minimum of a reduction variable 51
11.3 Creating reduction group 52
11.4 Including reduction in reduction group 52
11.5 Storing values of reduction variables 54
11.6 Starting reduction group 54
11.7 Waiting for completion of reduction group 55
11.8 Deleting reduction group 55
11.9 Deleting reduction 56
11.10 Support of asynchronous reduction during parallel loop execution 56
12 Renewing shadow edges of distributed array 57
12.1 Creating shadow edge group 59
12.2 Including shadow edge in the group 60
12.3 Starting shadow edge group renewing 63
12.4 Initializing receiving imported elements of specified shadow edge group 63
12.5 Initializing sending exported elements of specified shadow edges group 63
12.6 Waiting for completion of shadow edge group renewing 64
12.7 Deleting shadow edge group 64
13 Access to distributed array elements 65
13.1 Copying distributed array element 65
13.1.1 Reading distributed array element and assigning value to element 65
13.1.2 Copying one element of distributed array to another 66
13.1.3 Unified copying of element of distributed array 66
13.2 Copying distributed arrays 67
13.3 Asynchronous copying of distributed arrays 68
13.4 Access to elements of local part of distributed array 69
13.4.1 Requesting if array element is allocated in local part of distributed array 71
13.4.2 Requesting initial and last index values of local part of distributed array 71
13.4.3 Reading element of local part of distributed array 72
13.4.4 Assigning value to element of local part of distributed array 72
13.4.5 Copying element of local part of distributed array to element of local part of other distributed array 72
13.4.6 Requesting address of element of local part of distributed array 73
13.5 Macros to access elements of local part of distributed array of rank from 1 to 7 73
13.6 Sequential requesting index values of distributed array elements. 74
14 Regular access to remote data 74
14.1 Creating remote element buffer of distributed array 75
14.2 Initializing loading remote element buffer of distributed array 79
14.3 Waiting for completion of loading remote element buffer of distributed array 79
14.4 Deleting remote element buffer of distributed array 80
14.5 Access to distributed array elements, allocated in remote element buffer 80
14.6 Creating group of remote element buffers 81
14.7 Including remote element buffer in the group 82
14.8 Starting loading remote element buffers of specified group 82
14.9 Waiting for completion of loading remote element buffers of specified group 82
14.10 Deleting group of remote element buffers 82
14.11 Requesting type of distributed array element access 83
15 Non-regular access to remote data 84
15.1 Creating remote element buffer of non-regular access 85
15.2 Starting loading remote element buffer of non-regular access 86
15.3 Waiting for completion of loading remote element buffer of non-regular access 86
15.4 Deleting remote element buffer of non-regular access 86
15.5 Access to remote elements, allocated in the buffer 87
15.6 Creating group of remote element buffers of non-regular access 88
15.7 Including remote element buffer of non-regular access in the group 88
15.8 Starting loading remote element buffers of specified group 88
15.9 Waiting for completion of loading remote element buffers of specified group 88
15.10 Deleting group of remote element buffers of non-regular access 88
16 Input/Output 89
16.1 Analogies to functions of C language standard library 89
16.1.1 High level input/output functions 89
16.1.2 Low level I/O functions 91
16.1.3 Operations with directories and files 92
16.2 Reading from file to sub-array of distributed array 92
16.3 Writing sub-array of distributed array to file 94
17 Miscellaneous functions 96
17.1 Requesting size of object 96
17.2 Requesting size of object dimension 96
17.3 Requesting if object is distributed array 97
17.4 Requesting size of distributed array element 97
17.5 Deleting object 98
17.6 Requesting whether current processor is I/O processor 98
17.7 Sending memory areas of I/O processor 98
18 Using Run-Time System in Fortran language 99
19 Examples of programs using Run-Time System functions 102
19.1 Solution of Laplace equation by Jacobi method 102
19.2 Parallel loop with regular data dependence between iterations 111
1 Introduction
Before proceeding with the DVM Run-Time Library functions, let us give a short description of the parallel computation model. A parallel C-DVM (or Fortran DVM) program is translated into a program in standard C (or Fortran 77) extended with calls to the Run-Time Library functions; this program is executed according to the SPMD model on each processor assigned to the task.
On startup the program has a single branch (control flow). This branch is executed, starting from the first program statement, on all processors of the processor system.
Let us define the processor system (or system of processors) as the computing machine assigned to the user program by the hardware and the base system software. For example, for computers with distributed memory the computing machine can be an MPI machine; in this case the processor system is the group of MPI processes created when the program is started. The number of processors in the processor system, as well as its representation as a multidimensional grid, is specified in the command line that starts the program. All declared variables are replicated over all processors; the only exception is arrays specially defined as "distributed".
On entering a parallel loop, the branch is split into a number of parallel branches, each executed on a separate processor of the processor system.
On leaving the parallel construct, all parallel branches merge back into the original branch that was executed before entering the construct. At this moment all changes in replicated variables caused by execution of the parallel branches become visible to all processors (that is, the variables are brought into a coherent state).
2 Run-Time System initialization and completion
Initialization in C program:
long rtl_init (long InitParam, int argc, char *argv[]);
Initialization in Fortran program:
long linit_ (long *InitParamPtr);
InitParam or *InitParamPtr – initialization parameter;
argc – number of string parameters in the command line;
argv – array containing pointers to the string parameters of the command line.
The functions rtl_init and linit_ initialize the Run-Time System internal structures according to the modes of interprocessor exchanges, statistics and trace accumulation, and so on, defined in the configuration files.
The initialization parameter can be:
0 – default initialization;
1 – initialization with blocked dynamic control (in this case dynamic control specified in the Run-Time System startup parameters is suppressed).
The function returns zero.
Completion:
long lexit_ (long *UserResPtr);
*UserResPtr – value returned by the user program.
The function lexit_ correctly completes the execution of the Run-Time System: it frees the memory used by the Run-Time System, writes the statistics and trace information to disk files, and so on.
The function does not return control.
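For illustration, below is a minimal sketch of a C program frame that uses these two functions. Only rtl_init, lexit_ and their parameters are taken from the description above; the absence of a named header file and the rest of the program structure are assumptions.

/* Prototypes as given above; in a real program they are provided by the
   Run-Time System header file (its name is not specified here). */
long rtl_init(long InitParam, int argc, char *argv[]);
long lexit_(long *UserResPtr);

int main(int argc, char *argv[])
{
    long UserRes = 0;

    /* InitParam = 0: default initialization
       (1 would suppress dynamic control). */
    rtl_init(0L, argc, argv);

    /* ... user computations built from Run-Time System calls ... */

    /* Correct completion: frees Run-Time System memory, writes the
       statistics and trace files; does not return control. */
    lexit_(&UserRes);

    return 0;  /* never reached */
}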
Note. Starting a user program on a processor system requires specifying (as startup parameters) the following characteristics of the processor system as a multidimensional array: the rank of the processor system and the sizes of all its dimensions.
Let the rank of the processor system be n, and the size of its k-th dimension be PSSize_k (1 ≤ k ≤ n). Then, when the Run-Time System is initialized, an internal number ProcNumber_int is assigned to each processor:
ProcNumber_int(I_1, ..., I_n) = Σ_{k=1}^{n} I_k · Π_{j=k+1}^{n} PSSize_j
= I_n + I_{n-1}·PSSize_n + ... + I_1·PSSize_2·...·PSSize_n
where:
I_k – processor index value along the k-th dimension of the processor system index space (0 ≤ I_k ≤ PSSize_k − 1).
So the internal number is the linear index of the processor in the index space of the processor system.
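The small C function below is a sketch of this linearization; the function name and the array-based representation of the coordinates are illustrative and are not part of the library interface.

/* Compute the internal processor number from its coordinates.
   I[0..n-1]      – processor coordinates (I[k-1] corresponds to I_k above),
   PSSize[0..n-1] – dimension sizes of the processor system.
   Horner's scheme yields the same value as the sum-of-products formula. */
long proc_number_int(int n, const long I[], const long PSSize[])
{
    long number = 0;
    for (int k = 0; k < n; k++)
        number = number * PSSize[k] + I[k];
    return number;
}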
In interprocessor exchanges, a processor identifier ProcIdent is used as the processor address. The correspondence