LIB-DVM
Interface Description
March 27, 1999
Version 2.0
Keldysh Institute of Applied Mathematics
Russian Academy of Sciences
Contents
1. Introduction.
2. Run-Time Library initialization and completion.
3. Creating abstract machine.
3.1. Requesting current abstract machine.
3.2. Creating abstract machine representation.
3.3. Requesting pointer to an element of abstract machine representation.
3.4. Deleting abstract machine representation.
4. Processor systems.
4.1. Requesting pointer to the processor system.
4.2. Creating subsystem of specified processor system.
4.3. Reconfiguring (changing shape of) processor system.
4.4. Deleting processor system.
4.5. Weights of processor system elements.
4.6. Setting coordinate weights of processor system elements according to specified computational loading.
5. Mapping abstract machine.
5.1. Mapping abstract machine representation onto processor system (resource distribution).
5.2. Remapping abstract machine representation onto processor system (resource redistribution)
5.3. Requesting map.
5.4. Specifying abstract machine representation mapping according to map.
5.5. Remapping abstract machine representation according to the map.
5.6. Deleting map.
5.7. Imbalanced block distribution.
6. Distributed array creating and deleting.
6.1. Creating distributed array.
6.2. Deleting distributed array.
6.3. Creating additional header of distributed array.
6.4. Deleting distributed array header.
7. Mapping distributed array.
7.1. Aligning distributed array.
7.2. Alignments superposition.
7.3. Realigning distributed array.
7.4. Requesting map
7.5. Specifying distributed array mapping according to map.
7.6. Realigning distributed array according to map.
7.7. Deleting map.
8. PROGRAM BLOCK DEFINITION.
8.1. Block beginning
8.2. Block end
9. Parallel loop defining.
9.1. Creating parallel loop.
9.2. Mapping parallel loop.
9.3. Reordering parallel loop execution.
9.4. Inquiry of continuation of parallel loop execution.
9.5. Terminating parallel loop.
10. Representation of the program as a set of subtasks executed in parallel.
10.1. Mapping abstract machine (subtask creation).
10.2. Starting subtask (activation).
10.3. Completing (stopping) current subtask.
11. Reduction.
11.1. Creating reduction variable.
11.2. Creating reduction group.
11.3. Including reduction in reduction group.
11.4. Storing values of reduction variables.
11.5. Starting reduction group.
11.6. Waiting for completion of reduction group.
11.7. Deleting reduction group.
11.8. Deleting reduction.
12. Renewing shadow edges of distributed array.
12.1. Creating shadow edge group.
12.2. Including shadow edge in the group.
12.3. Starting shadow edge group renewing.
12.4. Initializing receiving imported elements of specified shadow edge group.
12.5. Initializing sending exported elements of specified shadow edge group.
12.6. Waiting for completion of shadow edge group renewing.
12.7. Deleting shadow edge group.
13. Access to distributed array elements.
13.1. Copying distributed array element.
13.1.1. Reading distributed array element and assigning value to element.
13.1.2. Copying one element of distributed array to another.
13.1.3. Unified copying of element of distributed array.
13.2. Copying distributed arrays.
13.3. Asynchronous copying of distributed arrays.
13.4.1. Requesting if array element is allocated in local part of distributed array.
13.4.2. Requesting initial and last index values of local part of distributed array.
13.4.3. Reading element of local part of distributed array.
13.4.4. Assigning value to element of local part of distributed array.
13.4.5. Copying element of local part of distributed array to element of local part of other distributed array.
13.4.6. Requesting address of element of local part of distributed array.
13.5. Macros to access elements of local part of distributed array of rank from 1 to 7.
14. Regular access to remote data.
14.1. Creating remote element buffer of distributed array.
14.2. Initializing loading remote element buffer of distributed array.
14.3. Waiting for completion of loading remote element buffer of distributed array.
14.4. Deleting remote element buffer of distributed array.
14.5. Access to distributed array elements, allocated in remote element buffer.
14.6. Creating group of remote element buffers.
14.7. Including remote element buffer in the group.
14.8. Starting loading remote element buffers of specified group.
14.9. Waiting for completion of loading remote element buffers of specified group.
14.10. Deleting group of remote element buffers.
15. Non-regular access to remote data.
15.1. Creating remote element buffer of non-regular access.
15.2. Starting loading remote element buffer of non-regular access.
15.3. Waiting for completion of loading remote element buffer of non-regular access.
15.4. Deleting remote element buffer of non-regular access.
15.5. Access to remote elements, allocated in the buffer.
15.6. Creating group of remote element buffers of non-regular access.
15.7. Including remote element buffer of non-regular access in the group.
15.8. Starting loading remote element buffers of specified group.
15.9. Waiting for completion of loading remote element buffers of specified group.
15.10. Deleting group of remote element buffers of non-regular access.
16. Input/Output.
16.1. Analogies to functions of C language standard library.
16.1.1. High level input/output functions.
16.1.2. Low level I/O functions.
16.1.3. Operations with directories and files.
16.2. Reading from file to sub-array of distributed array.
16.3. Writing sub-array of distributed array to file.
17. Miscellaneous functions.
17.1. Requesting size of object.
17.2. Requesting size of object dimension.
17.3. Requesting if object is distributed array.
17.4. Requesting size of distributed array element.
17.5. Deleting object.
17.6. Requesting whether current processor is I/O processor.
17.7. Sending memory areas of I/O processor.
18. Using Run-Time Library in Fortran language.
19. Example of program using Run-Time Library functions.
1. Introduction.
Before proceeding to the DVM Run-Time Library functions, let us briefly describe the model of parallel computation.
A parallel C-DVM (or Fortran DVM) program is translated into a program in standard C (or Fortran 77) extended with calls to Run-Time Library functions, and is executed according to the SPMD model on each processor assigned to the task.
On startup the program has a single branch (control flow). This branch is executed, starting from the first program statement, on all processors of the processor system.
Let us define the processor system (or system of processors) as the computing machine assigned to the user program by the hardware and the base system software. For example, for computers with distributed memory the computing machine can be an MPI-machine. In this case, the processor system is a group of MPI-processes created when the program is started. The number of processors in the processor system, as well as its representation as a multidimensional grid, is specified in the command line that starts the program. All declared variables are replicated over all the processors. The only exception is arrays explicitly declared as «distributed».
On entering a parallel loop, the branch is split into a number of parallel branches. Each of the branches is executed on a separate processor of the processor system.
On leaving a parallel construct, all parallel branches are merged into the original branch, which was executing before the parallel construct was entered. At this moment all changes to replicated variables made by the parallel branches become visible to all processors (that is, the variables are brought to a coherent state).
2. Run-Time Library initialization and completion.
Initialization in C program:
long rtl_init (long InitParam, int argc, char *argv[]);
Initialization in Fortran program:
long linit_ (long *InitParamPtr);
| InitParam or *InitParamPtr | - parameter of Run-Time Library initialization. |
| argc | - number of string parameters in command line. |
| argv | - array containing pointers to string parameters in command line. |
The functions rtl_init and linit_ initialize the Run-Time Library internal structures according to the modes of interprocessor exchange, statistics and trace accumulation, and so on, defined in configuration files.
The initialization parameter can be:
| 0 | - default initialization; |
| 1 | - initialization with blocked dynamic control (in this case dynamic control specified in Run-Time Library startup parameters is suppressed). |
The function returns zero.
long lexit_ (long *UserResPtr);
*UserResPtr - value returned by user program.
The function lexit_ correctly completes the execution of the Run-Time Library: it frees the memory used by the Run-Time Library, writes the statistics and trace information to a disk file, and so on.
The function does not return control.
Note. Starting a user program on a processor system requires specifying (as startup parameters) the characteristics of the processor system viewed as a multidimensional array: its rank and the sizes of all its dimensions.
Let the rank of the processor system be $n$, and the size of its k-th dimension be $PSSize_k$ ($1 \le k \le n$). Then, when the Run-Time Library is initialized, an internal number $ProcNumber_{int}$ is assigned to each processor:
$$ProcNumber_{int} = \sum_{k=1}^{n} \left( I_k \cdot \prod_{m=k+1}^{n} PSSize_m \right)$$
where
| $I_k$ | - processor index value in the k-th dimension of the processor system index space ($0 \le I_k \le PSSize_k - 1$). |
So the internal number is the linear index of the processor in index space of the processor system.
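The linear numbering described above can be sketched in C as follows. This is only an illustration, not part of the Run-Time Library: the function names proc_number and proc_index are hypothetical, and the sketch assumes that the last dimension of the processor system varies fastest (row-major order), which the library documentation should be consulted to confirm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a LIB-DVM function): linear number of the
 * processor with index vector idx[0..n-1] in a processor system whose
 * dimension sizes are size[0..n-1].
 * Assumption: the last dimension varies fastest (row-major order). */
long proc_number(size_t n, const long idx[], const long size[])
{
    long number = 0;
    for (size_t k = 0; k < n; ++k) {
        assert(size[k] > 0 && idx[k] >= 0 && idx[k] < size[k]);
        /* Horner form of sum_k idx[k] * prod_{m>k} size[m] */
        number = number * size[k] + idx[k];
    }
    return number;
}

/* Hypothetical inverse: recover the index vector from a linear number. */
void proc_index(size_t n, long number, const long size[], long idx[])
{
    for (size_t k = n; k-- > 0; ) {
        assert(size[k] > 0);
        idx[k] = number % size[k];   /* index in dimension k */
        number /= size[k];           /* strip that dimension */
    }
}
```

For a 2 x 3 x 4 processor system, the index vector (1, 2, 3) gives 1*12 + 2*4 + 3 = 23 under this assumption, and proc_index maps 23 back to (1, 2, 3).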
In interprocessor exchanges a processor identifier $ProcIdent$ is used as the processor address. The correspondence between $ProcNumber_{int}$ and $ProcIdent$ is defined by the Message Passing System and returned to the Run-Time Library when it is initialized.
Among the processors assigned to a task, two are functionally special: the input/output processor and the central processor. The input/output processor deals with the file system directly (see section 16); its internal number is usually zero.
The central processor computes the reduction functions (see section 11) and is usually defined by the index vector ([PSSize_1/2], ..., [PSSize_n/2]).
The internal numbers of the central processor and the input/output processor can also be specified as startup parameters (non-standard internal numbers).
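As a small illustration of the default central processor described above, the following C sketch computes its index vector. The function name central_index is hypothetical and not part of the Run-Time Library, and the sketch assumes that the square brackets in the index vector denote the integer part of the quotient.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a LIB-DVM function): fills idx[0..n-1] with
 * the default central processor index vector for a processor system
 * with dimension sizes size[0..n-1].
 * Assumption: [x] denotes the integer part of x. */
void central_index(size_t n, const long size[], long idx[])
{
    for (size_t k = 0; k < n; ++k) {
        assert(size[k] > 0);
        idx[k] = size[k] / 2;   /* integer division truncates */
    }
}
```

For a 4 x 3 processor system this yields the index vector (2, 1).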
3. Creating abstract machine.
The concept of an abstract machine is introduced for two-step mapping of a parallel program onto a real parallel computer. First, the programmer creates the abstract machine most suitable for the program (that is, an abstract machine that realizes all potential parallelism of the program). Then the programmer defines the mapping of computations and data onto this machine, and also defines the rules for mapping this abstract machine onto a real parallel computer. An abstract machine is therefore a hierarchy of abstract parallel subsystems. Each of these subsystems can be represented as a multidimensional array of subsystems of the next hierarchy level. Several different representations of each subsystem may co-exist.
The first versions of the C-DVM and Fortran-DVM languages do not support the description of functional parallelism, so the term «abstract machine» is not used in them. Instead, the term «template» («TEMPLATE») is used. Each «template» described in the program is represented as an abstract machine in the Run-Time Library. For each explicitly distributed array (that is, an array specified with the DVM directive «DISTRIBUTE») a corresponding abstract machine is created as well.
3.1. Requesting current abstract machine.
AMRef getam_(void);
This function returns a pointer to the current abstract machine. The current abstract machine is the abstract machine onto which the current program branch is mapped. Only one abstract machine (the top level of the hierarchy) exists when the program starts. This initial abstract machine is mapped onto the processor system assigned by the operating system (OS) for program execution. Therefore, all processors of that system execute the initial program branch (mapped onto the initial abstract machine). All abstract machines that the program creates later are descendants of the initial abstract machine. An abstract machine becomes the current one when control enters a parallel branch (parallel loop iteration) mapped onto this abstract machine, or when control exits from a parallel construct mapped onto some representation of this abstract machine.
3.2. Creating abstract machine representation.
| AMViewRef crtamv_ ( | AMRef | *AMRefPtr |