      PARAMETER (L=1200, ITMAX=4)
      REAL A(L,L), EPS, MAXEPS, B(L,L)
CHPF$ PROCESSORS P(2,2)
CHPF$ DISTRIBUTE A (BLOCK, BLOCK) ONTO P
CHPF$ ALIGN B(I, J) WITH A(I, J)
C     arrays A and B with block distribution
      PRINT *, '********** TEST_JACOBI **********'
      MAXEPS = 0.5E-7
CDVM$ PARALLEL (J, I) ON A(I, J)
C     nest of two parallel loops; iteration (i,j) is executed on the
C     processor that owns element A(i,j)
      DO 1 J = 1, L
      DO 1 I = 1, L
         A(I, J) = 0.
         IF (I.EQ.1 .OR. J.EQ.1 .OR. I.EQ.L .OR. J.EQ.L) THEN
            B(I, J) = 0.
         ELSE
            B(I, J) = (1. + I + J)
         ENDIF
 1    CONTINUE
      DO 2 IT = 1, ITMAX
         EPS = 0.
CDVM$ PARALLEL (J, I) ON A(I, J), REDUCTION (MAX(EPS))
C     variable EPS is used to calculate the maximum value
         DO 21 J = 2, L-1
         DO 21 I = 2, L-1
            EPS = MAX(EPS, ABS(B(I, J) - A(I, J)))
            A(I, J) = B(I, J)
 21      CONTINUE
CDVM$ PARALLEL (J, I) ON B(I, J), SHADOW_RENEW (A)
C     shadow elements of array A are copied from the
C     neighboring processors before the loop is executed
         DO 22 J = 2, L-1
         DO 22 I = 2, L-1
            B(I, J) = (A(I-1, J) + A(I, J-1) + A(I+1, J) +
     *                 A(I, J+1)) / 4
 22      CONTINUE
         PRINT *, 'IT = ', IT, ' EPS = ', EPS
         IF (EPS .LT. MAXEPS) GO TO 3
 2    CONTINUE
 3    CONTINUE
C     OPEN (3, FILE='JACOBI.DAT', FORM='FORMATTED')
C     WRITE (3,*) B
C     CLOSE (3)
      END
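For readers without a DVM system at hand, the computation performed by the listing above can be reproduced sequentially. The following Python sketch is an illustration only and is not part of the DVM documentation: the function name `jacobi`, the small default grid size (8 instead of L=1200) and the 0-based indexing are assumptions made here. It follows the same two phases per iteration: copy B into A while accumulating the maximum difference EPS (the loop that carries the REDUCTION clause), then recompute B from the four neighbors of A (the loop that needs SHADOW_RENEW in the distributed version).

```python
def jacobi(size=8, itmax=4, maxeps=0.5e-7):
    """Sequential sketch of the Jacobi listing; returns EPS for each iteration."""
    # Python lists are 0-based, so Fortran index I corresponds to i + 1 here.
    a = [[0.0] * size for _ in range(size)]
    b = [[0.0] * size for _ in range(size)]
    last = size - 1
    for i in range(size):
        for j in range(size):
            on_boundary = i in (0, last) or j in (0, last)
            # Fortran: B(I, J) = 1. + I + J with 1-based I and J
            b[i][j] = 0.0 if on_boundary else 1.0 + (i + 1) + (j + 1)

    eps_history = []
    for _ in range(itmax):
        eps = 0.0
        # Phase 1: A = B on interior points, EPS = max |B - A|
        for i in range(1, last):
            for j in range(1, last):
                eps = max(eps, abs(b[i][j] - a[i][j]))
                a[i][j] = b[i][j]
        # Phase 2: B = average of the four neighbors of A
        for i in range(1, last):
            for j in range(1, last):
                b[i][j] = (a[i - 1][j] + a[i][j - 1]
                           + a[i + 1][j] + a[i][j + 1]) / 4
        eps_history.append(eps)
        if eps < maxeps:
            break
    return eps_history


if __name__ == "__main__":
    for it, eps in enumerate(jacobi(), start=1):
        print("IT =", it, " EPS =", eps)
```

In the distributed program each processor runs these loops only over the block of A and B that it owns, which is why the first phase needs a global maximum and the second needs boundary values from neighboring processors.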
INTERVAL ( NLINE=8 SOURCE=jac.fdv ) LEVEL=0 EXE_COUNT=1
--- Main characteristics ---
| Parallelization efficiency | 0.4952 |
( Real_sync= 3.1443; Starts= 0.0267 )
| Nop | Communic | Real_sync | Synchro | Variation | Overlap |
Note: the only print statements in the program output non-distributed data, which explains the absence of input/output communication losses. Because such data have the same value on every processor, these operations are executed by the input/output processor without interprocessor exchanges.
--- Comparative characteristics ---
| T min | Npr | T max | Npr | T mid |

|           |      | Communic | Real_sync | Synchro  | Variation | Overlap  |
| I/O       | Tmin | 0.0000 1 | 0.0000 1  | 0.0058 4 | 0.0066 4  | 0.0000 1 |
| Reduction | Tmin | 0.0303 4 | 0.0212 4  | 0.0231 4 | 0.0000 3  | 0.0000 2 |
| Shadow    | Tmin | 0.0029 3 | 0.0000 2  | 0.0000 3 | 0.0000 3  | 0.0000 1 |
7 Appendix. The list of characteristics
7.1 Main characteristics and their components
- Efficiency coefficient (Parallelization efficiency) is the ratio of productive time to total processor time.
- Execution time (Execution time).
- Number of processors used (Processors).
- Total processor time (Total time) is the product of the execution time (Execution time) and the number of processors used (Processors).
- Productive time (Productive time) is the sum of productive processor time (CPU), input/output time (I/O) and productive system time (Sys).
- Lost time (Lost time).
- Insufficient parallelism (Insufficient par) and its components.
- Communications (Communication) and all their components.
- Idle time (Idle time).
- Load imbalance (Load Imbalance).
- Potential synchronization losses (Synchronization) and all their components.
- Potential time variation losses (Time variation) and all their components.
- Overlap time (Overlap) and all its components.
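
The relations among the main characteristics listed above can be sketched in a few lines of Python. This is an illustration only, not code from the DVM performance analyzer: the function names and the sample numbers are hypothetical. Only the three formulas come from the definitions above (total time = execution time × processors, productive time = CPU + I/O + Sys, efficiency = productive time / total time).

```python
def total_time(execution_time, processors):
    """Total processor time: execution time multiplied by processor count."""
    return execution_time * processors


def productive_time(cpu, io, sys_time):
    """Productive time: productive CPU time + I/O time + productive system time."""
    return cpu + io + sys_time


def parallelization_efficiency(execution_time, processors, cpu, io, sys_time):
    """Efficiency coefficient: productive time divided by total processor time."""
    total = total_time(execution_time, processors)
    if total <= 0:
        raise ValueError("total processor time must be positive")
    return productive_time(cpu, io, sys_time) / total


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to illustrate the arithmetic.
    eff = parallelization_efficiency(
        execution_time=10.0, processors=4, cpu=18.0, io=1.0, sys_time=1.0
    )
    print(f"Parallelization efficiency = {eff:.4f}")
```

With these sample values the productive time is 20 out of 40 processor-seconds, so the efficiency is 0.5: half of the total processor time went to losses.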
7.2 Characteristics of program execution on each processor
- Lost time (Lost time) is the sum of insufficient parallelism losses (User insufficient par), system insufficient parallelism losses (Sys insufficient par), communication losses (Communication) and idle time (Idle time).
- Insufficient parallelism losses (User insufficient par).
- System insufficient parallelism losses (Sys insufficient par).
- Idle time of the given processor (Idle time) is the difference between the maximum interval execution time (over all processors) and the interval execution time on the given processor.
- Total communication time (Communication).
- Real losses due to desynchronization (Real synchronization).
- Potential losses due to desynchronization (Synchronization).
- Potential losses due to time variation (Variation).
- Overlap time of asynchronous operations (Overlap).
- Load imbalance losses (Load Imbalance) are the difference between the maximum processor time (CPU + Sys) and the processor time on the given processor.
- Interval execution time (Execution time).
- Productive processor time (User CPU time).
- Productive system time (Sys CPU time).
- Input/output time (I/O time).
- Start time of collective operations (Start operation).
- Number of processors used for the interval (Processors).
- Communication times for all types of collective operations (Reduction, Shadow, Remote access, Redistribution and I/O), excluding their start times.
- Real desynchronization losses for all types of collective operations.
- Potential desynchronization losses for all types of collective operations.
- Potential time variation losses for all types of collective operations.
- Overlap time for all collective operations (Overlap).
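
The per-processor definitions of idle time, load imbalance and lost time given above can be illustrated with a short Python sketch. It is an assumption-laden example rather than the analyzer's implementation: the function names and all sample timings are hypothetical, and only the three formulas are taken from the list.

```python
def idle_times(exec_times):
    """Idle time per processor: longest interval execution time minus own time."""
    longest = max(exec_times)
    return [longest - t for t in exec_times]


def load_imbalance(cpu_times, sys_times):
    """Load imbalance per processor: largest (CPU + Sys) time minus own time."""
    busy = [cpu + sys_time for cpu, sys_time in zip(cpu_times, sys_times)]
    peak = max(busy)
    return [peak - b for b in busy]


def lost_time(user_insufficient_par, sys_insufficient_par, communication, idle):
    """Lost time: sum of both insufficient-parallelism losses,
    communication losses and idle time."""
    return user_insufficient_par + sys_insufficient_par + communication + idle


if __name__ == "__main__":
    # Hypothetical timings (seconds) for four processors.
    exec_times = [4.0, 3.5, 4.0, 3.0]
    cpu_times = [3.0, 2.5, 3.0, 2.0]
    sys_times = [0.5, 0.5, 0.25, 0.5]
    print("Idle time:     ", idle_times(exec_times))
    print("Load imbalance:", load_imbalance(cpu_times, sys_times))
    print("Lost time on processor 4:",
          lost_time(0.25, 0.0, 0.5, idle_times(exec_times)[3]))
```

Both idle time and load imbalance are zero on the slowest processor by construction, so a large value elsewhere points to the processors that wait for it.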