Lecture 2. Intel technologies for HPC Applications (Semin) (Electronic lectures)
File description
The file "Lecture 2. Intel technologies for HPC Applications (Semin)" is located in the folder "Electronic lectures, 2016" inside the archive. It is a PDF file from the "Electronic lectures" archive, which belongs to the course "Supercomputer Modeling and Technologies" of semester 11 (3rd semester of the master's program) in the file archive of Lomonosov Moscow State University. Although this archive is directly associated with Lomonosov Moscow State University, it can also be found in other sections.
Intel Technologies for High Performance Computing Applications
Andrey Semin
Principal Engineer, Software and Services Group
September 7, 2016

To Compete, You Must Compute! *
* Susan Baldwin, Executive Director of Compute Canada

Legal Information
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT.
INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel's current plan of record product roadmaps.

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804

All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel, Intel Xeon, Intel Xeon Phi, Intel Hadoop Distribution, Intel Cluster Ready, Intel OpenMP, Intel Cilk Plus, Intel Threaded Building blocks, Intel Cluster Studio, Intel Parallel Studio, Intel Coarray Fortran, Intel Math Kernel Library, Intel Enterprise Edition for Lustre Software, Intel Composer, the Intel Xeon Phi logo, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.

Other names, brands, and images may be claimed as the property of others.

Copyright © 2016, Intel Corporation.
All rights reserved.

Agenda
• Demand for high performance computing
• Intel computing architectures for HPC
• Cores: pipelines, execution units
• AVX-512 overview

The Three Pillars of Modern Science, Research & Engineering
• Experiment, Observation
• Theory
• Numerical Simulation

High Performance Computing: A Fundamental Tool for Breakthroughs
Government & Academia
• Molecular Dynamics
• Curing Disease
• Non-Invasive Diagnostics
• Weather Prediction
Discovery
Commercial/Industrial
• Crash Test Simulation
• Financial Trading
• CFD
Business Transformation
To Compete You Must Compute

New Users – New Uses
• Deep learning
• Data Analytics
• Machine learning
• Making insights
Source: www.top500.org

Need for Speed: Increasing Processor Performance
[Chart: FLOPS per processor (performance) by processor generation. Labels: 386, 486, Pentium® Architecture, Pentium® II Architecture, Pentium® III Architecture, Pentium® 4 Architecture, Intel® Core™ uArch, Multi-Core, Many-Core, Tera-Scale R&D, TFLOPS.]
Future options subject to change without notice.
Source: Intel. Horizontal axis: time (estimated). For illustration only, not drawn to scale. All dates, product descriptions, features, availability, and plans are forecasts and subject to change without notice.

"Big Core" – "Small Core"
Different Optimization Points. Common Programming Models and Architectural Elements.

Intel® Xeon® Processor
• Simply aggregating more cores generation after generation is not sufficient
• Performance per core/thread must increase each generation, be as fast as possible
• Power envelopes should stay flat or go down each generation
• Balanced platform (Memory, I/O, Compute)
• Cores, Threads, Caches, SIMD

Intel® Xeon Phi™ Processor
• Optimized for highest compute per watt
• Willing to trade performance per core/thread for aggregate performance
• Power envelopes should also stay flat or go down every generation
• Optimized for highly parallel workloads
• Cores, Threads, Caches, SIMD

For illustration only.

Parallel is the Path Forward
Intel® Xeon® and Intel® Xeon Phi™ Product Families are both going parallel

Processor                                            Core(s) up to   Threads up to   SIMD Width (bits)
Intel® Xeon® processor 5100 series                         2               2               128
Intel® Xeon® processor 5500 series                         4               8               128
Intel® Xeon® processor 5600 series                         6              12               128
Intel® Xeon® E5-2600 (code-named Sandy Bridge EP)          8              16               256
Intel® Xeon® E5-2600 v2 (code-named Ivy Bridge EP)        12              24               256
Intel® Xeon® E5-2600 v3 (code-named Haswell EP)           18              36               256
Intel® Xeon® E5-2600 v4 (code-named Broadwell EP)         22              44               256
Intel® Xeon Phi™ coprocessor (Knights Corner)             61             244               512
Intel® Xeon Phi™ processor (Knights Landing)              72             288               512

More Cores, More Threads, Wider Vectors
Potential future options subject to change without notice.
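The "Parallel is the Path Forward" comparison can be turned into a few derived quantities. The sketch below is illustrative only: the rows are transcribed from the slide's product comparison, while the `Product` type and the `derived` helper are hypothetical names introduced here. It assumes 32-bit single-precision and 64-bit double-precision elements, which is consistent with the Knights Corner slide's "16 singles or 8 doubles" per 512-bit register.

```python
from typing import NamedTuple


class Product(NamedTuple):
    """One row of the slide's comparison (hypothetical helper type)."""
    name: str
    cores: int       # "Core(s) up to"
    threads: int     # "Threads up to"
    simd_bits: int   # "SIMD Width (bits)"


# Values transcribed from the slide; not taken from any other source.
PRODUCTS = [
    Product("Xeon 5100", 2, 2, 128),
    Product("Xeon 5500", 4, 8, 128),
    Product("Xeon 5600", 6, 12, 128),
    Product("Xeon E5-2600 (Sandy Bridge EP)", 8, 16, 256),
    Product("Xeon E5-2600 v2 (Ivy Bridge EP)", 12, 24, 256),
    Product("Xeon E5-2600 v3 (Haswell EP)", 18, 36, 256),
    Product("Xeon E5-2600 v4 (Broadwell EP)", 22, 44, 256),
    Product("Xeon Phi (Knights Corner)", 61, 244, 512),
    Product("Xeon Phi (Knights Landing)", 72, 288, 512),
]


def derived(p: Product) -> dict:
    """Hardware threads per core and elements per vector register."""
    return {
        "threads_per_core": p.threads // p.cores,
        "sp_lanes": p.simd_bits // 32,  # single-precision elements
        "dp_lanes": p.simd_bits // 64,  # double-precision elements
    }


if __name__ == "__main__":
    for p in PRODUCTS:
        d = derived(p)
        print(f"{p.name:34s} {d['threads_per_core']} threads/core, "
              f"{d['sp_lanes']:2d} SP lanes, {d['dp_lanes']} DP lanes")
```

Read this way, the table shows three separate sources of parallelism (cores, hardware threads per core, and vector lanes) growing together across generations.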
Codenames. All timeframes, features, products and dates are preliminary forecasts and subject to change without further notification. Product specification for launched and shipped products available on ark.intel.com. (Die sizes not to scale, for illustration only.)

Knights Corner Architecture Overview: Features of an Individual Core
[Core block diagram: Instruction Decode; Scalar Unit and Vector Unit; Scalar Registers and Vector Registers; 32K L1 I-cache; 32K L1 D-cache; 256K L2 Cache; Ring.]
• Up to 61 in-order cores
• 4 hardware threads per core
• Two pipelines
  – Pentium® processor family-based scalar units
  – Fully-coherent L1 and L2 caches
  – 64-bit addressing
• All new vector unit
  – 512-bit SIMD Instructions – not Intel® SSE, MMX™, or Intel® AVX
  – 32x 512-bit wide vector registers
  – Hold 16 singles or 8 doubles per register
  – Pipelined one-per-clock throughput
  – 4 clock latency, hidden by round-robin scheduling of threads
  – Dual issue with scalar instructions

Vector/SIMD High Computational Density
[Block diagram of the Vector/SIMD unit: Mask Registers; Instruction Decode; 16-wide Vector ALU; Scalar Unit and Vector Unit; Scalar Registers and Vector Registers; Replicate; Reorder; Numeric Convert (two blocks); L1 Data Cache; 32K L1 I-cache; 32K L1 D-cache; 256K L2 Cache; Ring.]

Knights Landing Core & VPU
• Out-of-order core w/ 4 SMT threads: 3x over KNC
• VPU tightly integrated with core pipeline
• 2-wide Decode/Rename/Retire
• ROB-based renaming. 72-entry ROB & Rename Buffers
• Up to 6-wide at execution
• Integer (Int) and floating point (FP) RS are OoO
• MEM RS in-order with OoO completion. Recycle Buffer holds memory ops waiting for completion
• Int and MEM RS hold source data, FP RS does not
• 2x 64B Load & 1x 64B Store ports in Dcache
• 1st level uTLB: 64 entries
• 2nd level dTLB: 256 4K, 128 2M, 16 1G pages
• L1 Prefetcher (IPP) and L2 Prefetcher
• 46/48 PA/VA bits
• Fast unaligned and cache-line split support
• Fast Gather/Scatter support

Haswell/Broadwell Core Microarchitecture
[Core block diagram. In-order front end: 32K L1 Instruction Cache, Instruction Pre-decode, Branch Pred, 1.5k uOP cache, Decoders, Queue, Allocate/Rename/Retire, Idiom Elimination. Out-of-order engine: Scheduler, Reorder Buffers, Load Buffers, Store Buffers. Execution ports 0 to 7 with units labeled FMA, FP Multiply, FMA + FP Mult, FP Add, Vector Int Multiply, Vector Int ALU, Vector Logicals, Vector Shuffle, Vector Shifts, Divide, Integer ALU & Shift, Integer ALU & LEA, Branch, Load & Store Address, Store Address, Store Data. Memory: Memory Control, Fill Buffers, 32k L1 Data Cache (96 bytes/cycle), L2 Data Cache (MLC).]
HSW = Intel® Next Generation Microarchitecture
AVX = Intel® Advanced Vector Extensions (Intel® AVX)

The Effect of SIMD (Single Core)
Maximum Attainable Peak Performance [GFLOPS], based on Amdahl's Law. Simplified and for illustration only.
• Xeon Phi 7290, 1.5 GHz (1 core): 48 GFLOPS [DP F.P.]
• Xeon E5-2699 v4, 2.2 GHz (1 core): 35 GFLOPS [DP F.P.]
[Chart: attainable GFLOPS versus % SIMD/vector.]

Maximum possible speedup: 1 Xeon Phi 7290 vs. 2 socket Xeon E5-2699 v4 (2.2 GHz, 22 cores)
[Surface chart: theoretical peak performance speedup using Amdahl's Law. One axis runs from 0% to 100%, the other from 0.00 to 1.00; speedup bands run from 0.00–0.50 up to 4.00–4.50. Simplified and for illustration only.]

Notice: This document contains information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information. Contact your local Intel sales office or your distributor to obtain the latest specification before placing your product order.

Knights Corner and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.