Lecture 2. Intel technologies for HPC Applications (Semin) (1186100)

File No. 1186100 (electronic lectures), dated 2020-08-25, StudIzba


Text from the file

Intel Technologies for High Performance Computing Applications
Andrey Semin, Principal Engineer, Software and Services Group
September 7, 2016

To Compete, You Must Compute! *
* Susan Baldwin, Executive Director of Compute Canada

Legal Information

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel's current plan of record product roadmaps.

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804

All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel, Intel Xeon, Intel Xeon Phi, Intel Hadoop Distribution, Intel Cluster Ready, Intel OpenMP, Intel Cilk Plus, Intel Threading Building Blocks, Intel Cluster Studio, Intel Parallel Studio, Intel Coarray Fortran, Intel Math Kernel Library, Intel Enterprise Edition for Lustre Software, Intel Composer, the Intel Xeon Phi logo, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.

Other names, brands, and images may be claimed as the property of others.

Copyright © 2016, Intel Corporation.

All rights reserved.

Agenda
• Demand for high performance computing
• Intel computing architectures for HPC
• Cores: pipelines, execution units
• AVX-512 overview

The Three Pillars of Modern Science, Research & Engineering
• Experiment, Observation
• Theory
• Numerical Simulation

High Performance Computing: A Fundamental Tool for Breakthroughs
• Government & Academia: Molecular Dynamics, Curing Disease, Non-Invasive Diagnostics, Weather Prediction, Discovery
• Commercial/Industrial: Crash Test Simulation, Financial Trading, CFD, Business Transformation
To Compete You Must Compute

New Users – New Uses
• Deep learning
• Data Analytics
• Machine learning
• Making insights
Source: www.top500.org

Need for Speed: Increasing Processor Performance
[Figure: processor performance (FLOPS/Processor, with a TFLOPS mark) by generation. Labeled points: 386, 486, Pentium® Architecture, Pentium® II Architecture, Pentium® III Architecture, Pentium® 4 Architecture, Intel® Core™ uArch, Multi-Core, Many-Core (Tera-Scale R&D).]
Future options subject to change without notice.

Source: Intel. Time (estimated). For illustration only, not drawn to scale. All dates, product descriptions, features, availability, and plans are forecasts and subject to change without notice.

“Big Core” – “Small Core”
Different Optimization Points. Common Programming Models and Architectural Elements.

Intel® Xeon® Processor
• Simply aggregating more cores generation after generation is not sufficient
• Performance per core/thread must increase each generation, be as fast as possible
• Power envelopes should stay flat or go down each generation
• Balanced platform (Memory, I/O, Compute)
• Cores, Threads, Caches, SIMD

Intel® Xeon Phi™ Processor
• Optimized for highest compute per watt
• Willing to trade performance per core/thread for aggregate performance
• Power envelopes should also stay flat or go down every generation
• Optimized for highly parallel workloads
• Cores, Threads, Caches, SIMD

For illustration only

Parallel is the Path Forward
Intel® Xeon® and Intel® Xeon Phi™ Product Families are both going parallel

Processor                                                        Cores up to   Threads up to   SIMD Width (bits)
Intel® Xeon® processor 5100 series                               2             2               128
Intel® Xeon® processor 5500 series                               4             8               128
Intel® Xeon® processor 5600 series                               6             12              128
Intel® Xeon® E5-2600 processor, code-named Sandy Bridge EP       8             16              256
Intel® Xeon® E5-2600 v2 processor, code-named Ivy Bridge EP      12            24              256
Intel® Xeon® E5-2600 v3 processor, code-named Haswell EP         18            36              256
Intel® Xeon® E5-2600 v4 processor, code-named Broadwell EP       22            44              256
Intel® Xeon Phi™ coprocessor, code-named Knights Corner          61            244             512
Intel® Xeon Phi™ processor, code-named Knights Landing           72            288             512

More Cores → More Threads → Wider Vectors
Potential future options subject to change without notice.
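The figures in the "Parallel is the Path Forward" comparison can be turned into two derived quantities: hardware threads per core and the number of elements one SIMD register holds. The following is a minimal Python sketch, not part of the original slides. The row values are transcribed from the slide; the helper names (`threads_per_core`, `simd_lanes`, `summarize`) and the "cores × lanes" column are illustrative assumptions.

```python
# Rows transcribed from the "Parallel is the Path Forward" slide:
# (processor, max cores, max hardware threads, SIMD width in bits)
PROCESSORS = [
    ("Xeon 5100 series", 2, 2, 128),
    ("Xeon 5500 series", 4, 8, 128),
    ("Xeon 5600 series", 6, 12, 128),
    ("Xeon E5-2600 (Sandy Bridge EP)", 8, 16, 256),
    ("Xeon E5-2600 v2 (Ivy Bridge EP)", 12, 24, 256),
    ("Xeon E5-2600 v3 (Haswell EP)", 18, 36, 256),
    ("Xeon E5-2600 v4 (Broadwell EP)", 22, 44, 256),
    ("Xeon Phi coprocessor (Knights Corner)", 61, 244, 512),
    ("Xeon Phi processor (Knights Landing)", 72, 288, 512),
]

DOUBLE_BITS = 64  # IEEE 754 double precision
SINGLE_BITS = 32  # IEEE 754 single precision


def threads_per_core(cores, threads):
    """Hardware threads per core, assuming threads divide evenly across cores."""
    return threads // cores


def simd_lanes(width_bits, element_bits):
    """Number of elements of the given size that fit in one SIMD register."""
    return width_bits // element_bits


def summarize(rows):
    """Return (name, threads per core, DP lanes, cores * DP lanes) for each row."""
    summary = []
    for name, cores, threads, width in rows:
        lanes = simd_lanes(width, DOUBLE_BITS)
        summary.append((name, threads_per_core(cores, threads), lanes, cores * lanes))
    return summary


if __name__ == "__main__":
    for name, tpc, lanes, chip_lanes in summarize(PROCESSORS):
        print(f"{name:40s} threads/core={tpc} DP lanes={lanes} cores*lanes={chip_lanes}")
```

The "cores × lanes" column is only a rough indicator of data-level parallelism. It ignores clock frequency, the number of vector units per core, and fused multiply-add, so it should not be read as a performance figure.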

Codenames. All timeframes, features, products and dates are preliminary forecasts and subject to change without further notification. Product specification for launched and shipped products available on ark.intel.com. (Die sizes not to scale, for illustration only.)

Knights Corner Architecture Overview: Features of an Individual Core
[Figure: core block diagram. Instruction decode feeds a scalar unit and a vector unit, each with its own registers; 32K L1 I-cache, 32K L1 D-cache, 256K L2 cache, ring.]
• Up to 61 in-order cores
• 4 hardware threads per core
• Two pipelines
  – Pentium® processor family-based scalar units
  – Fully-coherent L1 and L2 caches
  – 64-bit addressing
• All new vector unit
  – 512-bit SIMD instructions (not Intel® SSE, MMX™, or Intel® AVX)
  – 32x 512-bit wide vector registers
  – Hold 16 singles or 8 doubles per register
  – Pipelined one-per-clock throughput
  – 4 clock latency, hidden by round-robin scheduling of threads
  – Dual issue with scalar instructions

Vector/SIMD High Computational Density
[Figure: the same core diagram with the Vector/SIMD unit expanded. Mask registers, vector registers, 16-wide vector ALU, replicate, reorder and numeric convert stages, L1 data cache.]

Knights Landing Core & VPU
• Out-of-order core w/ 4 SMT threads: 3x over KNC
• VPU tightly integrated with core pipeline
• 2-wide Decode/Rename/Retire
• ROB-based renaming. 72-entry ROB & Rename Buffers
• Up to 6-wide at execution
• Integer (Int) and floating point (FP) RS are OoO
• MEM RS in-order with OoO completion. Recycle Buffer holds memory ops waiting for completion
• Int and MEM RS hold source data, FP RS does not
• 2x 64B Load & 1x 64B Store ports in Dcache
• 1st level uTLB: 64 entries
• 2nd level dTLB: 256 4K, 128 2M, 16 1G pages
• L1 Prefetcher (IPP) and L2 Prefetcher
• 46/48 PA/VA bits
• Fast unaligned and cache-line split support
• Fast Gather/Scatter support

Haswell/Broadwell Core Microarchitecture
[Figure: core block diagram. Front end: 32K L1 instruction cache, instruction pre-decode, branch predictor, decoders, 1.5k uOP cache, queue. In-order allocate/rename/retire with idiom elimination, then an out-of-order scheduler issuing to ports 0 through 7. Execution units shown: integer ALU & shift, integer ALU & LEA, FMA, FP multiply, FP add, vector int multiply, vector int ALU, vector logicals, vector shuffle, vector shifts, divide, branch, load & store address, store address, store data. Memory side: load buffers, store buffers, reorder buffers, memory control, fill buffers, 32k L1 data cache (96 bytes/cycle), L2 data cache (MLC).]
HSW = Intel® Next Generation Microarchitecture. AVX = Intel® Advanced Vector Extensions (Intel® AVX).

The Effect of SIMD (Single Core)
[Figure: maximum attainable peak performance in GFLOPS (double-precision floating point) versus % SIMD/vector, based on Amdahl's Law. Labeled values: 48 GFLOPS for Xeon Phi 7290, 1.5 GHz (1 core); 35 GFLOPS for Xeon E5-2699 v4, 2.2 GHz (1 core). Simplified and for illustration only.]

Maximum possible speedup: 1 Xeon Phi 7290 vs. 2 socket Xeon E5-2699 v4 (2.2 GHz, 22 cores)
[Figure: surface plot of theoretical peak performance speedup using Amdahl's Law. One axis runs from 0% to 100%, the other from 0.00 to 1.00; speedup bands run from 0.00 to 4.50 in steps of 0.50. Simplified and for illustration only.]

Notice: This document contains information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information. Contact your local Intel sales office or your distributor to obtain the latest specification before placing your product order.

Knights Corner and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.
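The single-core values on the "Effect of SIMD" slide (48 and 35 GFLOPS) are consistent with a simple peak formula: clock frequency × vector units per core × double-precision lanes × 2 flops per fused multiply-add. The Python sketch below works under that assumption and applies Amdahl's Law to the vectorized fraction of the work. It is not taken from the slides: the function names, the count of two vector FMA units per core on both chips, and the two-level (SIMD, then cores) model are illustrative assumptions, and the model ignores memory bandwidth, turbo and AVX frequency changes, and NUMA effects.

```python
def peak_gflops(freq_ghz, vector_units, lanes, flops_per_lane=2):
    """Theoretical peak GFLOPS of one core.

    flops_per_lane=2 assumes one fused multiply-add per lane per cycle.
    """
    return freq_ghz * vector_units * lanes * flops_per_lane


def amdahl_speedup(fraction, factor):
    """Amdahl's Law: speedup when `fraction` of the work runs `factor` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def attainable_core_gflops(freq_ghz, vector_units, lanes, vector_fraction):
    """Scalar peak of one core scaled by the Amdahl speedup from SIMD."""
    scalar_peak = peak_gflops(freq_ghz, vector_units, lanes=1)
    return scalar_peak * amdahl_speedup(vector_fraction, lanes)


def attainable_chip_gflops(core, cores, vector_fraction, parallel_fraction):
    """Apply Amdahl's Law twice: SIMD within a core, then threading across cores."""
    per_core = attainable_core_gflops(*core, vector_fraction)
    return per_core * amdahl_speedup(parallel_fraction, cores)


# Assumed core parameters: (GHz, vector FMA units per core, DP lanes per unit).
XEON_PHI_7290 = (1.5, 2, 8)    # 512-bit vectors; two units per core is an assumption
XEON_E5_2699_V4 = (2.2, 2, 4)  # 256-bit vectors; FMA on two ports is an assumption

if __name__ == "__main__":
    for pct in range(0, 101, 25):
        f = pct / 100
        phi = attainable_core_gflops(*XEON_PHI_7290, f)
        xeon = attainable_core_gflops(*XEON_E5_2699_V4, f)
        print(f"{pct:3d}% vectorized: Xeon Phi {phi:5.1f}  Xeon {xeon:5.1f} GFLOPS")
    # 72 cores for one Xeon Phi 7290, 2 x 22 = 44 cores for the two-socket Xeon.
    ratio = (attainable_chip_gflops(XEON_PHI_7290, 72, 1.0, 1.0)
             / attainable_chip_gflops(XEON_E5_2699_V4, 44, 1.0, 1.0))
    print(f"chip-level ratio at 100% vector, 100% parallel: {ratio:.2f}")
```

Under these assumptions a fully vectorized Xeon Phi 7290 core reaches 48 GFLOPS and a Xeon E5-2699 v4 core 35.2 GFLOPS, while with no vectorization the Xeon core is faster (8.8 vs. 6 GFLOPS) because of its higher clock. In this model the Xeon Phi core overtakes the Xeon core only once about 87.5% of the work is vectorized. The sketch is not claimed to reproduce the exact curves or the speedup surface shown on the slides.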

Characteristics

File type: PDF file
Size: 4.24 MB
Material: Electronic lectures
Material type: Lectures
Subject: Supercomputer Modeling and Technologies
Higher education institution: Lomonosov Moscow State University (MSU)


List of lecture files

jelektronnye-lekcii.rar
Electronic lectures, 2016
Lecture 7. Differential equations. Mesh partitioning
Lection_Pictures
Puasson_Serial_ECGM_120x90.pdf
Puasson_Serial_SDI_120x90.pdf
MPI_Decomposition.c
Puasson_Serial.c
Puasson_Serial_MeshGen.c
Lecture 1. Supercomputer modeling. Structure of algorithms.pdf
Lecture 2. Intel technologies for HPC Applications (Semin).pdf
Lecture 3. Clusters and supercomputers.pdf
Lecture 4. Organizing user work on the MSU CMC supercomputers.pdf
Lecture 5. Data distribution. MPI basics.pdf
Lecture 6. Regatta and Bluegene computing systems.pdf
Lecture. CUDA 1 (Kolganov).pdf
Lecture. CUDA 2 (Kolganov).pdf
Lecture. CUDA 3 (Kolganov).pdf
Lecture. HPC - High Performance Computing (Perevozchikov).pdf
Lecture. Quantum computer (Ozhigov).pdf
Lecture. Matrices, tensors, computations (Tyrtyshnikov).pdf
Lecture. Molecular dynamics (Shumkin).pdf
Lecture. Turbulent flows 1 (Golovizin).pdf
Lecture. Turbulent flows 2 (Golovizin).pdf
Lecture. Turbulent flows 3 (Golovizin).pdf
