PCIe cards



HPC & GPU/FPGA Technology

Leverage OCP Design Advantages on EIA 19” Accelerator Server
Gregary Liu, Product Director, Wiwynn Corporation

Agenda
• Brief System Overview
• High CFM/watt Thermal Efficiency
• Flexible & Easy Design for Different Applications
• Design for Serviceability
• Design Extension to ORv2
• Power Distribution Design for ORv2

HPC

Specifications


Brief System Overview – I
• System Design Advantages
  • High CFM/watt thermal design for large-scale simulation models and DL training at all workloads
  • Different applications can be addressed by selecting different PCIe topologies and PCIe cards
• OCP-Related Design Highlights
  • Front I/O access
  • Tool-less ME design for labor saving
  • Integrated, field-proven Mt. Olympus M/B for high quality assurance

Brief System Overview – II
• EIA 19” Design Highlights
  • Standard 4RU high-power server design
  • 8 double-width PCIe G3 x16 slots → adapts to various accelerators for different workloads
  • Dual-zone thermal/cooling design → cold air runs through the PCIe cards directly
  • CRPS PSU → 2+2 power redundancy
  • Scalable design → easily migrated to ORv2

So, how do we achieve them?

Accelerator Server Basics
[Block diagram. Labels: PCIe card ("x1"); two SW ICs, each marked "x4"; Switch Board; Power Board; 2+2 Redundant PSU; Fan Board; 3+1 Redundant Fan]


High CFM/watt Thermal Efficiency – I
Two isolated cooling zones let cold air run through the PCIe cards directly
• PCIe cards cooling zone
• Server board cooling zone

Thermal efficiency
• 0.135 CFM/watt at 30°C
• 0.117 CFM/watt at 25°C
→ Exceeds DC requirement

[Figure: airflow, side view and top view. Labels: Cold air, GPGPU, Fan, PSU, Hot air]
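As a rough illustration of what the CFM/watt figures mean, the sketch below multiplies each efficiency value by a system load to get the implied airflow. The 2.8 kW load is an assumption borrowed from the maximum workload quoted elsewhere in this deck; the slides do not state the load at which the two efficiency figures were measured, and the function name is hypothetical.

```python
# Airflow implied by a CFM/watt thermal-efficiency figure.
# Assumption: the efficiency figures are applied to a 2.8 kW load,
# which the slides do not confirm for these measurements.

def required_airflow_cfm(cfm_per_watt: float, load_watts: float) -> float:
    """Return the airflow (CFM) implied by an efficiency and a load."""
    return cfm_per_watt * load_watts

ASSUMED_LOAD_W = 2800.0  # assumed load, not stated on this slide

# Efficiency figures from the slide, keyed by inlet temperature in °C.
EFFICIENCY_CFM_PER_WATT = {30: 0.135, 25: 0.117}

for inlet_c, efficiency in EFFICIENCY_CFM_PER_WATT.items():
    airflow = required_airflow_cfm(efficiency, ASSUMED_LOAD_W)
    print(f"{inlet_c}°C inlet: {airflow:.1f} CFM at {ASSUMED_LOAD_W:.0f} W")
```

A lower CFM/watt value means less air has to be moved per watt of heat, which is why the figure drops at the cooler inlet temperature.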

High CFM/watt Thermal Efficiency – II
3+1 redundant system fan design for up to 2.8 kW workload @ 35°C

[Figure: thermal map at 35°C inlet and 48°C outlet, with the location of the failed fan marked]
• CPU: 84°C / 134.7 W and 82°C / 134.7 W
• DIMM: 56°C
• Accelerator cards (8 unlabelled readings): 74°C / 247 W, 73°C / 248 W, 76°C / 248 W, 76°C / 247 W, 77°C / 246 W, 77°C / 248 W, 73°C / 247 W, 73°C / 248 W
• GPU inlet: 35.3°C
• GPU outlet: 52.5°C
• SSD: 40.1°C
• PSU inlet: 41.8°C
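The readings on this slide can be tallied with a few lines of arithmetic. The sketch below sums the reported power draws and computes the air temperature rise across the GPU zone. It assumes the eight unlabelled temperature/power pairs are the eight accelerator cards; the variable names are illustrative only.

```python
# Tally of the thermal-map readings from the slide.
# Assumption: the eight unlabelled (°C, W) pairs are the eight GPU cards.

GPU_READINGS = [  # (temperature in °C, power in W)
    (74, 247), (73, 248), (76, 248), (76, 247),
    (77, 246), (77, 248), (73, 247), (73, 248),
]
CPU_READINGS = [(84, 134.7), (82, 134.7)]

GPU_INLET_C = 35.3
GPU_OUTLET_C = 52.5

gpu_power_w = sum(watts for _, watts in GPU_READINGS)
cpu_power_w = sum(watts for _, watts in CPU_READINGS)
hottest_gpu_c = max(temp for temp, _ in GPU_READINGS)
gpu_air_rise_c = GPU_OUTLET_C - GPU_INLET_C

print(f"GPU power total: {gpu_power_w} W")
print(f"CPU power total: {cpu_power_w:.1f} W")
print(f"Hottest GPU: {hottest_gpu_c}°C")
print(f"Air temperature rise across GPU zone: {gpu_air_rise_c:.1f}°C")
```

The CPU and GPU readings together account for roughly 2.25 kW; the slide does not itemize the remainder of the 2.8 kW workload.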


Flexible & Easy Design for Different Applications – I
CPU-PCIe Cards Topology 1 – Balance Mode
• CPU:GPU = 1:4
• Higher bandwidth between CPU and GPU

[Diagram: Project Olympus server board (2P Intel Xeon-SP, CPU 0 and CPU 1 linked by UPI) connected by PCIe x16 cable (labelled "1* PCIe x16") to the PCIe Gen3 Switch Board. PCIe3 Switch 0 serves GPU0-3; PCIe3 Switch 1 serves GPU4-7]

Flexible & Easy Design for Different Applications – II
CPU-PCIe Cards Topology 2 – Cascade Mode
• CPU:GPU = 1:8
• Peer-to-peer performance can be extended to 8 PCIe cards

[Diagram: Project Olympus server board (2P Intel Xeon-SP, CPU 0 and CPU 1 linked by UPI) connected by PCIe x16 cable (labelled "2* PCIe x16") to the PCIe Gen3 Switch Board. PCIe3 Switch 0 serves GPU0-3; PCIe3 Switch 1 serves GPU4-7]
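The two topologies can be compared by modelling each as a small tree and counting the GPUs reachable from a CPU. This is a minimal sketch under stated assumptions: the slides do not show the exact cascade wiring, so the model below assumes PCIe3 Switch 1 hangs off PCIe3 Switch 0 and that CPU 0 is the root. The node names and the `gpus_below` helper are hypothetical.

```python
from collections import deque

# Balance mode: each CPU owns one switch with four GPUs (CPU:GPU = 1:4).
BALANCE = {
    "CPU0": ["SW0"],
    "CPU1": ["SW1"],
    "SW0": ["GPU0", "GPU1", "GPU2", "GPU3"],
    "SW1": ["GPU4", "GPU5", "GPU6", "GPU7"],
}

# Cascade mode (assumed wiring): one CPU roots both switches, with
# Switch 1 cascaded below Switch 0 (CPU:GPU = 1:8).
CASCADE = {
    "CPU0": ["SW0"],
    "SW0": ["GPU0", "GPU1", "GPU2", "GPU3", "SW1"],
    "SW1": ["GPU4", "GPU5", "GPU6", "GPU7"],
}

def gpus_below(topology: dict, root: str) -> list:
    """Breadth-first walk that collects every GPU reachable from root."""
    seen, queue, found = {root}, deque([root]), []
    while queue:
        node = queue.popleft()
        for child in topology.get(node, []):
            if child in seen:
                continue
            seen.add(child)
            if child.startswith("GPU"):
                found.append(child)
            else:
                queue.append(child)
    return sorted(found)

for name, topo in (("Balance", BALANCE), ("Cascade", CASCADE)):
    count = len(gpus_below(topo, "CPU0"))
    print(f"{name} mode: CPU0 reaches {count} GPUs (CPU:GPU = 1:{count})")
```

In this model all eight cards share one PCIe root in cascade mode, which is consistent with the slide's claim that peer-to-peer traffic can span all 8 cards, while balance mode gives each group of four cards its own CPU uplink.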


Tool-less Design for PCIe Cards Maintenance – I
Modular and tool-less design for DW PCIe maintenance
• Modular SW tray for easy DW PCIe card swap
• Quarter-turn fasteners for PCIe card replacement
[Figure labels: GPU tray for serviceability; quarter-turn fasten / release]

Tool-less Design for PCIe Cards Maintenance – II
Rotatable SSD bracket for PCIe card maintenance
• Tool-less design
• Prevents interference with serviceability
[Figure labels: SSD module and rotational bracket; M/B front PCIe card maintenance]

Serviceability Design for Fan and SSD Replacement
Modularized and labor-saving design
• Hot-plug fan module with labor-saving handle for fast replacement
• Hot-plug, front-access SSDs with tool-less design
[Figure labels: SSD serviceability (SSD carrier); fan serviceability (fan cage, labor-saving handle)]


Design Extension to ORv2
• Retrofit to 4OU chassis to fit ORv2, supporting 12V DC busbar
• Redesign PTB for power transition to server board and PCIe switch board
• Support up to 8x SATA SSDs

Processor                          2S Intel® Xeon® Processor Scalable Family
DIMM                               1.5TB DDR4; up to 2666 MT/s; 24 DIMM slots
Storage: Drive support             8 x 2.5” hot plug SATA HDDs/SSDs
Storage: M.2 SSD Module            4 onboard M.2 modules
Accelerator: PCIe 3.0 slot         8, GPU/FPGA/Flash add-in cards
Expansion Slot: PCIe Gen3 (x16)    3, (1 or 2 reserved for GPU connection)
System Dimensions (mm)             4OU; 188 (H) x 537 (W) x 879 (D)


Power Distribution Design for ORv2
• Dual busbar clips to support up to 2.8 kW
• Power transition board (PTB) for MB, switch board, fan board

[Diagram labels: Bus Clip 1 (12V); Bus Clip 2 (12V); PTB; GPGPU Cards x8 (12V); Mt. Olympus (12V); PCIe Switch Board (12V); Fan Board]
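To put the dual busbar clip figure in perspective, the sketch below estimates the current drawn from the 12V busbar at the 2.8 kW maximum. It assumes the load is shared evenly between the two clips and ignores conversion losses; the slides state neither, and the constant names are illustrative.

```python
# Current estimate for the ORv2 12V busbar at maximum system power.
# Assumptions: even current sharing between clips, no conversion losses.

BUSBAR_VOLTAGE_V = 12.0
MAX_SYSTEM_POWER_W = 2800.0
NUM_BUS_CLIPS = 2

total_current_a = MAX_SYSTEM_POWER_W / BUSBAR_VOLTAGE_V
per_clip_current_a = total_current_a / NUM_BUS_CLIPS

print(f"Total busbar current: {total_current_a:.1f} A")
print(f"Per clip (even split assumed): {per_clip_current_a:.1f} A")
```

At 12V the system draws on the order of 230 A in total, so splitting the load across two clips roughly halves the current each clip has to carry.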

Q&A