HPC & GPU/FPGA Technology
Leverage OCP Design Advantages on EIA 19” Accelerator Server
Gregary Liu, Product Director, Wiwynn Corporation
Agenda
• Brief System Overview
• High CFM/watt Thermal Efficiency
• Flexible & Easy Design for Different Applications
• Design for Serviceability
• Design Extension to ORv2
• Power Distribution Design for ORv2
HPC
Specifications
Brief System Overview – I
• System Design Advantages
  • High CFM/watt thermal design for large-scale simulation models and DL training at all workloads
  • Different applications can be addressed by selecting different PCIe topologies and PCIe cards
• OCP Related Design Highlights
  • Front IO access
  • Tool-less ME design for labor saving
  • Integrated, field-proven Mt. Olympus M/B for high quality assurance
Brief System Overview – II
• EIA 19” Design Highlights
  • Standard 4RU high-power server design
  • 8 double-width PCIe Gen3 x16 slots that adapt to various accelerators for different workloads
  • Dual-zone thermal/cooling design: cold air runs through the PCIe cards directly
  • CRPS PSU with 2+2 power redundancy
  • Scalable design, easily migrated to ORv2
So, how do we achieve them?
Accelerator Server Basics
[Diagram labels: PCIe card (x1), SW IC (x4, two instances), Switch Board, Power Board, Fan Board, 2+2 Redundant PSU, 3+1 Redundant Fan]
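The N+M redundancy labels in this diagram (2+2 PSU, 3+1 fan) can be sketched as a small calculation. This is a minimal illustration, not the vendor's sizing method: the `RedundantGroup` class is hypothetical, the 2.8 kW load is the maximum workload quoted elsewhere in this deck, and even load sharing between PSUs is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundantGroup:
    """An N+M group: N units are required, M more may fail (hypothetical helper)."""
    required: int  # N: units needed to carry the full load
    spare: int     # M: units that can fail without losing service

    @property
    def installed(self) -> int:
        return self.required + self.spare

    def tolerates(self, failures: int) -> bool:
        """True if the group still has at least N working units."""
        return 0 <= failures <= self.spare

    def load_per_unit(self, total_load: float, failures: int = 0) -> float:
        """Load on each surviving unit, ASSUMING even sharing."""
        survivors = self.installed - failures
        if survivors <= 0:
            raise ValueError("no surviving units to carry the load")
        return total_load / survivors


psu = RedundantGroup(required=2, spare=2)   # 2+2 redundant PSU
fans = RedundantGroup(required=3, spare=1)  # 3+1 redundant fan

SYSTEM_LOAD_W = 2800.0  # "up to 2.8KW workload" from the thermal slide

print("PSU load, all healthy (W):", psu.load_per_unit(SYSTEM_LOAD_W))
print("PSU load, two failed (W): ", psu.load_per_unit(SYSTEM_LOAD_W, failures=2))
print("Fans tolerate 1 failure:  ", fans.tolerates(1))
print("Fans tolerate 2 failures: ", fans.tolerates(2))
```

Under these assumptions each PSU in a 2+2 group would need to carry half of the full load on its own in the worst tolerated case, which is the figure that drives PSU rating.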
High CFM/watt Thermal Efficiency – I
Two isolated cooling zones enable cold air to run through the PCIe cards directly
• PCIe cards cooling zone
• Server board cooling zone
• Thermal efficiency
  • 0.135 CFM/watt at 30°C
  • 0.117 CFM/watt at 25°C
  • Exceeds DC requirement
[Figure: side view and top view of the airflow paths, labeled Cold air, GPGPU, Fan, PSU, Hot air]
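To put the CFM/watt figures in absolute terms, the sketch below multiplies them by the 2.8 kW maximum workload quoted elsewhere in this deck. Applying the quoted ratios at full load is an assumption (the slide does not state the load at which they were measured), and the function name is illustrative.

```python
# Maximum workload quoted in the deck ("up to 2.8KW workload").
MAX_LOAD_W = 2800.0

# Thermal efficiency figures from the slide, keyed by inlet temperature (°C).
CFM_PER_WATT = {30: 0.135, 25: 0.117}

# Unit conversion: 1 cubic foot per minute is about 1.699 cubic metres per hour.
CFM_TO_M3_PER_H = 1.699


def required_airflow_cfm(load_w: float, cfm_per_watt: float) -> float:
    """Airflow implied by a CFM/watt ratio at a given electrical load."""
    return load_w * cfm_per_watt


for inlet_c, ratio in CFM_PER_WATT.items():
    cfm = required_airflow_cfm(MAX_LOAD_W, ratio)
    print(f"{inlet_c}°C inlet: {cfm:.1f} CFM ({cfm * CFM_TO_M3_PER_H:.0f} m³/h)")
```

Under that assumption the chassis would move roughly 378 CFM at a 30°C inlet and roughly 328 CFM at a 25°C inlet.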
High CFM/watt Thermal Efficiency – II
3+1 redundant system fan design for up to 2.8 kW workload at 35°C
[Figure: temperature readings with the location of the failed fan marked]
• System inlet: 35°C; system outlet: 48°C
• CPU: 84°C and 82°C, 134.7 W each
• DIMM: 56°C
• SSD: 40.1°C
• PSU inlet: 41.8°C
• GPU inlet: 35.3°C; GPU outlet: 52.5°C
• GPU (temperature / power): 74°C / 247 W, 73°C / 248 W, 76°C / 248 W, 76°C / 247 W, 77°C / 246 W, 77°C / 248 W, 73°C / 247 W, 73°C / 248 W
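The readings on this slide can be cross-checked with a simple energy balance: the airflow needed per watt follows from the air temperature rise across the chassis. The sketch below is a rough estimate, not the presenter's method. The air density and specific heat are assumed textbook values, all dissipated power is assumed to leave as heated air, and the function name is illustrative.

```python
# Assumed air properties (approximate values near room temperature, sea level).
AIR_DENSITY_KG_M3 = 1.2
AIR_CP_J_PER_KG_K = 1005.0
M3_PER_S_TO_CFM = 2118.88  # unit conversion


def cfm_per_watt(delta_t_c: float) -> float:
    """Airflow per watt needed to hold a given air temperature rise."""
    m3_per_s_per_watt = 1.0 / (AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * delta_t_c)
    return m3_per_s_per_watt * M3_PER_S_TO_CFM


# Power readings listed on the slide.
gpu_power_w = [247, 248, 248, 247, 246, 248, 247, 248]
cpu_power_w = [134.7, 134.7]
listed_power_w = sum(gpu_power_w) + sum(cpu_power_w)

# Inlet and outlet temperatures listed on the slide.
delta_t_c = 48.0 - 35.0

print(f"Listed GPU + CPU power: {listed_power_w:.1f} W")
print(f"Air temperature rise:   {delta_t_c:.1f} °C")
print(f"Implied airflow:        {cfm_per_watt(delta_t_c):.3f} CFM/W")
```

With a 13°C rise this estimate comes out at about 0.135 CFM/W, the same order as the efficiency figures quoted on the previous slide. The listed GPU and CPU readings sum to about 2.25 kW; DIMMs, SSDs, fans and conversion losses are not itemised on the slide.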
Flexible & Easy Design for Different Applications – I
CPU-PCIe Cards Topology 1 – Balance Mode, CPU:GPU = 1:4
• Higher bandwidth between CPU and GPU
[Diagram labels: Project Olympus server board (2P Intel Xeon-SP, CPU 0 and CPU 1 linked by UPI), 1* PCIe x16, PCIe x16 cable, PCIe Gen3 Switch Board with PCIe3 Switch 0 (GPU0-3) and PCIe3 Switch 1 (GPU4-7)]
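The bandwidth claim for Balance Mode can be made concrete with the raw PCIe Gen3 link rate. The sketch below assumes that each group of four cards shares one x16 uplink to its CPU, as the 1:4 ratio and the "1* PCIe x16" label suggest. It reports raw link rate per direction before protocol overhead, and the function names are illustrative.

```python
# PCIe Gen3 signalling: 8 GT/s per lane with 128b/130b encoding.
GEN3_GT_PER_S = 8.0
GEN3_ENCODING_EFFICIENCY = 128.0 / 130.0


def link_bandwidth_gb_per_s(lanes: int) -> float:
    """Raw one-direction bandwidth of a Gen3 link in GB/s (before protocol overhead)."""
    return GEN3_GT_PER_S * GEN3_ENCODING_EFFICIENCY * lanes / 8.0


def per_card_share_gb_per_s(cards_per_uplink: int, uplink_lanes: int = 16) -> float:
    """Uplink share per card when every card behind the uplink transfers at once."""
    return link_bandwidth_gb_per_s(uplink_lanes) / cards_per_uplink


x16 = link_bandwidth_gb_per_s(16)
balance_share = per_card_share_gb_per_s(cards_per_uplink=4)  # ASSUMED: one x16 uplink per 4 cards

print(f"Gen3 x16 link:                     {x16:.2f} GB/s per direction")
print(f"Per-card share, 4 cards per uplink: {balance_share:.2f} GB/s")
```

The share only applies when all four cards talk to the host at the same time; a single active card can still use the full uplink.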
Flexible & Easy Design for Different Applications – II
CPU-PCIe Cards Topology 2 – Cascade Mode, CPU:GPU = 1:8
• Peer-to-peer performance can be extended to 8 PCIe cards
[Diagram labels: Project Olympus server board (2P Intel Xeon-SP, CPU 0 and CPU 1 linked by UPI), 2* PCIe x16, PCIe x16 cable, PCIe Gen3 Switch Board with PCIe3 Switch 0 (GPU0-3) and PCIe3 Switch 1 (GPU4-7)]
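Why Cascade Mode helps peer-to-peer traffic can be illustrated by modelling both topologies as graphs and checking whether a GPU-to-GPU path has to pass through a CPU. The wiring below is an assumption, since the slides do not spell out the cabling: Balance Mode is modelled as CPU0 to Switch 0 and CPU1 to Switch 1 with UPI between the CPUs, and Cascade Mode as CPU0 to Switch 0 with Switch 0 linked directly to Switch 1. All node names are illustrative.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def gpu_edges():
    """GPU0-3 hang off SW0 and GPU4-7 off SW1, as labelled on the slides."""
    return [(f"GPU{i}", "SW0" if i < 4 else "SW1") for i in range(8)]


# ASSUMED wiring for each mode (not confirmed by the slides).
BALANCE = build_graph(gpu_edges() + [("CPU0", "SW0"), ("CPU1", "SW1"), ("CPU0", "CPU1")])
CASCADE = build_graph(gpu_edges() + [("CPU0", "SW0"), ("SW0", "SW1")])


def shortest_path(graph, src, dst):
    """Breadth-first search; returns the node list from src to dst, or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in sorted(graph[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def crosses_cpu(graph, gpu_a, gpu_b):
    """True if the shortest path between two GPUs passes through a CPU."""
    path = shortest_path(graph, gpu_a, gpu_b)
    if path is None:
        raise ValueError(f"{gpu_a} and {gpu_b} are not connected")
    return any(node.startswith("CPU") for node in path)


for name, graph in (("Balance", BALANCE), ("Cascade", CASCADE)):
    path = shortest_path(graph, "GPU0", "GPU7")
    print(f"{name}: {' -> '.join(path)} (crosses CPU: {crosses_cpu(graph, 'GPU0', 'GPU7')})")
```

In this model, traffic between the two GPU groups stays on the switch board in Cascade Mode, while in Balance Mode it has to cross both CPUs and the UPI link.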
Tool-less Design for PCIe Cards Maintenance – I
Modular and tool-less design for DW PCIe maintenance
• Modular SW tray for easy DW PCIe card swap
• Quarter-turn fastener for PCIe card replacement
[Figure: GPU tray for serviceability; quarter-turn fasten / release]
Tool-less Design for PCIe Cards Maintenance – II
Rotatable SSD bracket for PCIe card maintenance
• Tool-less design
• Prevents interference with serviceability of the M/B
[Figure: SSD module and rotational bracket; front PCIe card maintenance]
Serviceability Design for Fan and SSD Replacement
Modularized and labor-saving design
• Hot-plug fan module with labor-saving handle for fast replacement
• Hot-plug, front-access SSDs with tool-less design
[Figure: fan serviceability (fan cage, labor-saving handle) and SSD serviceability (SSD carrier)]
Design Extension to ORv2
• Retrofit to a 4OU chassis to fit ORv2, supporting the 12V DC busbar
• Redesign the PTB for power transition to the server board and PCIe switch board
• Support up to 8x SATA SSDs
Processor: 2S Intel® Xeon® Processor Scalable Family
DIMM: 1.5TB DDR4; up to 2666 MT/s; 24 DIMM slots
Storage
  Drive support: 8 x 2.5” hot plug SATA HDDs/SSDs
  M.2 SSD Module: 4 onboard M.2 modules
Accelerator
  PCIe 3.0 slot: 8, GPU/FPGA/Flash add-in cards
Expansion Slot
  PCIe Gen3 (x16): 3 (1 or 2 reserved for GPU connection)
System Dimensions (mm): 4OU; 188 (H) x 537 (W) x 879 (D)
Power Distribution Design for ORv2
• Dual busbar clips to support up to 2.8 kW
• Power transition board (PTB) for MB, switch board, fan board
[Diagram labels: Bus Clip 1 (12V), Bus Clip 2 (12V), PTB, GPGPU Cards x8 (12V), Mt. Olympus (12V), PCIe Switch Board (12V), Fan Board]
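The dual busbar clip design can be motivated with a quick current estimate. The sketch below assumes a nominal 12 V rail, an even split of current between the two clips, and no distribution losses. None of these are stated on the slide, and the function names are illustrative.

```python
BUS_VOLTAGE_V = 12.0  # nominal busbar voltage from the slide labels
MAX_LOAD_W = 2800.0   # "up to 2.8KW" from the slide
BUS_CLIPS = 2         # dual busbar clips


def total_current_a(load_w: float, voltage_v: float) -> float:
    """Total DC current drawn at a given load: I = P / V."""
    return load_w / voltage_v


def current_per_clip_a(load_w: float, voltage_v: float, clips: int) -> float:
    """Current per clip, ASSUMING the clips share the current evenly."""
    return total_current_a(load_w, voltage_v) / clips


total = total_current_a(MAX_LOAD_W, BUS_VOLTAGE_V)
per_clip = current_per_clip_a(MAX_LOAD_W, BUS_VOLTAGE_V, BUS_CLIPS)

print(f"Total current at full load: {total:.1f} A")
print(f"Current per clip:           {per_clip:.1f} A")
```

At full load this comes to roughly 233 A in total, or about 117 A per clip under the even-split assumption, which shows why a single clip would be hard pressed to carry the whole chassis.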
Q&A