Storage Performance Development Kit (SPDK)
Daniel Verkamp, Software Engineer
Agenda
• Threading model discussion
• SPDK Environment Layer
• SPDK Application Framework
• SPDK Blockdev Layer
• SPDK Example Apps
Motivation: Performance via Concurrency
• Modern CPUs provide many cores (Core 0, Core 1, …, Core N)
• Modern I/O devices provide many independent queues
• Goal: architect software to match the hardware
Context switching and interrupts
• OS-provided multitasking was important on single-core machines
• Modern machines have many cores
• Instead of context switching, dedicate core(s) to specific tasks
• Avoid interrupt handler overhead and latency by polling
• Instead of locks, pass messages
Threading Model Options
• One connection per thread with blocking I/O
  Pros: simple programming model
  Cons: interrupt driven; high memory overhead
  Example: Apache worker MPM
• Many connections per thread with I/O event multiplexing (select(), …)
  Pros: low memory overhead; less context switching
  Cons: interrupt driven; inefficient polling
  Example: Apache event MPM, nginx, libuv, …
• Many connections per thread with polled asynchronous I/O
  Pros: low memory overhead; no interrupts; no context switching
  Cons: more complex programming model
  Example: SPDK
What threading model does SPDK target?
Asynchronous polled I/O
Why an environment abstraction?
Flexibility for the user
Environment abstraction
• Memory allocation (pinned for DMA) and address translation
• PCI enumeration and resource mapping
• Thread startup (pinned to cores)
• Lock-free ring and memory pool data structures
Files: env.h, init.c, pci.c, pci_ioat.c, pci_nvme.c, vtophys.c
Environment abstraction
• Configurable: ./configure --with-env=...
• Interface defined in spdk/env.h
• Default implementation uses DPDK (lib/env_dpdk)
• Flexibility: decoupling and DPDK enhancements
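As a rough illustration of why a swappable environment layer helps, the sketch below expresses the environment as a table of function pointers for DMA-style allocation and address translation, plus a toy backend. It is hypothetical and is not the contents of spdk/env.h: the names `env_ops`, `toy_env`, and `lib_alloc_io_buffer` are invented for this example. SPDK selects its environment at build time with ./configure --with-env=...; a runtime table is used here only to keep everything in one file, and the toy backend's memory is neither pinned nor usable for real DMA.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical environment interface: library code calls only these hooks,
 * so the backend can change without touching the library. */
struct env_ops {
    void *(*dma_zmalloc)(size_t size, size_t align);
    void (*dma_free)(void *buf);
    uint64_t (*vtophys)(const void *buf); /* virtual -> "physical" address */
};

/* Toy backend: ordinary heap memory, NOT pinned, NOT safe for real DMA. */
static void *toy_dma_zmalloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    /* aligned_alloc requires the size to be a multiple of the alignment. */
    size_t rounded = (size + align - 1) / align * align;
    void *buf = aligned_alloc(align, rounded);
    if (buf != NULL) {
        memset(buf, 0, rounded);
    }
    return buf;
}

static void toy_dma_free(void *buf)
{
    free(buf);
}

/* Identity translation: a real backend would consult page tables or IOMMU maps. */
static uint64_t toy_vtophys(const void *buf)
{
    return (uint64_t)(uintptr_t)buf;
}

static const struct env_ops toy_env = {
    .dma_zmalloc = toy_dma_zmalloc,
    .dma_free = toy_dma_free,
    .vtophys = toy_vtophys,
};

/* A "library" routine that depends only on the interface. */
static int lib_alloc_io_buffer(const struct env_ops *env, size_t size,
                               void **buf, uint64_t *phys)
{
    *buf = env->dma_zmalloc(size, 4096);
    if (*buf == NULL) {
        return -1;
    }
    *phys = env->vtophys(*buf);
    return 0;
}
```

Swapping the backend means supplying a different `struct env_ops`; `lib_alloc_io_buffer` is unchanged.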
How do we combine SPDK components?
The SPDK application framework provides the glue.
Application Framework
• Builds on the environment abstraction
• Example of how to glue other SPDK components together
• Libraries (lib/*) vs. applications (app/*)
Files (lib/event): event.h, app.c, reactor.c, subsystem.c
App Framework Components
• Reactor
• Poller
• Event
• I/O Channel
Reactor
• Event loop (essentially a scheduler)
• Pinned to a specific CPU core
• Polls I/O devices
• Polls event ring
• Communication via event passing
[Figure: Reactor 0 … Reactor N, each pinned to Core 0 … Core N, each with its own event ring and pollers attached to I/O devices]
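A minimal sketch of the reactor idea, assuming a single thread and fixed-size tables: one call to `reactor_run_once` drains the events queued so far and then gives each registered poller a turn, without ever blocking. All names (`struct reactor`, `reactor_register_poller`, `reactor_send_event`, and the example callbacks) are hypothetical. CPU pinning is only recorded as a field, and the event queue is a plain single-threaded ring rather than the lock-free multi-producer ring a real reactor needs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POLLERS 8
#define EVENT_RING_SIZE 16

/* A poller returns how many work items it handled (0 = found nothing). */
typedef int (*poller_fn)(void *ctx);
/* An event is a function pointer plus its arguments. */
typedef void (*event_fn)(void *arg1, void *arg2);

struct event {
    event_fn fn;
    void *arg1;
    void *arg2;
};

struct reactor {
    unsigned core;                      /* core this reactor would be pinned to */
    poller_fn pollers[MAX_POLLERS];
    void *poller_ctx[MAX_POLLERS];
    size_t num_pollers;
    struct event ring[EVENT_RING_SIZE]; /* single-threaded stand-in for an MPSC ring */
    size_t head, tail;                  /* free-running indices */
};

static int reactor_register_poller(struct reactor *r, poller_fn fn, void *ctx)
{
    assert(fn != NULL);
    if (r->num_pollers == MAX_POLLERS) {
        return -1;
    }
    r->pollers[r->num_pollers] = fn;
    r->poller_ctx[r->num_pollers] = ctx;
    r->num_pollers++;
    return 0;
}

static int reactor_send_event(struct reactor *r, event_fn fn, void *arg1, void *arg2)
{
    if (r->tail - r->head == EVENT_RING_SIZE) {
        return -1; /* ring full */
    }
    r->ring[r->tail % EVENT_RING_SIZE] = (struct event){ fn, arg1, arg2 };
    r->tail++;
    return 0;
}

/* One pass of the event loop. Nothing here blocks or sleeps. */
static int reactor_run_once(struct reactor *r)
{
    int work = 0;
    size_t end = r->tail; /* events queued during this pass wait for the next one */

    while (r->head != end) {
        struct event ev = r->ring[r->head % EVENT_RING_SIZE];
        r->head++;
        ev.fn(ev.arg1, ev.arg2); /* runs to completion */
        work++;
    }
    for (size_t i = 0; i < r->num_pollers; i++) {
        work += r->pollers[i](r->poller_ctx[i]);
    }
    return work;
}

/* Example callbacks used to exercise the loop. */
static int counting_poller(void *ctx)
{
    (*(int *)ctx)++;
    return 1;
}

static void add_event(void *arg1, void *arg2)
{
    *(int *)arg1 += *(int *)arg2;
}
```

A real reactor would call something like `reactor_run_once` in an endless loop on its pinned core. Snapshotting `tail` at the start keeps each pass bounded even if an event handler queues more events.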
Poller
• Essentially a “task” running on a reactor
• Primarily checks hardware for async events
• Can run periodically on a timer
• Example: poll completion queue
• Callback runs to completion on reactor thread
• Completion handler may send an event
[Figure: the poller submits I/O to the I/O device's submission queue (SQ) and polls its completion queue (CQ); each completion invokes an I/O completion callback]
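The completion-queue example on the slide can be sketched as follows. The sketch assumes an NVMe-style completion queue in which the device flips a phase bit each time it wraps around, so the poller detects new entries by reading memory instead of waiting for an interrupt. The submission queue and doorbell writes are not modeled, `sim_device_complete` stands in for the hardware, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 8

typedef void (*io_cb)(void *cb_arg, int status);

/* Per-command bookkeeping, indexed by command id (cid). */
struct io_request {
    io_cb cb;
    void *cb_arg;
    bool in_use;
};

/* Completion entry written by the device. */
struct cq_entry {
    uint16_t cid;
    int16_t status;
    uint8_t phase; /* flipped by the device on every wrap of the queue */
};

struct queue_pair {
    struct io_request reqs[QUEUE_DEPTH];
    struct cq_entry cq[QUEUE_DEPTH];
    size_t cq_head;         /* next entry the poller will inspect */
    uint8_t expected_phase; /* phase value that marks a new entry */
    size_t dev_cq_tail;     /* simulated device state */
    uint8_t dev_phase;
};

static void qpair_init(struct queue_pair *qp)
{
    /* Entries start with phase 0, so the first pass looks for phase 1. */
    *qp = (struct queue_pair){ .expected_phase = 1, .dev_phase = 1 };
}

/* Record the callback; a real driver would also write an SQ entry and ring a doorbell. */
static int qpair_submit(struct queue_pair *qp, io_cb cb, void *cb_arg)
{
    for (int cid = 0; cid < QUEUE_DEPTH; cid++) {
        if (!qp->reqs[cid].in_use) {
            qp->reqs[cid] = (struct io_request){ cb, cb_arg, true };
            return cid;
        }
    }
    return -1; /* queue full */
}

/* Stand-in for the hardware posting a completion. */
static void sim_device_complete(struct queue_pair *qp, uint16_t cid, int16_t status)
{
    qp->cq[qp->dev_cq_tail] = (struct cq_entry){ cid, status, qp->dev_phase };
    if (++qp->dev_cq_tail == QUEUE_DEPTH) {
        qp->dev_cq_tail = 0;
        qp->dev_phase ^= 1;
    }
}

/* The poller: consume every new completion and run its callback to completion. */
static int qpair_poll(void *ctx)
{
    struct queue_pair *qp = ctx;
    int done = 0;

    for (;;) {
        struct cq_entry *e = &qp->cq[qp->cq_head];
        if (e->phase != qp->expected_phase) {
            break; /* no new completion: return instead of waiting */
        }
        assert(e->cid < QUEUE_DEPTH);
        struct io_request *req = &qp->reqs[e->cid];
        io_cb cb = req->cb;
        void *cb_arg = req->cb_arg;
        int status = e->status;

        req->in_use = false; /* the callback may reuse this slot */
        if (++qp->cq_head == QUEUE_DEPTH) {
            qp->cq_head = 0;
            qp->expected_phase ^= 1;
        }
        cb(cb_arg, status);
        done++;
    }
    return done;
}

/* Example completion callback. */
static void record_status(void *cb_arg, int status)
{
    *(int *)cb_arg = status;
}
```

Because `qpair_poll` takes a `void *` context and returns a work count, it has the shape of a poller that a reactor loop could call on every pass.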
Event
• Cross-thread communication
• Function pointer + arguments
• One-shot message passed between reactors
• Multi-producer/single-consumer ring
• Runs to completion on reactor thread
[Figure: a poller on Reactor A allocates and calls an event; Reactor B executes and frees it]
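Events are described above as a function pointer plus arguments passed through a multi-producer/single-consumer ring. One way to build such a ring, shown here as a sketch, is a bounded queue in the style of Dmitry Vyukov's bounded MPMC queue, specialized for a single consumer with C11 atomics. This is not SPDK's own ring implementation, and `event_call`, `event_queue_run`, and the other names are hypothetical. Each event is heap-allocated by the sender and freed by the receiving reactor after it runs, mirroring the "allocate and call" and "execute and free" steps in the figure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RING_SIZE 64 /* must be a power of two */

typedef void (*event_fn)(void *arg1, void *arg2);

/* One-shot message: a function pointer plus its arguments. */
struct event { event_fn fn; void *arg1; void *arg2; };

struct cell { atomic_size_t seq; struct event *ev; };

/* Bounded MPSC ring: producers claim slots with CAS; the single consumer needs none. */
struct event_ring {
    struct cell cells[RING_SIZE];
    atomic_size_t enqueue_pos;
    size_t dequeue_pos; /* touched only by the owning reactor */
};

static void ring_init(struct event_ring *r)
{
    for (size_t i = 0; i < RING_SIZE; i++) {
        atomic_init(&r->cells[i].seq, i);
        r->cells[i].ev = NULL;
    }
    atomic_init(&r->enqueue_pos, 0);
    r->dequeue_pos = 0;
}

/* Safe to call from any thread. Returns false if the ring is full. */
static bool ring_enqueue(struct event_ring *r, struct event *ev)
{
    size_t pos = atomic_load_explicit(&r->enqueue_pos, memory_order_relaxed);
    struct cell *c;
    for (;;) {
        c = &r->cells[pos & (RING_SIZE - 1)];
        size_t seq = atomic_load_explicit(&c->seq, memory_order_acquire);
        intptr_t diff = (intptr_t)seq - (intptr_t)pos;
        if (diff == 0) {
            /* Slot is free for this position; on CAS failure pos is reloaded. */
            if (atomic_compare_exchange_weak_explicit(&r->enqueue_pos, &pos, pos + 1,
                                                      memory_order_relaxed,
                                                      memory_order_relaxed)) {
                break;
            }
        } else if (diff < 0) {
            return false; /* consumer has not freed this slot yet: ring full */
        } else {
            pos = atomic_load_explicit(&r->enqueue_pos, memory_order_relaxed);
        }
    }
    c->ev = ev;
    atomic_store_explicit(&c->seq, pos + 1, memory_order_release); /* publish */
    return true;
}

/* Called only by the owning reactor. Returns NULL if nothing is published. */
static struct event *ring_dequeue(struct event_ring *r)
{
    struct cell *c = &r->cells[r->dequeue_pos & (RING_SIZE - 1)];
    size_t seq = atomic_load_explicit(&c->seq, memory_order_acquire);
    if (seq != r->dequeue_pos + 1) {
        return NULL;
    }
    struct event *ev = c->ev;
    /* Hand the slot back to producers for the next lap. */
    atomic_store_explicit(&c->seq, r->dequeue_pos + RING_SIZE, memory_order_release);
    r->dequeue_pos++;
    return ev;
}

/* "Allocate and call": build an event and post it to another reactor's ring. */
static bool event_call(struct event_ring *target, event_fn fn, void *arg1, void *arg2)
{
    struct event *ev = malloc(sizeof(*ev));
    if (ev == NULL) {
        return false;
    }
    *ev = (struct event){ fn, arg1, arg2 };
    if (!ring_enqueue(target, ev)) {
        free(ev);
        return false;
    }
    return true;
}

/* "Execute and free": run up to max events on the consuming reactor. */
static size_t event_queue_run(struct event_ring *r, size_t max)
{
    size_t n = 0;
    struct event *ev;
    while (n < max && (ev = ring_dequeue(r)) != NULL) {
        ev->fn(ev->arg1, ev->arg2); /* runs to completion on this thread */
        free(ev);
        n++;
    }
    return n;
}

/* Example event handler. */
static void add_to_counter(void *counter, void *amount)
{
    *(long *)counter += *(long *)amount;
}
```

Only the enqueue side uses compare-and-swap; because exactly one reactor drains the ring, the dequeue index can be a plain variable.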
I/O Channel
• Abstracts hardware I/O queues
• Register I/O devices
• Create I/O channel per thread/device combination
• Provides hooks for driver resource allocation
• I/O channel creation drives poller creation
• Pervasive in SPDK
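A minimal sketch of the I/O channel pattern, assuming small fixed limits and integer thread ids: a driver registers a device with create and destroy hooks, and the first request for a given (thread, device) pair runs the create hook to allocate that thread's private resources, such as a hardware queue pair. Later requests on the same thread reuse the channel, so the I/O path itself needs no locks. The function names are hypothetical and are not SPDK's actual I/O channel API; registration in this sketch is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS 4
#define MAX_DEVICES 4

/* Hooks a driver supplies when registering an I/O device. The create hook is
 * where per-thread resources (e.g. a queue pair) would be allocated and a
 * poller for them registered. */
struct io_device {
    int (*create_cb)(void *dev_ctx, void *chan_ctx);
    void (*destroy_cb)(void *dev_ctx, void *chan_ctx);
    size_t ctx_size; /* size of the per-channel driver context */
    void *dev_ctx;
};

struct io_channel {
    struct io_device *dev;
    unsigned refcnt;
    void *ctx; /* driver-owned per-channel context */
};

/* One slot per (thread, device) combination; each row belongs to one thread. */
static struct io_channel channels[MAX_THREADS][MAX_DEVICES];
static struct io_device *devices[MAX_DEVICES];
static size_t num_devices;

static int io_device_register(struct io_device *dev)
{
    if (num_devices == MAX_DEVICES) {
        return -1;
    }
    devices[num_devices] = dev;
    return (int)num_devices++;
}

/* Return this thread's channel for the device, creating it on first use. */
static struct io_channel *get_io_channel(unsigned thread_id, int dev_id)
{
    assert(thread_id < MAX_THREADS && dev_id >= 0 && (size_t)dev_id < num_devices);
    struct io_channel *ch = &channels[thread_id][dev_id];
    if (ch->refcnt == 0) {
        struct io_device *dev = devices[dev_id];
        ch->ctx = calloc(1, dev->ctx_size);
        if (ch->ctx == NULL) {
            return NULL;
        }
        if (dev->create_cb(dev->dev_ctx, ch->ctx) != 0) {
            free(ch->ctx);
            ch->ctx = NULL;
            return NULL;
        }
        ch->dev = dev;
    }
    ch->refcnt++;
    return ch;
}

/* Drop a reference; the last release runs the driver's destroy hook. */
static void put_io_channel(struct io_channel *ch)
{
    assert(ch->refcnt > 0);
    if (--ch->refcnt == 0) {
        ch->dev->destroy_cb(ch->dev->dev_ctx, ch->ctx);
        free(ch->ctx);
        ch->ctx = NULL;
    }
}

/* Example driver: hands out one numbered "queue" per channel. */
struct fake_nvme { int queues_allocated; };
struct fake_qpair { int qid; };

static int fake_create(void *dev_ctx, void *chan_ctx)
{
    struct fake_nvme *nvme = dev_ctx;
    ((struct fake_qpair *)chan_ctx)->qid = ++nvme->queues_allocated;
    return 0;
}

static void fake_destroy(void *dev_ctx, void *chan_ctx)
{
    (void)chan_ctx;
    ((struct fake_nvme *)dev_ctx)->queues_allocated--;
}
```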
Block Device Layer
• Block device driver abstraction
• Async read, write, flush, deallocate
• SGL support (readv/writev)
• I/O channel integration
• Layering (virtual blockdevs)
Files (lib/bdev): bdev.h, bdev.c, vbdev_split.c, blockdev_aio.c, blockdev_nvme.c, blockdev_malloc.c, blockdev_rbd.c
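To make the driver abstraction concrete, here is a hypothetical sketch of a block device as a function table with an asynchronous submit call and a completion callback, together with a toy RAM-disk backend loosely inspired by the Malloc driver. Scatter-gather lists use POSIX `struct iovec`, as readv/writev do. Names and signatures are illustrative only and do not match SPDK's bdev API. This toy driver invokes the callback before returning, whereas an NVMe driver would complete later from a poller, and its deallocate (UNMAP) handling simply zeroes the range.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

enum bdev_io_type { BDEV_IO_READ, BDEV_IO_WRITE, BDEV_IO_FLUSH, BDEV_IO_UNMAP };

typedef void (*bdev_io_cb)(void *cb_arg, int status);

struct bdev;

/* Driver function table: each backend (NVMe, libaio, RAM disk, ...) supplies one. */
struct bdev_fn_table {
    void (*submit)(struct bdev *bdev, enum bdev_io_type type,
                   struct iovec *iov, int iovcnt, uint64_t offset, uint64_t len,
                   bdev_io_cb cb, void *cb_arg);
};

struct bdev {
    const char *name;
    uint32_t block_size;
    uint64_t num_blocks;
    const struct bdev_fn_table *fn;
    void *ctx; /* driver-private state */
};

/* Generic layer: validate the byte range, then hand the request to the driver.
 * The return value only says whether it was accepted; the result arrives via cb. */
static int bdev_submit(struct bdev *b, enum bdev_io_type type,
                       struct iovec *iov, int iovcnt, uint64_t offset, uint64_t len,
                       bdev_io_cb cb, void *cb_arg)
{
    uint64_t size = b->num_blocks * b->block_size;
    if (offset % b->block_size != 0 || len % b->block_size != 0 ||
        offset > size || len > size - offset) {
        return -1;
    }
    b->fn->submit(b, type, iov, iovcnt, offset, len, cb, cb_arg);
    return 0;
}

/* Toy RAM-disk driver: copies between the SGL and memory, then completes inline. */
static void ram_submit(struct bdev *b, enum bdev_io_type type,
                       struct iovec *iov, int iovcnt, uint64_t offset, uint64_t len,
                       bdev_io_cb cb, void *cb_arg)
{
    uint8_t *disk = (uint8_t *)b->ctx + offset;
    uint64_t done = 0;
    for (int i = 0; i < iovcnt && done < len; i++) {
        size_t n = iov[i].iov_len;
        if (n > len - done) {
            n = (size_t)(len - done);
        }
        if (type == BDEV_IO_READ) {
            memcpy(iov[i].iov_base, disk + done, n);
        } else if (type == BDEV_IO_WRITE) {
            memcpy(disk + done, iov[i].iov_base, n);
        }
        done += n;
    }
    if (type == BDEV_IO_UNMAP) {
        memset(disk, 0, len);
    }
    /* Reads and writes fail if the SGL was shorter than len; FLUSH is a no-op. */
    int is_data = (type == BDEV_IO_READ || type == BDEV_IO_WRITE);
    cb(cb_arg, (is_data && done != len) ? -1 : 0);
}

static const struct bdev_fn_table ram_fn = { ram_submit };

static int ram_bdev_create(struct bdev *b, const char *name,
                           uint64_t num_blocks, uint32_t block_size)
{
    void *mem = calloc(num_blocks, block_size);
    if (mem == NULL) {
        return -1;
    }
    *b = (struct bdev){ name, block_size, num_blocks, &ram_fn, mem };
    return 0;
}

/* Example completion callback. */
static void store_status(void *cb_arg, int status)
{
    *(int *)cb_arg = status;
}
```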
Bdev Drivers
• NVMe* (local, remote)
• Malloc (RAM disk)
• Linux libaio
• Ceph RBD
• Potential future work: pmem (NVML)
*Other names and brands may be claimed as the property of others.
Bdev Layering
Virtual blockdev drivers:
• Claim base bdev(s)
• Produce virtual bdev(s)
• Provide storage services
• Example: vbdev_split
• Coming soon: Blob bdev
[Figure: a virtual bdev consumes a base bdev through the Bdev API below it and exposes itself through the Bdev API above it]
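The core of a split-style virtual bdev is offset translation: each virtual bdev covers a fixed range of the claimed base bdev and forwards I/O after shifting the offset. The sketch below shows only that arithmetic, in blocks; claiming the base bdev and registering the virtual ones are omitted. The equal-size partitioning and all names are assumptions for illustration, not how vbdev_split is actually configured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one split partition over a base bdev. */
struct split_part {
    uint64_t base_offset_blocks; /* where this part starts on the base bdev */
    uint64_t num_blocks;         /* size of this part */
};

/* Divide a base bdev of base_blocks into `count` equal parts; any remainder
 * at the end of the base bdev is left unused. Returns the number created. */
static unsigned split_base(uint64_t base_blocks, unsigned count, struct split_part *parts)
{
    assert(parts != 0);
    if (count == 0 || base_blocks < count) {
        return 0;
    }
    uint64_t per = base_blocks / count;
    for (unsigned i = 0; i < count; i++) {
        parts[i] = (struct split_part){ i * per, per };
    }
    return count;
}

/* Translate an I/O on a virtual part into an I/O on the base bdev.
 * Returns 0 and the base offset, or -1 if the request leaves the part. */
static int split_remap(const struct split_part *p, uint64_t offset_blocks,
                       uint64_t num_blocks, uint64_t *base_offset)
{
    if (offset_blocks > p->num_blocks || num_blocks > p->num_blocks - offset_blocks) {
        return -1;
    }
    *base_offset = p->base_offset_blocks + offset_blocks;
    return 0;
}
```

The bounds check is written as a subtraction rather than `offset_blocks + num_blocks > p->num_blocks` so that it cannot overflow.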
NVMe over Fabrics Target Example
• Acceptor network poller handles connect events
• Connection event registers new poller
• I/O request arrives over network
• I/O submitted to storage
• Storage device poller checks completions
• Response sent
All asynchronous work is driven by pollers.
[Figure: Reactor 0 runs the acceptor poller and sends a connect event to Reactor 1, which runs the connection poller (network) and the storage poller (storage)]
vhost-scsi Example
• VM guest adds task to shared-memory queue
• Task retrieved from queue and passed to SCSI
• I/O submitted to storage
• Storage poller completes I/O
• SCSI layer signals completion by sending an event
• Event completes I/O back to VM
[Figure: a VM guest shares a virtio queue with Reactor 0, which runs a queue poller, the SCSI layer, and a storage poller in front of the storage device]
• Software design follows from hardware capabilities
• Building blocks to manage asynchronous I/O
• Swappable environment abstraction
Notices and Disclaimers Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.
No computer system can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance. Intel, the Intel logo, Xeon, and others are trademarks of Intel Corporation in the U.S. and/or other countries. © 2017 Intel Corporation.