
Storage Performance Development Kit (SPDK)
Daniel Verkamp, Software Engineer

Agenda
• Threading model discussion
• SPDK Environment Layer
• SPDK Application Framework
• SPDK Blockdev Layer
• SPDK Example Apps

Motivation: Performance via Concurrency
• Modern CPUs provide many cores (Core 0, Core 1, …, Core N)
• Modern I/O devices provide many independent queues
• Goal: Architect software to match the hardware

Context switching and interrupts
• OS-provided multitasking was important on single-core machines
• Modern machines have many cores
• Instead of context switching, dedicate core(s) to specific tasks
• Avoid interrupt handler overhead and latency by polling
• Instead of locks, pass messages

Threading Model Options

One connection per thread with blocking I/O
• Pros: Simple programming model
• Cons: Interrupt driven; high memory overhead
• Example: Apache worker MPM

Many connections per thread with I/O event multiplexing (select(), …)
• Pros: Low memory overhead; less context switching
• Cons: Interrupt driven; inefficient polling
• Example: Apache event MPM, nginx, libuv, …

Many connections per thread with polled asynchronous I/O
• Pros: Low memory overhead; no interrupts; no context switching
• Cons: More complex programming model
• Example: SPDK

What threading model does SPDK target?
Asynchronous polled I/O

Why an environment abstraction?
Flexibility for user

Environment abstraction
• Memory allocation (pinned for DMA) and address translation
• PCI enumeration and resource mapping
• Thread startup (pinned to cores)
• Lock-free ring and memory pool data structures
Files: env.h, init.c, pci.c, pci_ioat.c, pci_nvme.c, vtophys.c

Environment abstraction
• Configurable: ./configure --with-env=...
• Interface defined in spdk/env.h
• Default implementation uses DPDK (lib/env_dpdk)
• Flexibility: Decoupling and DPDK enhancements
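As a rough illustration of what a swappable environment layer means, the sketch below routes memory allocation and address translation through a table of function pointers, so code above the table never names a concrete backend. Everything here is hypothetical: `toy_env_ops`, `libc_env`, and `counting_env` are invented for this example and are not the contents of spdk/env.h, and the libc backend cannot pin memory and simply treats the virtual address as the "physical" one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical environment interface (illustrative names, not spdk/env.h). */
struct toy_env_ops {
    const char *name;
    void *(*dma_zalloc)(size_t size);        /* zeroed buffer intended for DMA */
    void (*dma_free)(void *buf);
    uint64_t (*vtophys)(const void *buf);    /* virtual to "physical" address */
};

/* Backend 1: plain libc. It cannot pin memory and fakes address translation. */
void *libc_dma_zalloc(size_t size) { return calloc(1, size); }
void libc_dma_free(void *buf) { free(buf); }
uint64_t libc_vtophys(const void *buf) { return (uint64_t)(uintptr_t)buf; }

const struct toy_env_ops libc_env = {
    "libc", libc_dma_zalloc, libc_dma_free, libc_vtophys
};

/* Backend 2: same behavior, but counts allocations (e.g. for testing). */
int counting_allocs = 0;
void *counting_dma_zalloc(size_t size) { counting_allocs++; return calloc(1, size); }

const struct toy_env_ops counting_env = {
    "counting", counting_dma_zalloc, libc_dma_free, libc_vtophys
};

/* Code written against the ops table works with either backend unchanged. */
int env_self_test(const struct toy_env_ops *env)
{
    assert(env != NULL);
    unsigned char *buf = env->dma_zalloc(64);
    if (buf == NULL) return -1;
    int ok = (env->vtophys(buf) != 0);
    for (size_t i = 0; i < 64; i++) ok = ok && (buf[i] == 0);
    env->dma_free(buf);
    return ok ? 0 : -1;
}
```

Calling `env_self_test(&libc_env)` and `env_self_test(&counting_env)` exercises the same caller code against two backends; only the table passed in changes.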

How do we combine SPDK components?
The SPDK application framework provides the glue

Application Framework
• Builds on the environment abstraction
• Example of how to glue other SPDK components together
• Libraries (lib/*) vs. applications (app/*)
Files (lib/event): event.h, app.c, reactor.c, subsystem.c

App Framework Components
• Reactor
• Poller
• Event
• I/O Channel

Reactor
• Event loop (essentially a scheduler)
• Pinned to a specific CPU core
• Polls I/O devices
• Polls event ring
• Communication via event passing
[Diagram: Core 0, Core 1, …, Core N each run one reactor (Reactor 0 … Reactor N); each reactor has its own event ring and runs pollers that service I/O devices.]
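The reactor described above can be sketched as a loop that alternates between draining an event queue and giving each registered poller a turn. This is a minimal single-threaded model under stated assumptions, not the code in reactor.c: `mini_reactor`, `mini_event`, and all function names are hypothetical, the ring is a plain array rather than a lock-free structure, and core pinning is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POLLERS 8
#define RING_SIZE 16

typedef void (*event_fn)(void *arg);
typedef int (*poller_fn)(void *ctx);      /* returns amount of work done */

struct mini_event { event_fn fn; void *arg; };

struct mini_reactor {
    struct mini_event ring[RING_SIZE];    /* pending events, FIFO */
    size_t head, tail;                    /* head: next to run, tail: next free */
    struct { poller_fn fn; void *ctx; } pollers[MAX_POLLERS];
    size_t num_pollers;
    int running;
};

int reactor_send_event(struct mini_reactor *r, event_fn fn, void *arg)
{
    if (r->tail - r->head == RING_SIZE) return -1;   /* ring full */
    r->ring[r->tail % RING_SIZE] = (struct mini_event){ fn, arg };
    r->tail++;
    return 0;
}

void reactor_add_poller(struct mini_reactor *r, poller_fn fn, void *ctx)
{
    assert(r->num_pollers < MAX_POLLERS);
    r->pollers[r->num_pollers].fn = fn;
    r->pollers[r->num_pollers].ctx = ctx;
    r->num_pollers++;
}

/* One pass: run queued events to completion, then give each poller a turn. */
void reactor_iterate(struct mini_reactor *r)
{
    size_t pending = r->tail - r->head;   /* events sent during this pass wait */
    while (pending-- > 0) {
        struct mini_event ev = r->ring[r->head % RING_SIZE];
        r->head++;
        ev.fn(ev.arg);
    }
    for (size_t i = 0; i < r->num_pollers; i++) {
        r->pollers[i].fn(r->pollers[i].ctx);
    }
}

void reactor_run(struct mini_reactor *r)
{
    r->running = 1;
    while (r->running) reactor_iterate(r);   /* never blocks, never sleeps */
}

/* Demo: a poller counts its turns and sends a stop event on the third. */
struct demo_state { struct mini_reactor *r; int polls; };

void demo_stop(void *arg) { ((struct mini_reactor *)arg)->running = 0; }

int demo_poller(void *ctx)
{
    struct demo_state *s = ctx;
    if (++s->polls == 3) reactor_send_event(s->r, demo_stop, s->r);
    return 1;
}

int reactor_demo(void)
{
    struct mini_reactor r = {0};
    struct demo_state s = { &r, 0 };
    reactor_add_poller(&r, demo_poller, &s);
    reactor_run(&r);
    return s.polls;
}
```

In this model `reactor_demo()` returns 4 rather than 3: the stop event is queued during the third pass and only executes at the start of the fourth, after which the pollers still get their turn in that pass.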

Poller
• Essentially a “task” running on a reactor
• Primarily checks hardware for async events
• Can run periodically on a timer
• Example: poll completion queue
• Callback runs to completion on reactor thread
• Completion handler may send an event
[Diagram: a poller submits I/O to an I/O device's submission queue (SQ) and polls its completion queue (CQ); each completion triggers the I/O completion callback.]
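To make the "poll completion queue" example concrete, here is a small model of a queue pair with a non-blocking submit path and a poller body that drains completions and runs their callbacks. It is a sketch under assumptions: `toy_qpair`, `toy_submit`, `toy_device_complete`, and `toy_poll_completions` are hypothetical names, and the "hardware" is a function that the demo calls by hand to move commands from the SQ to the CQ.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 8

typedef void (*io_done_fn)(void *cb_arg, int status);

struct toy_cmd { io_done_fn cb; void *cb_arg; };

/* Toy queue pair: SQ holds submitted commands, CQ holds finished ones. */
struct toy_qpair {
    struct toy_cmd sq[QUEUE_DEPTH], cq[QUEUE_DEPTH];
    size_t sq_count, cq_count;
};

/* Submit never blocks: it queues the command and returns immediately. */
int toy_submit(struct toy_qpair *q, io_done_fn cb, void *cb_arg)
{
    if (q->sq_count == QUEUE_DEPTH) return -1;   /* full, caller retries later */
    q->sq[q->sq_count++] = (struct toy_cmd){ cb, cb_arg };
    return 0;
}

/* Stand-in for the hardware: finish up to n commands in submission order. */
void toy_device_complete(struct toy_qpair *q, size_t n)
{
    size_t done = 0;
    while (done < n && done < q->sq_count && q->cq_count < QUEUE_DEPTH) {
        q->cq[q->cq_count++] = q->sq[done++];
    }
    for (size_t i = done; i < q->sq_count; i++) q->sq[i - done] = q->sq[i];
    q->sq_count -= done;
}

/* Poller body: drain the CQ and run each callback to completion.
 * Returns the number of completions handled; 0 means there was no work. */
int toy_poll_completions(void *ctx)
{
    struct toy_qpair *q = ctx;
    struct toy_cmd batch[QUEUE_DEPTH];
    size_t n = q->cq_count;
    for (size_t i = 0; i < n; i++) batch[i] = q->cq[i];
    q->cq_count = 0;                  /* callbacks may now submit more I/O */
    for (size_t i = 0; i < n; i++) batch[i].cb(batch[i].cb_arg, 0);
    return (int)n;
}

void count_done(void *cb_arg, int status)
{
    assert(status == 0);
    (*(int *)cb_arg)++;
}

int poller_demo(void)
{
    struct toy_qpair q = {0};
    int completed = 0;
    toy_submit(&q, count_done, &completed);
    toy_submit(&q, count_done, &completed);
    toy_submit(&q, count_done, &completed);
    int idle = toy_poll_completions(&q);     /* nothing has finished yet */
    toy_device_complete(&q, 2);
    int first = toy_poll_completions(&q);    /* two completions */
    toy_device_complete(&q, 5);              /* only one command remains */
    int second = toy_poll_completions(&q);
    return (idle == 0 && first == 2 && second == 1) ? completed : -1;
}
```

The caller never waits for I/O: it submits, returns to the reactor, and learns about completion only when the poller next runs.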

Event
• Cross-thread communication
• Function pointer + arguments
• One-shot message passed between reactors
• Multi-producer/single-consumer ring
• Runs to completion on reactor thread
[Diagram: a poller on Reactor A allocates and calls an event; Reactor B executes and frees the event.]
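The allocate, call, execute, free lifecycle of an event might look roughly like the following. This is a simplified model, not the API in event.h: `toy_event`, `event_allocate`, `event_call`, and `reactor_process_events` are hypothetical names, and the ring is a plain array used from one thread, whereas the slide describes a multi-producer/single-consumer ring that is safe to fill from other cores.

```c
#include <assert.h>
#include <stdlib.h>

#define EVENT_RING_SIZE 16

struct toy_reactor;
typedef void (*event_fn)(void *arg1, void *arg2);

/* An event is a destination plus a function pointer and its arguments. */
struct toy_event {
    struct toy_reactor *target;
    event_fn fn;
    void *arg1, *arg2;
};

struct toy_reactor {
    struct toy_event *ring[EVENT_RING_SIZE];   /* NOT thread safe in this sketch */
    size_t head, tail;
};

/* Sender side, step 1: allocate and fill in the message. */
struct toy_event *event_allocate(struct toy_reactor *target, event_fn fn,
                                 void *arg1, void *arg2)
{
    struct toy_event *ev = malloc(sizeof(*ev));
    if (ev != NULL) *ev = (struct toy_event){ target, fn, arg1, arg2 };
    return ev;
}

/* Sender side, step 2: "calling" an event only enqueues it on the target. */
int event_call(struct toy_event *ev)
{
    struct toy_reactor *r = ev->target;
    if (r->tail - r->head == EVENT_RING_SIZE) return -1;   /* ring full */
    r->ring[r->tail++ % EVENT_RING_SIZE] = ev;
    return 0;
}

/* Receiver side: execute each event once on the owning reactor, then free it. */
size_t reactor_process_events(struct toy_reactor *r)
{
    size_t n = 0;
    while (r->head != r->tail) {
        struct toy_event *ev = r->ring[r->head++ % EVENT_RING_SIZE];
        ev->fn(ev->arg1, ev->arg2);
        free(ev);
        n++;
    }
    return n;
}

void add_to_counter(void *arg1, void *arg2)
{
    *(int *)arg1 += *(int *)arg2;
}

int event_demo(void)
{
    struct toy_reactor b = {0};
    int counter = 0;              /* owned by reactor B: only B's events touch it */
    int five = 5, seven = 7;
    struct toy_event *e1 = event_allocate(&b, add_to_counter, &counter, &five);
    struct toy_event *e2 = event_allocate(&b, add_to_counter, &counter, &seven);
    if (e1 == NULL || e2 == NULL) return -1;
    event_call(e1);
    event_call(e2);
    assert(counter == 0);         /* nothing runs on the sending side */
    return (reactor_process_events(&b) == 2) ? counter : -1;
}
```

Because `counter` is only ever modified by events running on reactor B, no lock is needed around it; that is the "instead of locks, pass messages" idea from earlier in the deck.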

I/O Channel
• Abstracts hardware I/O queues
• Register I/O devices
• Create I/O channel per thread/device combination
• Provides hooks for driver resource allocation
• I/O channel creation drives poller creation
• Pervasive in SPDK
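One way to picture "one I/O channel per thread/device combination" is a lookup keyed by (device, thread) that calls a driver-supplied create hook on first use and a destroy hook when the last reference is dropped. The sketch below assumes a fixed-size table and integer thread IDs; `toy_io_device`, `get_io_channel`, and `put_io_channel` are hypothetical names chosen for this example, not SPDK's signatures.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CHANNELS 16

typedef int (*channel_create_fn)(void *io_device, void *ctx_buf);
typedef void (*channel_destroy_fn)(void *io_device, void *ctx_buf);

/* A registered device: identity pointer plus the driver's resource hooks. */
struct toy_io_device {
    void *io_device;
    channel_create_fn create;    /* e.g. allocate a hardware queue, start a poller */
    channel_destroy_fn destroy;
    size_t ctx_size;             /* bytes of per-channel driver state */
};

struct toy_io_channel {
    const struct toy_io_device *dev;
    int thread_id;
    int refcount;                /* 0 means the slot is free */
    void *ctx;
};

struct toy_io_channel channels[MAX_CHANNELS];

struct toy_io_channel *get_io_channel(const struct toy_io_device *dev, int thread_id)
{
    struct toy_io_channel *free_slot = NULL;
    for (size_t i = 0; i < MAX_CHANNELS; i++) {
        struct toy_io_channel *ch = &channels[i];
        if (ch->refcount > 0 && ch->dev == dev && ch->thread_id == thread_id) {
            ch->refcount++;      /* same thread and device: share the channel */
            return ch;
        }
        if (ch->refcount == 0 && free_slot == NULL) free_slot = ch;
    }
    if (free_slot == NULL) return NULL;
    void *ctx = calloc(1, dev->ctx_size);
    if (ctx == NULL) return NULL;
    if (dev->create(dev->io_device, ctx) != 0) { free(ctx); return NULL; }
    *free_slot = (struct toy_io_channel){ dev, thread_id, 1, ctx };
    return free_slot;
}

void put_io_channel(struct toy_io_channel *ch)
{
    assert(ch->refcount > 0);
    if (--ch->refcount == 0) {
        ch->dev->destroy(ch->dev->io_device, ch->ctx);
        free(ch->ctx);
        ch->ctx = NULL;
    }
}

/* Demo driver: each channel gets its own (pretend) hardware queue. */
struct demo_qpair { int queue_id; };
int next_queue_id = 0;
int live_queues = 0;

int demo_create(void *io_device, void *ctx_buf)
{
    (void)io_device;
    ((struct demo_qpair *)ctx_buf)->queue_id = next_queue_id++;
    live_queues++;
    return 0;
}

void demo_destroy(void *io_device, void *ctx_buf)
{
    (void)io_device;
    (void)ctx_buf;
    live_queues--;
}

int io_channel_demo(void)
{
    int controller = 0;          /* its address serves as the device identity */
    struct toy_io_device dev = { &controller, demo_create, demo_destroy,
                                 sizeof(struct demo_qpair) };
    struct toy_io_channel *a = get_io_channel(&dev, 0);
    struct toy_io_channel *b = get_io_channel(&dev, 0);   /* same thread */
    struct toy_io_channel *c = get_io_channel(&dev, 1);   /* different thread */
    if (a == NULL || a != b || c == NULL || c == a || live_queues != 2) return -1;
    put_io_channel(a);
    put_io_channel(b);
    put_io_channel(c);
    return (live_queues == 0) ? 0 : -1;
}
```

Since each thread ends up with its own queue, the I/O path on that thread needs no locking; the create hook is also the natural place for a driver to register the poller that services the new queue.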

Block Device Layer
• Block device driver abstraction
• Async read, write, flush, deallocate
• SGL support (readv/writev)
• I/O channel integration
• Layering (virtual blockdevs)
Files (lib/bdev): bdev.h, bdev.c, vbdev_split.c, blockdev_aio.c, blockdev_nvme.c, blockdev_malloc.c, blockdev_rbd.c

Bdev Drivers
• NVMe* (local, remote)
• Malloc (RAM disk)
• Linux libaio
• Ceph RBD
• Potential future work: pmem (NVML)

*Other names and brands may be claimed as the property of others.

Bdev Layering
• Virtual blockdev drivers
• Claim base bdev(s)
• Produce virtual bdev(s)
• Provide storage services
• Example: vbdev_split
• Coming soon: Blob bdev
[Diagram: Bdev API → Virtual bdev → Bdev API → Base bdev]
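The layering idea, where a virtual bdev claims a base bdev and exposes the same interface on top of it, can be sketched with a RAM-backed base device and a splitter that produces two halves. This is a loose illustration in the spirit of vbdev_split, not its implementation: `toy_bdev`, `ram_submit`, `split_submit`, and `split_create` are hypothetical, and completions are invoked inline here, whereas a real driver would complete I/O later from a poller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512

enum toy_io_type { TOY_READ, TOY_WRITE };
typedef void (*bdev_io_done)(void *cb_arg, int status);

/* Every bdev, base or virtual, is driven through the same submit hook. */
struct toy_bdev {
    void (*submit)(struct toy_bdev *b, enum toy_io_type type, void *buf,
                   uint64_t lba, uint64_t nblocks, bdev_io_done cb, void *cb_arg);
    uint64_t num_blocks;
    void *ctx;          /* driver-private state */
    int claimed;        /* set once a virtual bdev has taken ownership */
};

static int in_range(const struct toy_bdev *b, uint64_t lba, uint64_t n)
{
    return lba <= b->num_blocks && n <= b->num_blocks - lba;
}

/* Base driver: a RAM disk whose ctx points at the backing memory. */
void ram_submit(struct toy_bdev *b, enum toy_io_type type, void *buf,
                uint64_t lba, uint64_t n, bdev_io_done cb, void *cb_arg)
{
    if (!in_range(b, lba, n)) { cb(cb_arg, -1); return; }
    uint8_t *media = (uint8_t *)b->ctx + lba * BLOCK_SIZE;
    if (type == TOY_READ) memcpy(buf, media, n * BLOCK_SIZE);
    else memcpy(media, buf, n * BLOCK_SIZE);
    cb(cb_arg, 0);      /* inline completion, for brevity only */
}

/* Virtual driver: bounds-check, shift the LBA, forward to the base bdev. */
struct split_ctx { struct toy_bdev *base; uint64_t offset; };

void split_submit(struct toy_bdev *b, enum toy_io_type type, void *buf,
                  uint64_t lba, uint64_t n, bdev_io_done cb, void *cb_arg)
{
    struct split_ctx *s = b->ctx;
    if (!in_range(b, lba, n)) { cb(cb_arg, -1); return; }
    s->base->submit(s->base, type, buf, s->offset + lba, n, cb, cb_arg);
}

/* Claim the base bdev and produce two virtual bdevs, one per half. */
int split_create(struct toy_bdev *base, struct toy_bdev out[2], struct split_ctx ctx[2])
{
    if (base->claimed) return -1;
    base->claimed = 1;
    uint64_t half = base->num_blocks / 2;
    for (int i = 0; i < 2; i++) {
        ctx[i] = (struct split_ctx){ base, (uint64_t)i * half };
        out[i] = (struct toy_bdev){ split_submit, half, &ctx[i], 0 };
    }
    return 0;
}

void record_status(void *cb_arg, int status) { *(int *)cb_arg = status; }

int bdev_demo(void)
{
    static uint8_t storage[8 * BLOCK_SIZE];
    struct toy_bdev ram = { ram_submit, 8, storage, 0 };
    struct toy_bdev parts[2];
    struct split_ctx ctx[2];
    uint8_t out[BLOCK_SIZE], in[BLOCK_SIZE];
    int status = 1;

    if (split_create(&ram, parts, ctx) != 0) return -1;
    memset(out, 0xAB, sizeof(out));
    parts[1].submit(&parts[1], TOY_WRITE, out, 0, 1, record_status, &status);
    if (status != 0) return -1;
    /* Block 0 of the second half is block 4 of the base (read directly here
     * only to verify the mapping). */
    ram.submit(&ram, TOY_READ, in, 4, 1, record_status, &status);
    if (status != 0 || memcmp(in, out, BLOCK_SIZE) != 0) return -1;
    /* The first half has 4 blocks, so LBA 4 is out of range for it. */
    parts[0].submit(&parts[0], TOY_READ, in, 4, 1, record_status, &status);
    if (status != -1) return -1;
    /* A second claim on the same base must be refused. */
    return (split_create(&ram, parts, ctx) == -1) ? 0 : -1;
}
```

Because the virtual bdev presents the same submit hook as the base, a caller cannot tell which layer it is talking to, and further virtual bdevs could be stacked on top in the same way.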

NVMe over Fabrics Target Example
• Acceptor network poller handles connect events
• Connection event registers new poller
• I/O request arrives over network
• I/O submitted to storage
• Storage device poller checks completions
• Response sent
• All asynchronous work is driven by pollers
[Diagram: Reactor 0 runs the acceptor poller (network); a connect event is sent to Reactor 1, which runs the connection poller (network) and the storage poller (storage).]

vhost-scsi Example
• VM guest adds task to shared-memory queue
• Task retrieved from queue and passed to SCSI
• I/O submitted to storage
• Storage poller completes I/O
• SCSI layer signals completion by sending an event
• Event completes I/O back to VM
[Diagram: on Reactor 0, the queue poller services the virtio queue shared with the VM guest; requests pass through the SCSI layer to storage, which the storage poller services.]

Software design follows from hardware capabilities

Building blocks to manage asynchronous I/O

Swappable environment abstraction

Notices and Disclaimers
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.

No computer system can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance. Intel, the Intel logo, Xeon, and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © 2017 Intel Corporation.