Speeding up file transfers and file copies with Python, Giampaolo Rodola
2020-03-01
- 1. Efficient I/O with zero-copy & psutil
  Giampaolo Rodola, PyCon China 2019, Shanghai
- 2. Who am I?
  ● Giampaolo Rodola
  ● Python core developer since 2010
  ● Author of the psutil library
  ● Author of the pyftpdlib (Python FTP server) library
  ● https://github.com/giampaolo
- 3. Agenda
  ● Part 1:
    ○ basic UNIX concepts
    ○ basic socket operations
    ○ sending files efficiently
    ○ copying files efficiently
  ● Part 2:
    ○ psutil
- 4. UNIX concepts (oversimplified)
- 5. System call
  ● A way for a user-space application to interact with the kernel
  ● In Python, (mostly) exposed by the os module
- 6. System calls
  ● I/O: open(), read(), write()
  ● Processes: fork(), kill(), wait()
  ● Filesystem: chmod(), mkdir(), getcwd()
  ● Communication: pipe(), splice(), mmap()
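As a rough illustration of the list above, the sketch below calls a handful of these system calls through the `os` module. It assumes a UNIX-like system; the scratch directory and the `demo` subdirectory are hypothetical names chosen only for this example.

```python
import os
import tempfile

# Filesystem: mkdir() and getcwd()
workdir = tempfile.mkdtemp()              # hypothetical scratch directory
demo_dir = os.path.join(workdir, "demo")  # hypothetical name
os.mkdir(demo_dir)
cwd = os.getcwd()

# Communication: pipe() returns two file descriptors (read end, write end)
read_fd, write_fd = os.pipe()

# I/O: write() and read() operate directly on those descriptors
written = os.write(write_fd, b"ping")     # returns the number of bytes written
data = os.read(read_fd, 4)

os.close(read_fd)
os.close(write_fd)
print(cwd, written, data)
```

Each `os` function here is a thin wrapper: the call crosses into the kernel, which does the actual work.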
- 7. Kernel
  [diagram: application, kernel, hardware]
- 8. User & kernel space
  [diagram: application, kernel, hardware; labels: user space, kernel space]
- 9. User time vs. kernel time
  User time:
      x = 0
      while x != 10000000:
          x += 1

      $ time python3 script.py
      real 0m0,752s
      user 0m0,752s
      sys 0m0,000s

  Kernel time:
      # generate N random bytes
      import os
      os.urandom(200000000)

      $ time python3 script.py
      real 0m1,123s
      user 0m0,012s
      sys 0m1,099s
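The same user/kernel split can be observed from inside Python rather than with the shell's `time`. This is a minimal sketch that uses `os.times()`; the helper name `cpu_times` is made up for this example, and the workloads are smaller than on the slide so the script finishes quickly.

```python
import os

def cpu_times(func):
    """Run func() and return (user_seconds, system_seconds) it consumed."""
    before = os.times()
    func()
    after = os.times()
    return after.user - before.user, after.system - before.system

def busy_loop():
    # Pure-Python arithmetic: the CPU time is spent in user space.
    x = 0
    while x != 2_000_000:
        x += 1

def read_random():
    # os.urandom() asks the kernel for random bytes: mostly kernel time.
    os.urandom(50_000_000)

loop_user, loop_sys = cpu_times(busy_loop)
rand_user, rand_sys = cpu_times(read_random)
print(f"busy loop : user={loop_user:.3f}s sys={loop_sys:.3f}s")
print(f"os.urandom: user={rand_user:.3f}s sys={rand_sys:.3f}s")
```

The exact figures depend on the machine and the kernel, so treat the printed numbers as indicative only.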
- 10. File descriptors
- 11. File descriptors
  ● a reference to "something": usually a file, but also a socket or another resource
  ● can be passed to system calls
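To make the "reference to something" idea concrete, here is a small sketch in which a regular file, a pipe and a socket are all written to with the same `os.write()` call. It assumes a UNIX-like system (on Windows, socket descriptors cannot be passed to `os.write()`); all variable names are illustrative.

```python
import os
import socket
import tempfile

# A regular file, a pipe and a socket are all reachable through integer fds.
tmp = tempfile.TemporaryFile()
pipe_r, pipe_w = os.pipe()
sock_a, sock_b = socket.socketpair()

fds = {"file": tmp.fileno(), "pipe": pipe_w, "socket": sock_a.fileno()}

# The same write() system call works on each kind of descriptor.
written = {name: os.write(fd, b"hi") for name, fd in fds.items()}

# Read the data back through the matching read side.
from_pipe = os.read(pipe_r, 2)
from_socket = os.read(sock_b.fileno(), 2)
tmp.seek(0)
from_file = tmp.read(2)

tmp.close()
os.close(pipe_r)
os.close(pipe_w)
sock_a.close()
sock_b.close()
print(fds, from_file, from_pipe, from_socket)
```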
- 12. Print
      >>> import sys, os
      >>> sys.stdout.fileno()
      1
      >>> os.write(1, b'hello world\n')
      hello world
      12
- 13. Disk
      >>> import os
      >>> fd = os.open('file', os.O_WRONLY | os.O_CREAT)
      >>> os.write(fd, b'hello')
      5
      >>> os.close(fd)
      >>>
      >>> fd = os.open('file', os.O_RDONLY)
      >>> os.read(fd, 11)
      b'hello'
- 14. Terminal
      >>> # terminal size
      >>> import sys, struct, fcntl, termios
      >>> s = struct.pack('HHHH', 0, 0, 0, 0)
      >>> t = fcntl.ioctl(sys.stdout.fileno(), termios.TIOCGWINSZ, s)
      >>> struct.unpack('HHHH', t)
      (55, 105, 0, 0)
- 15. This is why "everything is a file" in UNIX
- 16. Summary
  ● syscall: a gateway to the kernel
  ● kernel: a gateway to the hardware
  ● syscalls cause a context switch
  ● context switches consume time
  ● syscalls and file descriptors can be used together
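The point that every system call has a cost can be sketched by writing the same payload with many small `os.write()` calls and then with a few large ones. The helper `write_in_chunks` and the payload size are assumptions made for this illustration; timings will vary by system.

```python
import os
import tempfile
import time

def write_in_chunks(payload, chunk_size):
    """Write payload to a temp file with os.write() calls of chunk_size bytes.

    Returns (number_of_write_calls, elapsed_seconds).
    """
    fd, path = tempfile.mkstemp()
    try:
        calls = 0
        start = time.perf_counter()
        for offset in range(0, len(payload), chunk_size):
            os.write(fd, payload[offset:offset + chunk_size])
            calls += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return calls, elapsed

payload = b"x" * 200_000
many_calls, many_secs = write_in_chunks(payload, 1)      # one syscall per byte
few_calls, few_secs = write_in_chunks(payload, 65536)    # a handful of syscalls
print(f"{many_calls} calls: {many_secs:.4f}s")
print(f"{few_calls} calls: {few_secs:.4f}s")
```

On most systems the byte-at-a-time variant is noticeably slower even though it moves exactly the same data.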
- 17. Basic socket operations
- 18. Server
      from socket import socket, AF_INET, SOCK_STREAM

      sock = socket(AF_INET, SOCK_STREAM)  # IPv4, TCP
      sock.bind(("", 8080))                # all interfaces, port 8080
      sock.listen(5)                       # listen backlog
      while True:
          conn, addr = sock.accept()       # accept a connection
          # handle connection
- 19. Server: IPv4 + IPv6 (Python 3.8)
      from socket import create_server, AF_INET6

      sock = create_server(("", 8080), family=AF_INET6, dualstack_ipv6=True)
      while True:
          conn, addr = sock.accept()
          # handle connection
- 20. Client
      from socket import socket, AF_INET, SOCK_STREAM

      sock = socket(AF_INET, SOCK_STREAM)
      sock.connect(("127.0.0.1", 8080))
      sock.send(b"hello")
      sock.recv(8196)
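The server and client slides can be combined into one script that runs end to end. This is a minimal sketch, not the speaker's code: it assumes Python 3.8+, echoes a single message, runs the server in a thread, and binds to port 0 (an OS-assigned free port) instead of the slides' 8080 so it does not clash with anything already listening.

```python
import socket
import threading

def serve_once(server_sock):
    """Accept one connection and echo back whatever the client sends first."""
    conn, addr = server_sock.accept()
    with conn:
        data = conn.recv(8192)
        conn.sendall(data)

# Port 0 asks the OS for any free port (an assumption for this demo).
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]
thread = threading.Thread(target=serve_once, args=(server,))
thread.start()

with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(8192)

thread.join()
server.close()
print(reply)
```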
- 21. Sending files
- 22. Sending a file
      from socket import create_server, AF_INET6

      sock = create_server(("", 8080), family=AF_INET6, dualstack_ipv6=True)
      conn, addr = sock.accept()
      with open('somefile', 'rb') as file:
          while True:
              chunk = file.read(65536)
              if not chunk:
                  break  # EOF
              conn.sendall(chunk)
- 23. Sending a file
      from socket import create_server, AF_INET6

      sock = create_server(("", 8080), family=AF_INET6, dualstack_ipv6=True)
      conn, addr = sock.accept()
      with open('somefile', 'rb') as file:
          while True:
              chunk = file.read(65536)  # 2 context switches
              if not chunk:
                  break  # EOF
              conn.sendall(chunk)       # 2 context switches
- 24. Sending a file
      from socket import create_server, AF_INET6

      sock = create_server(("", 8080), family=AF_INET6, dualstack_ipv6=True)
      conn, addr = sock.accept()
      with open('somefile', 'rb') as file:
          while True:
              chunk = file.read(65536)  # 1 memory copy
              if not chunk:
                  break  # EOF
              conn.sendall(chunk)       # 1 memory copy
- 25. read() / send() system calls
  [diagram; surviving labels: "2 context switches", "4 memory copies"]
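Using the per-call figures annotated in the read/send loop (2 context switches and 1 memory copy for each `read()` and each `sendall()`), the overhead for a whole file can be estimated with simple arithmetic. This is a back-of-the-envelope sketch: the function name, the 1 GiB file size, and the assumption that each `sendall()` and each `file.read()` maps to exactly one system call are illustrative simplifications.

```python
import math

CHUNK_SIZE = 65536  # same buffer size as the loop on the slides

def read_send_overhead(file_size, chunk_size=CHUNK_SIZE):
    """Estimate syscalls, context switches and user-space copies for the loop."""
    chunks = math.ceil(file_size / chunk_size)
    read_calls = chunks + 1   # one extra read() returns b"" to signal EOF
    send_calls = chunks       # assumes one send() per sendall()
    return {
        "read_calls": read_calls,
        "send_calls": send_calls,
        "context_switches": 2 * (read_calls + send_calls),
        "user_space_copies": 2 * chunks,  # one copy in, one copy out per chunk
    }

overhead = read_send_overhead(1024 ** 3)  # a hypothetical 1 GiB file
print(overhead)
```

Under these assumptions a 1 GiB file needs tens of thousands of context switches and copies, which is the cost the zero-copy system calls aim to avoid.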
- 26. How can we avoid that?
- 27. Zero-copy syscalls
  ● sendfile()
  ● copy_file_range()
  ● mmap()
  ● splice() / vmsplice() / tee()
  ● KTLS (kernel-space TLS)
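As one example from this list, `copy_file_range()` is exposed in Python 3.8+ on Linux as `os.copy_file_range()`. The sketch below is a hedged illustration, not the speaker's code: the helper `copy_file`, the file names and the 1 MiB chunk size are assumptions, and it falls back to a plain read/write copy when the call is missing or the kernel or filesystem rejects it.

```python
import os
import shutil
import tempfile

def copy_file(src_path, dst_path, chunk_size=2 ** 20):
    """Copy src to dst, using copy_file_range() when possible.

    Returns the name of the strategy that was actually used.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if hasattr(os, "copy_file_range"):
            try:
                while True:
                    # The kernel moves the bytes; they never enter user space.
                    copied = os.copy_file_range(src.fileno(), dst.fileno(), chunk_size)
                    if copied == 0:  # 0 bytes copied means end of file
                        return "copy_file_range"
            except OSError:
                # Unsupported kernel or filesystem: start over with a plain copy.
                src.seek(0)
                dst.seek(0)
                dst.truncate()
        shutil.copyfileobj(src, dst)
        return "read/write"

# Hypothetical scratch files for the demo.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "src.bin")
dst_path = os.path.join(workdir, "dst.bin")
with open(src_path, "wb") as f:
    f.write(os.urandom(3 * 2 ** 20 + 123))

strategy = copy_file(src_path, dst_path)
print("copied with:", strategy)
```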
- 28. sendfile() (zero-copy)
      import socket, os

      sock = socket.create_server(("", 8080))
      while True:
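For a complete, runnable illustration of the idea, the sketch below uses the higher-level `socket.sendfile()` method rather than calling `os.sendfile()` directly; it uses `os.sendfile()` where the platform supports it and falls back to plain `send()` otherwise. The temporary file, its 1 MB size, the helper `send_file` and the OS-assigned port are all assumptions for this demo and do not come from the slide.

```python
import os
import socket
import tempfile
import threading

def send_file(server_sock, path):
    """Accept one connection and push the whole file through it."""
    conn, addr = server_sock.accept()
    with conn, open(path, "rb") as file:
        # Zero-copy where available, plain send() as a fallback.
        conn.sendfile(file)

# Hypothetical test file filled with random bytes.
fd, path = tempfile.mkstemp()
payload = os.urandom(1_000_000)
with os.fdopen(fd, "wb") as f:
    f.write(payload)

# Port 0 asks the OS for any free port (an assumption for this demo).
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]
thread = threading.Thread(target=send_file, args=(server, path))
thread.start()

received = bytearray()
with socket.create_connection(("127.0.0.1", port)) as client:
    while True:
        chunk = client.recv(65536)
        if not chunk:
            break  # server closed the connection: end of file
        received += chunk

thread.join()
server.close()
os.remove(path)
print(len(received), "bytes received")
```

Compared with the read/sendall loop, the server side no longer pulls file data into a Python bytes object when the zero-copy path is taken.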