Python: How do I download a file using byte ranges?
I want to download a file in multiple threads, and I have the following code:
    #!/usr/bin/env python
    import httplib

    def main():
        url_opt = '/film/0d46e21795209bc18e9530133226cfc3/7f_Naruto.Uragannie.Hroniki.001.seriya.a1.20.06.13.mp4'
        headers = {}
        headers['Accept-Language'] = 'en-GB,en-US,en'
        headers['Accept-Encoding'] = 'gzip,deflate,sdch'
        headers['Accept-Charset'] = 'ISO-8859-1,utf-8,*'
        headers['Cache-Control'] = 'max-age=0'
        headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 5.1)'
        headers['Connection'] = 'keep-alive'
        headers['Accept'] = 'text/html,application/xhtml+xml,application/xml,*/*'
        headers['Range'] = ''  # left empty: this is the part I don't know how to fill in
        conn = httplib.HTTPConnection('data09-cdn.datalock.ru:80')
        conn.request("GET", url_opt, '', headers)
        print "Request sent"
        resp = conn.getresponse()
        print resp.status
        print resp.reason
        print resp.getheaders()
        # Binary mode, so the video bytes are written unchanged on every platform.
        file_for_write = open('cartoon.mp4', 'wb')
        file_for_write.write(resp.read())
        file_for_write.close()
        conn.close()

    if __name__ == "__main__":
        main()
This is the output:
Request sent
200
OK
[('content-length', '62515220'), ('accept-ranges', 'bytes'), ('server', 'nginx/1.2.7'), ('last-modified', 'Thu, 20 Jun 2013 12:10:43 GMT'), ('connection', 'keep-alive'), ('date', 'Fri, 14 Feb 2014 07:53:30 GMT'), ('content-type', 'video/mp4')]
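This code works, but I don't understand from the documentation how to download a file using ranges. Looking at the response, the server sends:
('content-length', '62515220'), ('accept-ranges', 'bytes')
So it supports ranges in units of bytes, and the content size is 62515220.
However, this request downloaded the whole file. First, I would like to get the server information without downloading the file: does it support HTTP range requests, and what is the content size? Second, how do I build an HTTP request with a range (for example 0-25000)?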
-
Pass a Range header with bytes=start_offset-end_offset as the range specifier. For example, the following code retrieves the first 300 bytes (0-299):

    >>> import httplib
    >>> conn = httplib.HTTPConnection('localhost')
    >>> conn.request("GET", '/', headers={'Range': 'bytes=0-299'})  # <----
    >>> resp = conn.getresponse()
    >>> resp.status
    206
    >>> resp.status == httplib.PARTIAL_CONTENT
    True
    >>> resp.getheader('content-range')
    'bytes 0-299/612'
    >>> content = resp.read()
    >>> len(content)
    300
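The content-range value in the session above also carries the total size of the resource after the slash, so a client can learn the full length from a single partial response. Below is a minimal sketch of unpacking it. parse_content_range is a hypothetical helper name, not part of httplib, and it only handles the "bytes first-last/total" form (a total of "*", which a server may send when the length is unknown, is reported as None).

```python
import re

# Hypothetical helper (not part of httplib): unpack a Content-Range value
# of the form "bytes first-last/total".
_CONTENT_RANGE = re.compile(r'^bytes (\d+)-(\d+)/(\d+|\*)$')

def parse_content_range(value):
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError('unsupported Content-Range: %r' % (value,))
    first, last, total = match.groups()
    # "*" means the server did not report the complete length.
    return int(first), int(last), (None if total == '*' else int(total))

first, last, total = parse_content_range('bytes 0-299/612')
# Both offsets are inclusive, so the body length is last - first + 1.
body_length = last - first + 1
```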
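Note: both start_offset and end_offset are inclusive.

UPDATE

If the server does not understand the Range header, it responds with status code 200 (httplib.OK) instead of 206 (httplib.PARTIAL_CONTENT) and sends the whole content. To make sure the server responded with partial content, check the status code:

    >>> resp.status == httplib.PARTIAL_CONTENT
    True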
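For the other half of the question (learning the size and range support without downloading the body), one common approach is a HEAD request, which asks for the headers only. The sketch below assumes the server answers HEAD with the same content-length and accept-ranges headers it showed for GET, which is not guaranteed. probe, split_ranges and fetch_range are hypothetical helper names, and the import fallback is there only because httplib was renamed http.client in Python 3.

```python
try:
    import httplib                    # Python 2
except ImportError:
    import http.client as httplib     # Python 3 name of the same module

def probe(host, path):
    """Return (accepts_byte_ranges, size_or_None) from a HEAD request."""
    conn = httplib.HTTPConnection(host)
    try:
        conn.request('HEAD', path)
        resp = conn.getresponse()
        resp.read()  # a HEAD response has no body; this just finishes it
        accepts = (resp.getheader('accept-ranges') or '').lower() == 'bytes'
        length = resp.getheader('content-length')
        return accepts, (int(length) if length is not None else None)
    finally:
        conn.close()

def split_ranges(size, parts):
    """Split `size` bytes into at most `parts` inclusive (start, end) pairs."""
    if size <= 0 or parts <= 0:
        raise ValueError('size and parts must be positive')
    chunk = -(-size // parts)  # ceiling division
    return [(start, min(start + chunk, size) - 1)
            for start in range(0, size, chunk)]

def fetch_range(host, path, start, end):
    """Fetch bytes start..end (inclusive); fail if the server ignored Range."""
    conn = httplib.HTTPConnection(host)
    try:
        conn.request('GET', path,
                     headers={'Range': 'bytes=%d-%d' % (start, end)})
        resp = conn.getresponse()
        if resp.status != httplib.PARTIAL_CONTENT:
            raise IOError('expected 206, got %d' % resp.status)
        return resp.read()
    finally:
        conn.close()
```

Each worker thread could then call fetch_range with one pair from split_ranges(size, n) and write the returned bytes at offset start of the output file. If probe reports no range support, falling back to a single plain GET is the safer choice.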