# producer_consumer.py: recursively crawl a site with a pool of green threads.
import eventlet


def producer(start_url):
    """Recursively crawl starting from *start_url*. Returns a set of
    urls that were found."""
    pool = eventlet.GreenPool()
    seen = set()
    q = eventlet.Queue()
    q.put(start_url)
    # keep looping while there are new urls, or workers that may produce more urls
    while True:
        while not q.empty():
            url = q.get()
            # limit requests to eventlet.net so we don't crash all over the internet
            if url not in seen and 'eventlet.net' in url:
                seen.add(url)
                pool.spawn_n(fetch, url, q)
        # wait for all in-flight fetches; each one may have queued new urls
        pool.waitall()
        if q.empty():
            break
    return seen
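
The fetch(url, outq) worker that producer spawns is not defined in this file. The sketch below shows one plausible implementation, assuming eventlet's green (cooperative) urllib.request so downloads don't block other green threads; the url_regex pattern and the 5-second timeout are illustrative choices, not part of the original source.

import re

import eventlet
from eventlet.green.urllib.request import urlopen

# Hypothetical link matcher: good enough for a demo, not a real HTML parser.
url_regex = re.compile(r'https?://[^\s"\'<>]+')


def fetch(url, outq):
    """Fetch *url* and push every link found in its body onto *outq*."""
    print('fetching', url)
    data = ''
    with eventlet.Timeout(5, False):  # give up silently after 5 seconds
        data = urlopen(url).read().decode(errors='replace')
    for match in url_regex.finditer(data):
        outq.put(match.group(0))

With a fetch like that in place, the crawler can be exercised end to end; the start url below simply mirrors the 'eventlet.net' filter already built into producer:

if __name__ == '__main__':
    found = producer('http://eventlet.net')
    print('found', len(found), 'urls')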