Throttling/rate-limiting HTTP requests in GRequests

Posted on 2021-01-29 15:08:48

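I'm writing a small script in Python 2.7.3 with GRequests and lxml that will let me collect collectible-card prices from various websites and compare them. The problem is that one of the sites limits the number of requests and sends back HTTP error 429 if I exceed that limit.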

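Is there a way to throttle requests in GRequests so that I don't exceed a requests-per-second rate that I specify? Also, if an HTTP 429 does occur, how can I make GRequests retry after some time?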

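On a side note, their limit is ridiculously low: roughly 8 requests per 15 seconds. I have exceeded it many times with my browser just by refreshing the page while waiting for price changes.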

1 answer
  • 面试哥
    面试哥 2021-01-29

    I'm answering my own question, since I had to work this out by myself and there seems to be very little information about it.

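    The idea is as follows. Every request object used with GRequests can take a session object as a parameter when it is created. A session object, in turn, can have HTTP adapters mounted on it, and those adapters are used whenever a request is sent. By writing our own adapter, we can intercept requests and rate-limit them in whatever way suits our application best. In my case I ended up with the code below.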

    The object used for throttling:

    import datetime

    DEFAULT_BURST_WINDOW = datetime.timedelta(seconds=5)
    DEFAULT_WAIT_WINDOW = datetime.timedelta(seconds=15)
    
    
    class BurstThrottle(object):
        max_hits = None
        hits = None
        burst_window = None
        total_window = None
        timestamp = None
    
        def __init__(self, max_hits, burst_window, wait_window):
            self.max_hits = max_hits
            self.hits = 0
            self.burst_window = burst_window
            self.total_window = burst_window + wait_window
            self.timestamp = datetime.datetime.min
    
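        def throttle(self):
            """Return how long the caller must wait; a zero timedelta means send now."""
            now = datetime.datetime.utcnow()
            if now < self.timestamp + self.total_window:
                # Still inside the current burst + wait cycle.
                if (now < self.timestamp + self.burst_window) and (self.hits < self.max_hits):
                    self.hits += 1
                    return datetime.timedelta(0)
                else:
                    # Burst used up or expired: wait until the cycle ends.
                    return self.timestamp + self.total_window - now
            else:
                # The previous cycle is over: start a new one with this hit.
                self.timestamp = now
                self.hits = 1
                return datetime.timedelta(0)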
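
To see how the throttle behaves, here is a small self-contained sketch of the same logic. It is not the code from the answer verbatim: the optional `now` argument is an addition for illustration so that the clock can be faked, it is written for Python 3, and the limits used (2 hits per 5-second burst, 15-second wait) are made-up values.

```python
import datetime


class BurstThrottle(object):
    """Admit up to max_hits calls per burst window, then force a wait."""

    def __init__(self, max_hits, burst_window, wait_window):
        self.max_hits = max_hits
        self.hits = 0
        self.burst_window = burst_window
        self.total_window = burst_window + wait_window
        self.timestamp = datetime.datetime.min

    def throttle(self, now=None):
        # `now` is a hypothetical extra parameter, added only for this demo.
        if now is None:
            now = datetime.datetime.utcnow()
        if now < self.timestamp + self.total_window:
            if now < self.timestamp + self.burst_window and self.hits < self.max_hits:
                self.hits += 1
                return datetime.timedelta(0)
            return self.timestamp + self.total_window - now
        self.timestamp = now
        self.hits = 1
        return datetime.timedelta(0)


def seconds(n):
    return datetime.timedelta(seconds=n)


start = datetime.datetime(2021, 1, 29, 15, 0, 0)
throttle = BurstThrottle(max_hits=2, burst_window=seconds(5), wait_window=seconds(15))

waits = [
    throttle.throttle(now=start),                # first hit opens a new cycle
    throttle.throttle(now=start + seconds(1)),   # second hit, still within the burst
    throttle.throttle(now=start + seconds(2)),   # over max_hits: wait for the cycle to end
    throttle.throttle(now=start + seconds(20)),  # cycle is over: a new one starts
]
wait_seconds = [w.total_seconds() for w in waits]
print(wait_seconds)  # [0.0, 0.0, 18.0, 0.0]
```

The third call is refused and told to wait 18 seconds, which is the time remaining until the 20-second cycle (5-second burst plus 15-second wait) that began at `start` has elapsed.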
    

    The HTTP adapter:

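    import datetime

    import gevent
    import requests.adapters

    # BurstThrottle and the DEFAULT_* constants are the ones defined above.
    class MyHttpAdapter(requests.adapters.HTTPAdapter):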
        throttle = None
    
        def __init__(self, pool_connections=requests.adapters.DEFAULT_POOLSIZE,
                     pool_maxsize=requests.adapters.DEFAULT_POOLSIZE, max_retries=requests.adapters.DEFAULT_RETRIES,
                     pool_block=requests.adapters.DEFAULT_POOLBLOCK, burst_window=DEFAULT_BURST_WINDOW,
                     wait_window=DEFAULT_WAIT_WINDOW):
            self.throttle = BurstThrottle(pool_maxsize, burst_window, wait_window)
            super(MyHttpAdapter, self).__init__(pool_connections=pool_connections, pool_maxsize=pool_maxsize,
                                                max_retries=max_retries, pool_block=pool_block)
    
        def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
            request_successful = False
            response = None
            # Note: this keeps retrying for as long as the server answers 429.
            while not request_successful:
                # Sleep cooperatively until the throttle admits this request.
                wait_time = self.throttle.throttle()
                while wait_time > datetime.timedelta(0):
                    gevent.sleep(wait_time.total_seconds(), ref=True)
                    wait_time = self.throttle.throttle()
    
                response = super(MyHttpAdapter, self).send(request, stream=stream, timeout=timeout,
                                                           verify=verify, cert=cert, proxies=proxies)
    
                # Anything other than 429 (Too Many Requests) is handed back to the caller.
                if response.status_code != 429:
                    request_successful = True
    
            return response
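
The loop above retries as soon as the throttle admits the request again. A 429 response may also carry a `Retry-After` header, either as a number of seconds or as an HTTP date. Whether the site in question sends one is not known, so the following is only a sketch of how such a header could be turned into a delay. The helper name `retry_after_seconds` and the 15-second fallback are hypothetical, and the code targets Python 3 rather than the 2.7.3 used above.

```python
import datetime
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=15.0):
    """Turn a Retry-After header value into a delay in seconds.

    Falls back to `default` (an assumed value) when the header is
    missing or cannot be parsed.
    """
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        # Form 1: a plain number of seconds.
        return float(value)
    try:
        # Form 2: an HTTP date such as "Fri, 29 Jan 2021 15:09:03 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=datetime.timezone.utc)
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Never return a negative delay for dates already in the past.
    return max(0.0, (when - now).total_seconds())


fixed_now = datetime.datetime(2021, 1, 29, 15, 8, 48, tzinfo=datetime.timezone.utc)
print(retry_after_seconds("20"))
print(retry_after_seconds("Fri, 29 Jan 2021 15:09:03 GMT", now=fixed_now))
print(retry_after_seconds(None))
```

Inside `send`, the result could be passed to `gevent.sleep` after a 429 before looping again, for example with `response.headers.get('Retry-After')` as the input. Adding a cap on the number of attempts would also keep the loop from running forever if the server never stops answering 429.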
    

    Setup:

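    import datetime

    import grequests  # imported before requests so that its gevent monkey-patching runs first
    import requests

    import adapter  # the module holding MyHttpAdapter, as the call below implies

    # __CONCURRENT_LIMIT__, urls and handle_response are defined elsewhere in the script.
    requests_adapter = adapter.MyHttpAdapter(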
        pool_connections=__CONCURRENT_LIMIT__,
        pool_maxsize=__CONCURRENT_LIMIT__,
        max_retries=0,
        pool_block=False,
        burst_window=datetime.timedelta(seconds=5),
        wait_window=datetime.timedelta(seconds=20))
    
    requests_session = requests.session()
    requests_session.mount('http://', requests_adapter)
    requests_session.mount('https://', requests_adapter)
    
    unsent_requests = (grequests.get(url,
                                     hooks={'response': handle_response},
                                     session=requests_session) for url in urls)
    grequests.map(unsent_requests, size=__CONCURRENT_LIMIT__)
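
When choosing `burst_window` and `wait_window`, it helps to check them against the site's limit (about 8 requests per 15 seconds in the question). The sketch below estimates the worst case for the throttle logic shown earlier: the hits of one cycle may all land at the very end of its burst window and the hits of a later cycle at its very start. The function name `worst_case_hits` is hypothetical, all arguments are plain seconds, `max_hits=8` is an assumed value for `__CONCURRENT_LIMIT__`, and the code assumes Python 3 division.

```python
import math


def worst_case_hits(max_hits, burst_window, wait_window, limit_window):
    """Upper bound on requests admitted within any span of limit_window seconds."""
    cycle = burst_window + wait_window
    # Bursts that are n cycles apart can sit as close together as
    # n * cycle - burst_window seconds, so count how many bursts can
    # squeeze into a span of limit_window seconds.
    bursts = max(1, math.ceil((limit_window + burst_window) / cycle))
    return max_hits * bursts


# Values from the setup above: 5-second burst, 20-second wait, 15-second limit window.
safe = worst_case_hits(max_hits=8, burst_window=5, wait_window=20, limit_window=15)

# A much shorter wait lets two bursts fall into the same 15 seconds.
risky = worst_case_hits(max_hits=8, burst_window=5, wait_window=5, limit_window=15)

print(safe, risky)  # 8 16
```

Under these assumptions the setup above stays at 8 requests per 15 seconds as long as `__CONCURRENT_LIMIT__` is at most 8, because the adapter passes `pool_maxsize` to the throttle as `max_hits`.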
    

