proxy_pool.py 文件源码

python
阅读 22 收藏 0 点赞 0 评论 0

项目:icrawler 作者: hellock 项目源码 文件源码
def validate(self,
                 proxy_scanner,
                 expected_num=20,
                 queue_timeout=3,
                 val_timeout=5):
        """Target function of validation threads

        Args:
            proxy_scanner: A ProxyScanner object.
            expected_num: Max number of valid proxies to be scanned.
            queue_timeout: Timeout for getting a proxy from the queue.
            val_timeout: An integer passed to `is_valid` as argument `timeout`.
        """
        while self.proxy_num() < expected_num:
            try:
                candidate_proxy = proxy_scanner.proxy_queue.get(
                    timeout=queue_timeout)
            except queue.Empty:
                if proxy_scanner.is_scanning():
                    continue
                else:
                    break
            addr = candidate_proxy['addr']
            protocol = candidate_proxy['protocol']
            ret = self.is_valid(addr, protocol, val_timeout)
            if self.proxy_num() >= expected_num:
                self.logger.info('Enough valid proxies, thread {} exit.'
                                 .format(threading.current_thread().name))
                break
            if ret['valid']:
                self.add_proxy(Proxy(addr, protocol))
                self.logger.info('{} ok, {:.2f}s'.format(addr, ret[
                    'response_time']))
            else:
                self.logger.info('{} invalid, {}'.format(addr, ret['msg']))
评论列表
文章目录


问题


面经


文章

微信
公众号

扫码关注公众号