middleware.py source code

python

Project: ahmia-crawler  Author: ahmia
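import hashlib
import logging
from urllib.parse import urlparse

from scrapy.exceptions import IgnoreRequest

# `settings` is assumed to be the crawler's Scrapy settings object, defined elsewhere in the module.
def process_request(self, request, spider):  # pylint:disable=unused-argument
        """Process incoming request."""
        parsed_uri = urlparse(request.url)
        domain = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
        # Strip the scheme and slashes so only the host remains.
        domain = domain.replace("http://", "").replace("https://", "") \
                                              .replace("/", "")
        banned_domains = settings.get('BANNED_DOMAINS')
        # md5() requires bytes on Python 3, so encode the host first.
        if hashlib.md5(domain.encode('utf-8')).hexdigest() in banned_domains:
            # Do not execute this request
            request.meta['proxy'] = ""
            msg = "Ignoring request {}, This domain is banned." \
                  .format(request.url)
            logging.info(msg)
            raise IgnoreRequest()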
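
The method above compares an MD5 digest of the request's host against the entries in `BANNED_DOMAINS`. A minimal standalone sketch of that lookup follows, assuming Python 3 and that the ban list stores hex digests of bare host names. The names `domain_hash` and `is_banned` and the host `example.onion` are hypothetical and used only for illustration; they are not part of ahmia-crawler.

```python
import hashlib
from urllib.parse import urlparse


def domain_hash(url):
    """Return the MD5 hex digest of the URL's host part (netloc)."""
    netloc = urlparse(url).netloc
    # md5() needs bytes, so encode the host before hashing.
    return hashlib.md5(netloc.encode("utf-8")).hexdigest()


def is_banned(url, banned_hashes):
    """Return True if the hash of the URL's host is in banned_hashes."""
    return domain_hash(url) in banned_hashes


# Hypothetical ban list built from a made-up host name.
banned = {domain_hash("http://example.onion/")}

print(is_banned("http://example.onion/page?x=1", banned))
print(is_banned("http://other.onion/", banned))
```

Storing digests instead of plain host names keeps the banned hosts themselves out of the settings file. A set makes each membership check constant time, although any container that supports `in` would work here.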