rss_crawler.py source code

python

Project: news-please  Author: fhamborg
import re

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3: provides the same names used below


def supports_site(url):
    """
    The RSS crawler supports every site that contains an RSS feed.

    Determines if this crawler works on the given url.

    :param str url: The url to test
    :return bool: True if this crawler works on the given url
    """

    # Follow redirects to get the final url
    opener = urllib2.build_opener(urllib2.HTTPRedirectHandler)
    redirect = opener.open(url).url
    response = urllib2.urlopen(redirect).read()

    # Check if a standard rss feed exists: a <link> tag with both an href
    # and type="application/rss+xml", in either attribute order
    return re.search(
        r'(<link[^>]*href[^>]*type ?= ?"application\/rss\+xml"|' +
        r'<link[^>]*type ?= ?"application\/rss\+xml"[^>]*href)',
        response.decode('utf-8')) is not None
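
The feed check itself can be tried without any network access. The following is a minimal sketch, assuming Python 3. The helper name `has_rss_link`, the constant `RSS_LINK_PATTERN`, and the sample HTML strings are hypothetical and not part of news-please; only the regular expression is taken from the function above.

```python
import re

# Same pattern as in supports_site: a <link> tag carrying both an href
# attribute and type="application/rss+xml", in either order.
RSS_LINK_PATTERN = re.compile(
    r'(<link[^>]*href[^>]*type ?= ?"application\/rss\+xml"|'
    r'<link[^>]*type ?= ?"application\/rss\+xml"[^>]*href)'
)


def has_rss_link(html):
    """Return True if the HTML text advertises an RSS feed via a <link> tag.

    Hypothetical helper for illustration; it applies the regex to a string
    instead of fetching a url.
    """
    return RSS_LINK_PATTERN.search(html) is not None


# Hypothetical sample pages
with_feed = '<head><link rel="alternate" type="application/rss+xml" href="/feed.xml"></head>'
without_feed = '<head><link rel="stylesheet" type="text/css" href="/style.css"></head>'
single_quoted = "<link rel='alternate' type='application/rss+xml' href='/feed.xml'>"

print(has_rss_link(with_feed))
print(has_rss_link(without_feed))
print(has_rss_link(single_quoted))
```

Note that the pattern only matches a double-quoted `type` attribute, so a page that declares its feed with single quotes (the third sample) is not detected.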