rss_crawler.py source code - Python code snippet

Project: news-please  Author: fhamborg

import re
import urllib2  # Python 2 standard library


def supports_site(url):
        """
        The RSS crawler supports every site that contains an RSS feed.

        Determines if this crawler works on the given url.

        :param str url: The url to test
        :return bool: Determines whether this crawler works on the given url
        """

        # Follow redirects and read the final page in a single request
        opener = urllib2.build_opener(urllib2.HTTPRedirectHandler)
        response = opener.open(url).read()

        # Check if a standard rss feed exists
        return re.search(
            r'(<link[^>]*href[^>]*type ?= ?"application\/rss\+xml"|' +
            r'<link[^>]*type ?= ?"application\/rss\+xml"[^>]*href)',
            response.decode('utf-8')) is not None
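
The snippet above targets Python 2 (`urllib2`). A minimal Python 3 sketch of the same check follows. It assumes `urllib.request` as the replacement for `urllib2` and splits the regex test from the network fetch so the matching logic can be exercised on plain strings. The names `RSS_LINK_PATTERN`, `has_rss_link`, and `supports_site_py3` are hypothetical and not part of news-please.

```python
import re
import urllib.request

# Same pattern as the snippet above: a <link> tag carrying both an href
# attribute and type="application/rss+xml", in either order.
RSS_LINK_PATTERN = re.compile(
    r'(<link[^>]*href[^>]*type ?= ?"application/rss\+xml"|'
    r'<link[^>]*type ?= ?"application/rss\+xml"[^>]*href)'
)


def has_rss_link(html):
    """Return True if the HTML text advertises an RSS feed via a <link> tag."""
    return RSS_LINK_PATTERN.search(html) is not None


def supports_site_py3(url):
    """Fetch url and test the page for an RSS <link> tag.

    urlopen follows HTTP redirects by default, so no explicit
    redirect handler is installed here.
    """
    with urllib.request.urlopen(url) as response:
        # Assumption: errors="replace" keeps non-UTF-8 pages from raising
        # UnicodeDecodeError; the original snippet decodes strictly.
        html = response.read().decode("utf-8", errors="replace")
    return has_rss_link(html)
```

As in the original, the pattern only matches a `type` attribute written with double quotes, so a page that uses single quotes is reported as unsupported.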