html_parser.py source file

python

Project: minerva  Author: linzhi
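import urlparse  # Python 2 standard library

import BeautifulSoup  # BeautifulSoup 3 package


# Method of the parser class (cls implies a classmethod); shown without its class.
def get_content(cls, url=None, session=None):
    """
    @brief: Fetch the page at url and collect the hyperlinks it contains.
    """

    hyperlinks = set()
    soup_context = None

    # Download the page, then parse it for links
    html_context = cls.parse_page(url, session)
    if html_context:
        # Parse once; the parsed document is also returned to the caller
        soup_context = BeautifulSoup.BeautifulSoup(html_context)
        for each_link in soup_context.findAll('a'):
            href = each_link.get('href')
            # Skip anchors without an href; urljoin would return url itself
            if href:
                hyperlinks.add(urlparse.urljoin(url, href))

    return hyperlinks, soup_context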
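
The method above depends on Python 2 only modules (urlparse and BeautifulSoup 3). As a rough illustration of the same link-collection step on Python 3, here is a minimal sketch that uses only the standard library. It is not the minerva project's code: LinkCollector and extract_links are hypothetical names, and the sketch takes HTML that has already been downloaded rather than calling parse_page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers absolute URLs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hyperlinks = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this hook.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors with a missing or empty href
            # Resolve relative links against the page URL.
            self.hyperlinks.add(urljoin(self.base_url, href))


def extract_links(html_text, base_url):
    """Return the set of absolute hyperlinks found in html_text."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.hyperlinks
```

As in the original, a set is used so that a link repeated on the page is stored once.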