news_crawl.py source code

python

Project: atap    Author: foxbook
import requests
import bs4
from slugify import slugify


def crawl(url):
    # Extract the bare domain, e.g. "example.com" from "https://www.example.com/path"
    domain = url.split("//www.")[-1].split("/")[0]
    html = requests.get(url).content
    soup = bs4.BeautifulSoup(html, "lxml")
    # Collect the unique anchor tags that carry an href attribute
    links = set(soup.find_all('a', href=True))
    for link in links:
        sub_url = link['href']
        page_name = link.string
        # Only follow links that point back to the same domain
        if domain in sub_url:
            try:
                page = requests.get(sub_url).content
                # Build a filesystem-safe filename from the link text
                filename = slugify(page_name).lower() + '.html'
                with open(filename, 'wb') as f:
                    f.write(page)
            except Exception:
                # Skip links whose text is empty or whose download fails
                pass
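
A minimal usage sketch: calling crawl with a seed page downloads every same-domain page it links to into the current directory. The URL below is only a placeholder for illustration, not part of the original project.

if __name__ == '__main__':
    # Hypothetical seed URL; replace with the news site you want to mirror
    crawl("https://www.example.com/")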