news_crawl.py source code

python

Project: atap    Author: foxbook
import requests
import bs4
from slugify import slugify


def crawl(url):
    # Extract the bare domain, e.g. "example.com" from "https://www.example.com/path"
    domain = url.split("//www.")[-1].split("/")[0]
    html = requests.get(url).content
    soup = bs4.BeautifulSoup(html, "lxml")
    # Collect the unique anchor tags that carry an href attribute
    links = set(soup.find_all('a', href=True))
    for link in links:
        sub_url = link['href']
        page_name = link.string
        # Only follow links that point back to the same domain
        if domain in sub_url:
            try:
                page = requests.get(sub_url).content
                # Build a filesystem-safe filename from the link text
                filename = slugify(page_name).lower() + '.html'
                with open(filename, 'wb') as f:
                    f.write(page)
            except Exception:
                # Skip links whose text is empty or whose download fails
                pass
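
A minimal usage sketch: calling crawl with a seed page downloads every same-domain page it links to into the current directory. The URL below is only a placeholder for illustration, not part of the original project.

if __name__ == '__main__':
    # Hypothetical seed URL; replace with the news site you want to mirror
    crawl("https://www.example.com/")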