xtr.py source code

python

Project: python-search-engine, author: ncouture
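import lxml.html
# On lxml >= 5.2, lxml.html.clean is provided by the separate lxml_html_clean package.
from lxml.html.clean import Cleaner


def get_clean_html(etree, text_only=False):
    # _is_etree() is a validation helper defined elsewhere in xtr.py
    _is_etree(etree)

    # enable filters to remove JavaScript (event-handler attributes, javascript: URLs)
    # and CSS (<style> elements and style attributes); <script> elements are
    # removed by the default scripts=True
    cleaner = Cleaner()
    cleaner.javascript = True
    cleaner.style = True
    # keep the page skeleton (<html>, <head>, <title>), <meta> tags, all other
    # attributes, and <link> tags other than stylesheets
    cleaner.page_structure = False
    cleaner.meta = False
    cleaner.safe_attrs_only = False
    cleaner.links = False

    # for a parsed tree, clean_html() cleans and returns a deep copy
    html = cleaner.clean_html(etree)
    if text_only:
        # plain text of all remaining nodes, markup stripped
        return html.text_content()

    # serialize the cleaned tree back to HTML (returns bytes)
    return lxml.html.tostring(html)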
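The `text_only=True` path above relies on lxml's `Cleaner`. Where lxml is not available, one rough approximation uses only the standard library. It walks the markup with `html.parser.HTMLParser` and keeps only the text that sits outside `<script>` and `<style>` elements. This is a swapped-in technique, not the project's code. The names `html_to_text` and `_TextExtractor` are hypothetical. The sketch also does not sanitize attributes or links, because it discards all markup anyway.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes outside <script>/<style>."""

    # Elements whose contents are code rather than readable text.
    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        # convert_charrefs=True turns entities like &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this.
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when not inside <script> or <style>.
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(markup: str) -> str:
    """Return the text of `markup` with script and style bodies dropped."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)
```

For example, `html_to_text("<p>Hello <b>world</b></p>")` returns `"Hello world"`. Whitespace is kept exactly as it appears in the source. Callers that want a single line can normalize it with `" ".join(text.split())`.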