_htmlparser.py 文件源码

python
阅读 26 收藏 0 点赞 0 评论 0

项目:ServerlessCrawler-VancouverRealState 作者: MarcelloLins 项目源码 文件源码
def prepare_markup(self, markup, user_specified_encoding=None,
                       document_declared_encoding=None, exclude_encodings=None):
        """
        :return: A 4-tuple (markup, original encoding, encoding
        declared within markup, whether any characters had to be
        replaced with REPLACEMENT CHARACTER).
        """
        if isinstance(markup, unicode):
            yield (markup, None, None, False)
            return

        try_encodings = [user_specified_encoding, document_declared_encoding]
        dammit = UnicodeDammit(markup, try_encodings, is_html=True,
                               exclude_encodings=exclude_encodings)
        yield (dammit.markup, dammit.original_encoding,
               dammit.declared_html_encoding,
               dammit.contains_replacement_characters)
评论列表
文章目录


问题


面经


文章

微信
公众号

扫码关注公众号