Error in lxml's parse function

Posted on 2021-01-29 17:14:47

I have installed lxml 2.2.2 on Windows (I'm using Python 2.6.5). I tried the following simple commands:

from lxml.html import parse
p = parse('http://www.google.com').getroot()

But I get the following error:

Traceback (most recent call last):
  File "", line 1, in <module>
    p=parse('http://www.google.com').getroot()
  File "C:\Python26\lib\site-packages\lxml-2.2.2-py2.6-win32.egg\lxml\html\__init__.py", line 661, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File "lxml.etree.pyx", line 2698, in lxml.etree.parse (src/lxml/lxml.etree.c:49590)
  File "parser.pxi", line 1491, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71205)
  File "parser.pxi", line 1520, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:71488)
  File "parser.pxi", line 1420, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:70583)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:67736)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
  File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64056)
IOError: Error reading file 'http://www.google.com': failed to load external entity "http://www.google.com"

I don't know what to do next, since I'm new to Python. Please help me fix this error. Thanks in advance!! :)

1 answer
  • 面试哥 2021-01-29

    lxml.html.parse does not fetch URLs.

    Here is how to do it with urllib2:

    >>> from urllib2 import urlopen
    >>> from lxml.html import parse
    >>> page = urlopen('http://www.google.com')
    >>> p = parse(page)
    >>> p.getroot()
    <Element html at 1304050>
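    On Python 3, urllib2 was merged into urllib.request, so the same idea looks roughly like the sketch below. It assumes lxml is installed; fetch_and_parse and parse_file_like are hypothetical helper names, not part of lxml. The key point is that lxml.html.parse() accepts an open file-like object, so urllib performs the HTTP request and lxml only parses the bytes it reads.

```python
from io import BytesIO
from urllib.request import urlopen

from lxml.html import parse


def fetch_and_parse(url):
    """Download a page with urllib, then hand the open response to lxml.

    The HTTP response object is file-like, so lxml.html.parse() can read
    from it directly; lxml itself never touches the network here.
    """
    with urlopen(url) as page:
        return parse(page).getroot()


def parse_file_like(data):
    """Parse HTML bytes held in memory (handy for testing offline)."""
    return parse(BytesIO(data)).getroot()


if __name__ == "__main__":
    root = parse_file_like(b"<html><body><p>hello</p></body></html>")
    print(root.tag)
    print(root.findtext(".//p"))
```

    Swapping the BytesIO object for the urlopen response is the only difference between the offline check and fetching a real page such as http://www.google.com.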
    

    Update
    Steven is right: lxml.etree.parse should accept and load URLs. I missed that. I tried to delete this answer, but I'm not allowed to.

    I retract my claim that it does not fetch URLs.
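    To illustrate the corrected point, the sketch below passes a URL string straight to lxml.html.parse() with no urllib involved, so libxml2's own loader opens it. Whether an http:// URL works this way depends on network access and on how libxml2 was built, so this sketch assumes a local file:// URL instead to stay offline; parse_url_directly and the sample page.html are hypothetical.

```python
import tempfile
from pathlib import Path

from lxml.html import parse


def parse_url_directly(url):
    """Give lxml a URL string and let libxml2 open it itself."""
    return parse(url).getroot()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sample file, written only so there is something to load.
        page = Path(tmp) / "page.html"
        page.write_text("<html><body><h1>Hi</h1></body></html>", encoding="utf-8")
        # as_uri() turns the absolute path into a file:///... URL.
        root = parse_url_directly(page.as_uri())
        print(root.findtext(".//h1"))
```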


