Get content by class name using Beautiful Soup

Posted on 2021-01-29 15:13:30

Using the Beautiful Soup module, how can I get the data of the div tag whose class name is feeditemcontent cxfeeditemcontent? Is it:

soup.class['feeditemcontent cxfeeditemcontent']

or:

soup.find_all('class')

Here is the HTML source:

<div class="feeditemcontent cxfeeditemcontent">
    <div class="feeditembodyandfooter">
         <div class="feeditembody">
         <span>The actual data is some where here</span>
         </div>
     </div>
 </div>

Here is the Python code:

 from BeautifulSoup import BeautifulSoup
 html_doc = open('home.jsp.html', 'r')

 soup = BeautifulSoup(html_doc)
 class="feeditemcontent cxfeeditemcontent"
1 answer
  • 面试哥
    面试哥 2021-01-29

    Try this. It may be overkill for something this simple, but it works:

    from bs4 import BeautifulSoup  # requires the beautifulsoup4 package

    def match_class(target):
        """Return a filter that accepts tags carrying every class in target."""
        target = target.split()
        def do_match(tag):
            # bs4 exposes the class attribute as a list; use [] when it is absent
            classes = tag.get("class", [])
            return all(c in classes for c in target)
        return do_match

    html = """<div class="feeditemcontent cxfeeditemcontent">
    <div class="feeditembodyandfooter">
    <div class="feeditembody">
    <span>The actual data is some where here</span>
    </div>
    </div>
    </div>"""

    soup = BeautifulSoup(html, "html.parser")

    # Tags that have both classes, in any order
    matches = soup.find_all(match_class("feeditemcontent cxfeeditemcontent"))
    for m in matches:
        print(m)
        print("-" * 10)

    # Tags that have the single class "feeditembody"
    matches = soup.find_all(match_class("feeditembody"))
    for m in matches:
        print(m)
        print("-" * 10)
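
If Beautiful Soup 4 (the bs4 package) is installed, which is an assumption since the question does not say which version is in use, the built-in class filters may be enough without a custom matcher. The sketch below reuses the HTML from the question. The variable names both, body and text are only illustrative.

```python
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

html = """<div class="feeditemcontent cxfeeditemcontent">
    <div class="feeditembodyandfooter">
        <div class="feeditembody">
            <span>The actual data is some where here</span>
        </div>
    </div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: divs that carry both classes, regardless of their order
both = soup.select("div.feeditemcontent.cxfeeditemcontent")

# class_ keyword: divs whose class list contains this one class
body = soup.find_all("div", class_="feeditembody")

# Collect the text nested inside the outer div, with whitespace stripped
text = both[0].get_text(strip=True)
print(text)
```

Note that passing a two-class string such as "feeditemcontent cxfeeditemcontent" to class_ matches the exact attribute value, so the order of the classes matters there. The CSS selector form does not have that restriction.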
    

