Getting content by class name with Beautiful Soup
Posted on 2021-01-29 15:13:30
Using the Beautiful Soup module, how can I get the data of the div tag whose class name is feeditemcontent cxfeeditemcontent? Is it:
soup.class['feeditemcontent cxfeeditemcontent']
Or:
soup.find_all('class')
Here is the HTML source:
<div class="feeditemcontent cxfeeditemcontent">
<div class="feeditembodyandfooter">
<div class="feeditembody">
<span>The actual data is some where here</span>
</div>
</div>
</div>
Here is the Python code:
from BeautifulSoup import BeautifulSoup
html_doc = open('home.jsp.html', 'r')
soup = BeautifulSoup(html_doc)
class="feeditemcontent cxfeeditemcontent"
1 Answer
Try this. It may be overkill for something this simple, but it works:
def match_class(target):
    target = target.split()
    def do_match(tag):
        try:
            classes = dict(tag.attrs)["class"]
        except KeyError:
            classes = ""
        classes = classes.split()
        return all(c in classes for c in target)
    return do_match

html = """<div class="feeditemcontent cxfeeditemcontent">
<div class="feeditembodyandfooter">
<div class="feeditembody">
<span>The actual data is some where here</span>
</div>
</div>
</div>"""

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(html)

matches = soup.findAll(match_class("feeditemcontent cxfeeditemcontent"))
for m in matches:
    print m
    print "-"*10

matches = soup.findAll(match_class("feeditembody"))
for m in matches:
    print m
    print "-"*10
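The answer above targets the older BeautifulSoup 3 module on Python 2. As a rough sketch only, assuming the newer bs4 package (Beautiful Soup 4) is installed, the same lookup can be done with a CSS selector or the class_ keyword. The "html.parser" argument and the variable names are my own choices for illustration, not something from the question.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 (bs4) is installed

html = """<div class="feeditemcontent cxfeeditemcontent">
<div class="feeditembodyandfooter">
<div class="feeditembody">
<span>The actual data is some where here</span>
</div>
</div>
</div>"""

# "html.parser" is the stdlib-backed parser; other parsers would also work.
soup = BeautifulSoup(html, "html.parser")

# CSS selector: matches divs that carry both classes, in any order.
by_selector = soup.select("div.feeditemcontent.cxfeeditemcontent")

# class_ keyword: a multi-class string only matches that exact attribute value.
by_keyword = soup.find_all("div", class_="feeditemcontent cxfeeditemcontent")

# Collect the visible text of each match, with surrounding whitespace stripped.
texts = [div.get_text(strip=True) for div in by_selector]
for text in texts:
    print(text)
```

The selector form is probably the safer of the two if the class order in the page might vary, since the class_ string form depends on the attribute being written exactly as given.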