Targeting a text element with no tag in Beautiful Soup
I'm trying to target a date in bs4 that has no tag associated with it.
The structure of the HTML is:
<div class="story-postdate">
<div class="label-inline">Published</div>
Feb 6, 2020, 4:40 pm SGT
</div>
I've tried .findAll(attrs={'class' : 'story-postdate'}) and then tried to use the sibling selector. But no joy.
I've also tried to target it with .text at the end of the CSS selector, but that still returns a None object.
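One likely reason the attempts above come up empty: in Beautiful Soup the date is not an element at all but a bare text node (a NavigableString) sitting directly inside the outer div, next to the label div. CSS selectors passed to select()/select_one() only match elements, so no selector can return that text by itself. Also, findAll() returns a list-like ResultSet, so sibling attributes have to be used on one element from it, not on the result list. The sketch below (using the HTML from the question) just lists the container's direct children and their types to make this visible.

```python
from bs4 import BeautifulSoup

html = """<div class="story-postdate">
<div class="label-inline">Published</div>
Feb 6, 2020, 4:40 pm SGT
</div>"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", class_="story-postdate")

# List each direct child with its bs4 type: elements are Tag objects,
# bare text (including the date) is a NavigableString
kinds = [type(child).__name__ for child in container.children]
for child in container.children:
    print(type(child).__name__, repr(child))

# A CSS selector only matches elements, so this finds the label div
# and nothing else; the date text cannot be selected this way
print(soup.select("div.story-postdate > *"))
```

The date shows up as the last NavigableString child, which is why the working answers either read the whole div's text or step from the label div to its next sibling.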
from bs4 import BeautifulSoup as soup
html = '''<div class="story-postdate">
<div class="label-inline">Published</div>
Feb 6, 2020, 4:40 pm SGT
</div>'''
b = soup(html, 'html.parser')
div_tag = b.findAll(attrs={'class' : 'story-postdate'})
for div in div_tag:
    # div.text contains the "Published" label as well as the date;
    # split on whitespace and drop the first word (the label)
    print(" ".join(div.text.split()[1:]))
Output:
Feb 6, 2020, 4:40 pm SGT
Or:
print(" ".join(div.text.split()[1:4]))
Output:
Feb 6, 2020,
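The split()[1:] trick assumes the label is exactly one word ("Published"); a two-word label would leak into the result. A possibly more robust variant, sketched below, keeps only the text nodes that are direct children of the outer div, using find_all(string=True, recursive=False), so any text inside the nested label div is skipped no matter how many words it has.

```python
from bs4 import BeautifulSoup

html = """<div class="story-postdate">
<div class="label-inline">Published</div>
Feb 6, 2020, 4:40 pm SGT
</div>"""

soup = BeautifulSoup(html, "html.parser")
dates = []
for div in soup.find_all("div", class_="story-postdate"):
    # recursive=False limits the search to this div's own children,
    # and string=True returns only text nodes, not Tag elements
    direct_text = div.find_all(string=True, recursive=False)
    # Join the pieces and collapse the surrounding newlines/whitespace
    dates.append(" ".join("".join(direct_text).split()))

for date in dates:
    print(date)
```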
Try this:
from bs4 import BeautifulSoup
from simplified_scrapy import SimplifiedDoc
html = """<div class="story-postdate">
<div class="label-inline">Published</div>
Feb 6, 2020, 4:40 pm SGT
</div>"""
# using BeautifulSoup: find the container, step into its first child div
# (the "Published" label), then take the text node that follows it
soup = BeautifulSoup(html, features='lxml')
print(soup.find('div', attrs={'class': 'story-postdate'}).div.next_sibling.strip())
# using SimplifiedDoc
doc = SimplifiedDoc(html)
print(doc.select('div.story-postdate>div').nextText())
Here are more examples of the SimplifiedDoc library: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
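The BeautifulSoup one-liner above raises AttributeError when the container div is missing, because .div is then looked up on None, and it would return a Tag rather than text if the markup after the label ever changed. A defensive version of the same next_sibling idea might look like the sketch below; extract_postdate is a hypothetical helper name, and it assumes the label div always carries the label-inline class shown in the question.

```python
from typing import Optional

from bs4 import BeautifulSoup, NavigableString


def extract_postdate(html: str) -> Optional[str]:
    """Hypothetical helper: return the untagged text after the label div, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Child combinator: the label div must sit directly inside the container
    label = soup.select_one("div.story-postdate > div.label-inline")
    if label is None:
        return None
    sibling = label.next_sibling
    # Only accept a text node; a Tag here means the markup is not as expected
    if not isinstance(sibling, NavigableString):
        return None
    text = sibling.strip()
    return text or None


html = """<div class="story-postdate">
<div class="label-inline">Published</div>
Feb 6, 2020, 4:40 pm SGT
</div>"""
print(extract_postdate(html))
```

Returning None instead of raising lets a scraping loop skip pages that lack a post date without wrapping each lookup in try/except.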