Targeting a text element that has no tag in Beautiful Soup

Question

I'm trying to target a date in bs4 that has no tag of its own.

The structure of the HTML is:

<div class="story-postdate">
  <div class="label-inline">Published</div>
  Feb 6, 2020, 4:40 pm SGT
</div>

I've tried .findAll(attrs={'class' : 'story-postdate'}) and then tried to use the sibling selector, but no joy.

I've also tried to target it with .text at the end of the CSS selector, but that still returns None.

Answer 1

from bs4 import BeautifulSoup

html = '''<div class="story-postdate">
  <div class="label-inline">Published</div>
  Feb 6, 2020, 4:40 pm SGT
</div>'''

soup = BeautifulSoup(html, 'html.parser')
# find_all is the current name for findAll
for div in soup.find_all(attrs={'class': 'story-postdate'}):
    # div.text also contains the "Published" label; split on whitespace
    # and drop the first word to keep only the date
    print(" ".join(div.text.split()[1:]))

Output:

Feb 6, 2020, 4:40 pm SGT

Or, to keep only the date and drop the time:

print(" ".join(div.text.split()[1:4]))

Output:

Feb 6, 2020,
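
The slicing above relies on the label being exactly one word. As a rough sketch of an alternative, assuming the date is always a direct text child of the outer div, you can ask Beautiful Soup for only the strings that sit directly inside that div (the variable names here are just for illustration):

```python
from bs4 import BeautifulSoup

html = '''<div class="story-postdate">
  <div class="label-inline">Published</div>
  Feb 6, 2020, 4:40 pm SGT
</div>'''

soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', class_='story-postdate')

# recursive=False limits the search to direct children of the outer div,
# so the "Published" text inside the nested div is not returned.
direct_text = [s.strip() for s in div.find_all(string=True, recursive=False)]

# Drop the whitespace-only text nodes that sit between the tags.
date_text = " ".join(t for t in direct_text if t)
print(date_text)
```

This keeps working if the label text changes, but it would also pick up any other loose text placed directly in the div.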

Answer 2

Try this:

from bs4 import BeautifulSoup
from simplified_scrapy import SimplifiedDoc
html = """<div class="story-postdate">
  <div class="label-inline">Published</div>
  Feb 6, 2020, 4:40 pm SGT
</div>"""

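# using BeautifulSoup: the date is the text node right after the inner <div>
soup = BeautifulSoup(html, features='lxml')
# next_sibling keeps the surrounding newlines and spaces, so strip them
print(soup.find('div', attrs={'class': 'story-postdate'}).div.next_sibling.strip())

# using SimplifiedDoc
doc = SimplifiedDoc(html)
print(doc.select('div.story-postdate>div').nextText())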

Here are more examples for the SimplifiedDoc library: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
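
The next_sibling approach depends on the date coming immediately after the inner div. Another option, sketched here on the assumption that the date is always the last piece of text in the block, is stripped_strings, which yields every text fragment with whitespace already trimmed. It uses the built-in 'html.parser' so that lxml is not required:

```python
from bs4 import BeautifulSoup

html = """<div class="story-postdate">
  <div class="label-inline">Published</div>
  Feb 6, 2020, 4:40 pm SGT
</div>"""

soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', class_='story-postdate')

# stripped_strings walks all descendant text nodes, trims each one,
# and skips those that are only whitespace.
parts = list(div.stripped_strings)
print(parts)

# Assumption: the label comes first and the date is the last fragment.
date_text = parts[-1]
print(date_text)
```

If the markup ever gains extra text after the date, the last-fragment assumption breaks, so check the length of parts before relying on it.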

2 answers

Answer 1: 0, 2020-02-29 11:43:12

Answer 2: 0, accepted, 2020-02-29 12:17:31
