How to skip a <span> with Beautiful Soup

Question

Here is the output of my code:

<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>item name goes here</h1>

I want to get the item name only, without the "Details about" part.

My Python code, which selects the element by its id, is:

for content in soup.select('#itemTitle'):
    print(content.text)

Answer 1

You can use decompose(), clear(), or extract(). According to the documentation:

Tag.decompose() removes a tag from the tree, then completely destroys it and its contents.

Tag.clear() removes the contents of a tag.

PageElement.extract() removes a tag or string from the tree. It returns the tag or string that was extracted.

from bs4 import BeautifulSoup
html = '''<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>item name goes here</h1>'''

soup = BeautifulSoup(html, 'lxml')
for content in soup.select('#itemTitle'):
    content.span.decompose()
    print(content.text)

Output:

  item name goes here
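To make the difference between the three methods concrete, here is a minimal sketch that applies each one to a fresh copy of the same heading. The helper name fresh_h1 and the shortened HTML string are made up for this illustration, and 'html.parser' is used only so the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Shortened version of the heading from the question (illustrative only).
HTML = '<h1 id="itemTitle"><span class="g-hdn">Details about   </span>item name goes here</h1>'


def fresh_h1():
    """Hypothetical helper: parse the snippet again and return its <h1> tag."""
    return BeautifulSoup(HTML, "html.parser").h1


# decompose(): the <span> is removed from the tree and destroyed.
h1 = fresh_h1()
h1.span.decompose()
text_after_decompose = h1.get_text()
span_after_decompose = h1.span  # None, the tag is gone

# clear(): the <span> stays in the tree, but its contents are emptied.
h1 = fresh_h1()
h1.span.clear()
text_after_clear = h1.get_text()
span_after_clear = h1.span  # still a Tag, now empty

# extract(): the <span> is removed from the tree and handed back to the caller.
h1 = fresh_h1()
removed = h1.span.extract()
text_after_extract = h1.get_text()
removed_text = removed.get_text()

print(text_after_decompose)
print(text_after_clear)
print(text_after_extract)
print(repr(removed_text))
```

All three leave the heading text as just the item name. They differ in what happens to the span: decompose() discards it, clear() keeps an empty tag in place, and extract() returns it so it can still be inspected.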

Answer 2

My answer is inspired by this accepted answer.

Code:

from bs4 import BeautifulSoup, NavigableString

data = '''
<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>item name goes here</h1>
'''

soup = BeautifulSoup(data, 'html.parser')
inner_text = [element for element in soup.h1 if isinstance(element, NavigableString)]
print(inner_text)

Output:

['item name goes here']
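A related way to collect only the direct text of the heading, without modifying the tree, is find_all(string=True, recursive=False). This is a sketch rather than part of the original answer. It assumes a bs4 version that accepts the string argument (older releases call it text), and the HTML is shortened for illustration.

```python
from bs4 import BeautifulSoup

# Shortened version of the heading from the question (illustrative only).
html = '<h1 id="itemTitle"><span class="g-hdn">Details about   </span>item name goes here</h1>'

soup = BeautifulSoup(html, "html.parser")
h1 = soup.select_one("#itemTitle")

# string=True matches text nodes; recursive=False limits the search to
# direct children, so the text inside the <span> is skipped.
direct_strings = h1.find_all(string=True, recursive=False)

# Join the pieces in case the heading has several direct text nodes.
item_name = "".join(direct_strings).strip()
print(item_name)
```

Because nothing is removed, the span is still available afterwards if other code needs it.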

Answer 3

How about this:

from bs4 import BeautifulSoup
html= """<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>item name goes here</h1>"""

soup = BeautifulSoup(html, "lxml")

text = soup.find('h1', attrs={"id":"itemTitle"}).text
span = soup.find('span', attrs={"class":"g-hdn"}).text

final_text = text[len(span):]

print(final_text)

This results in:

item name goes here

Answer 4

See if this works:

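from bs4 import BeautifulSoup

# Pass the parser explicitly so bs4 does not warn about guessing one.
soup = BeautifulSoup("""<h1 class="it-ttl" id="itemTitle" itemprop="name">
<span class="g-hdn">Details about  </span>
item name goes here</h1>""", 'html.parser')
# The last child of the <h1> is the text node that follows the <span>.
print(soup.find('h1', {'class': 'it-ttl'}).contents[-1].strip())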

4 answers

Answer 1: 3, accepted, 2018-01-24 03:54:10
Answer 2: 2, 2018-01-24 03:48:28
Answer 3: 1, 2018-01-24 03:51:47
Answer 4: 0, 2018-01-24 03:54:48
