How to check whether a BeautifulSoup tag is a certain tag?

Question

If I find a certain tag using BeautifulSoup:

styling = paragraphs.find_all('w:rpr')

I then look at the next tag. I only want to use that tag if it is a <w:t> tag. How do I check what type of tag the next tag is?

I tried element.find_next_sibling().startswith('<w:t') for the element, but it says NoneType object is not callable. I also tried element.find_next_sibling().find_all('<w:t>'), but it doesn't return anything.

for element in styling:
    next = element.find_next_sibling()
    if ...:  # placeholder: should be true only when next is a <w:t> tag
        ...

I am using BeautifulSoup and would like to stick with it rather than add eTree or another parser, if this is possible with bs4.

Answer 1

Using item.name you can see a tag's name.
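A minimal sketch of that check, assuming the same html.parser setup used in this answer (the variable names are illustrative only). It also suggests why the startswith attempt in the question raised an error: on a Tag, an unknown attribute such as .startswith is treated as a search for a child tag of that name, which finds nothing and gives None, and calling None then fails.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div><w:rpr></w:rpr><w:t>A</w:t></div>', 'html.parser')

# find_next_sibling(True) returns the next sibling that is a tag (True matches any tag)
nxt = soup.find('w:rpr').find_next_sibling(True)

# A Tag is not a str: nxt.startswith is resolved as nxt.find('startswith'),
# which finds no such child tag and therefore gives None.
looked_up = nxt.startswith

# Compare the tag's name instead of treating the tag as text.
is_wt = nxt.name == 'w:t'
print(looked_up, is_wt)
```

Calling looked_up('<w:t') here would raise the TypeError quoted in the question, which is why the comparison goes through .name.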

The problem is that between tags there are NavigableString elements, which are also treated as siblings, and their name is None.

You would have to skip these elements, or you could get all siblings and use a for loop to find the first <w:t>, exiting the loop with break.

from bs4 import BeautifulSoup as BS

text = '''<div>
  <w:rpr></w:rpr>
  <w:t>A</w:t>
</div>'''

soup = BS(text, 'html.parser')

all_wrpr = soup.find_all('w:rpr')
for wrpr in all_wrpr:

    # Method 1: step over the string between the tags by hand
    next_tag = wrpr.next_sibling
    print('name:', next_tag.name) # None (whitespace string, not a tag)

    next_tag = wrpr.next_sibling.next_sibling
    #next_tag = next_tag.next_sibling
    print('name:', next_tag.name) # w:t
    print('text:', next_tag.text) # A

    print('---')

    # Method 2: scan all following siblings for the first <w:t>
    for item in wrpr.next_siblings:
        if item.name == 'w:t':
            print('name:', item.name) # w:t
            print('text:', item.text) # A
            break # exit after first <w:t>

#name: None
#name: w:t
#text: A
#---
#name: w:t
#text: A

EDIT: If you test the code with HTML formatted a little differently

text = '''<div>
  <w:rpr></w:rpr><w:t>A</w:t>
</div>'''

then there will be no NavigableString between the tags, so the first method will fail (next_sibling.next_sibling is no longer the <w:t> tag), but the second method will still work.
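As a sketch that does not depend on how the markup is formatted, the sibling scan can be limited to real tags with isinstance(..., Tag). The helper name next_tag_if and the three sample strings are made up for illustration and are not part of bs4. Unlike the second method above, this variant stops at the nearest following tag, which seems to be what the question asks for: it returns that tag only when it is a <w:t>, and None otherwise.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def next_tag_if(element, name):
    """Return the nearest following sibling Tag if it is called `name`, else None.

    Hypothetical helper for illustration; not a bs4 function.
    """
    for sibling in element.next_siblings:
        if isinstance(sibling, Tag):  # skip text nodes such as whitespace
            return sibling if sibling.name == name else None
    return None  # no following tag at all


spaced = '<div>\n  <w:rpr></w:rpr>\n  <w:t>A</w:t>\n</div>'
compact = '<div>\n  <w:rpr></w:rpr><w:t>A</w:t>\n</div>'
other = '<div><w:rpr></w:rpr><w:b></w:b><w:t>A</w:t></div>'

results = []
for markup in (spaced, compact, other):
    soup = BeautifulSoup(markup, 'html.parser')
    found = next_tag_if(soup.find('w:rpr'), 'w:t')
    results.append(found.text if found is not None else None)

print(results)  # ['A', 'A', None]
```

In the third sample a <w:b> tag sits between the two, so the helper returns None rather than reaching past it to the later <w:t>.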

Answer 1 · 2 · Accepted · 2019-12-01 18:27:53
