How to check if BeautifulSoup tag is a certain tag?

If I find a certain tag using BeautifulSoup:

styling = paragraphs.find_all('w:rpr')

I then look at the next tag. I only want to use that tag if it is a <w:t> tag. How do I check what type of tag the next tag is?

I tried element.find_next_sibling().startswith('<w:t') for the element, but it says NoneType object is not callable. I also tried element.find_next_sibling().find_all('<w:t>'), but it doesn't return anything.

for element in styling:
    next_tag = element.find_next_sibling()
    if ...:  # how do I test that next_tag is a <w:t> tag?
        ...

I am using BeautifulSoup and would like to stick with it, without adding eTree or another parser, if this is possible with bs4.

You can see a tag's name using item.name.
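As a minimal sketch of that check (the markup here is made up for illustration and has no whitespace between the tags), comparing .name against the string 'w:t' is enough. It also suggests why the .startswith() attempt from the question fails: an attribute that BeautifulSoup does not recognise on a Tag is treated as a lookup for a child tag of that name, which returns None here, and calling None raises the error.

```python
from bs4 import BeautifulSoup

# Illustrative markup: the two tags are adjacent, with no whitespace between them.
soup = BeautifulSoup('<div><w:rpr></w:rpr><w:t>A</w:t></div>', 'html.parser')

rpr = soup.find('w:rpr')
nxt = rpr.next_sibling            # here this is the <w:t> tag itself

# The tag's name is a plain string, so an equality test is enough.
is_wt = nxt.name == 'w:t'
print(is_wt)

# A Tag has no startswith method; the attribute access is treated as
# nxt.find('startswith'), which finds no such child tag and returns None.
lookup = nxt.startswith
print(lookup)
```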

The problem is that between the tags there are NavigableString elements (the whitespace between them), which are also treated as siblings, and their name is None.

You have to skip these elements, or you can get all the siblings and use a for loop to find the first <w:t>, exiting the loop with break.
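One way to do the skipping, sketched here under assumptions, is a small helper that walks .next_siblings and returns the first one that is a real Tag. The helper name next_tag_sibling and the extra <w:b> element in the sample markup are hypothetical and only there for illustration. Unlike a loop that searches for the first <w:t> anywhere among the siblings, this only accepts <w:t> when it is the very next tag, which is what the question asks for.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def next_tag_sibling(element):
    """Return the first following sibling that is a Tag, or None if there is none.

    Hypothetical helper: it skips NavigableString siblings (whitespace, text).
    """
    for sibling in element.next_siblings:
        if isinstance(sibling, Tag):
            return sibling
    return None


# Illustrative markup: the second <w:rpr> is followed by a tag that is not <w:t>.
html = '''<div>
  <w:rpr></w:rpr>
  <w:t>A</w:t>
  <w:rpr></w:rpr>
  <w:b>B</w:b>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

texts = []
for rpr in soup.find_all('w:rpr'):
    candidate = next_tag_sibling(rpr)
    # Use the next tag only if it is a <w:t> tag.
    if candidate is not None and candidate.name == 'w:t':
        texts.append(candidate.get_text())

print(texts)
```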

from bs4 import BeautifulSoup as BS

text = '''<div>
  <w:rpr></w:rpr>
  <w:t>A</w:t>
</div>'''

soup = BS(text, 'html.parser')

all_wrpr = soup.find_all('w:rpr')
for wrpr in all_wrpr:

    # first sibling is the whitespace (NavigableString) between the tags
    next_tag = wrpr.next_sibling
    print('name:', next_tag.name) # None

    # second sibling is the <w:t> tag
    next_tag = wrpr.next_sibling.next_sibling
    #next_tag = next_tag.next_sibling
    print('name:', next_tag.name) # w:t
    print('text:', next_tag.text) # A

#name: None
#name: w:t
#text: A

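print('---')

# second method: check all following siblings of the last <w:rpr>
# (wrpr still refers to it after the loop above)
all_siblings = wrpr.next_siblings
for item in all_siblings:
    if item.name == 'w:t':
        print('name:', item.name) # w:t
        print('text:', item.text) # A
        break # exit after first <w:t>

#name: w:t
#text: A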

EDIT: If you test the code with HTML that is formatted slightly differently

text = '''<div>
  <w:rpr></w:rpr><w:t>A</w:t>
</div>'''

then there will be no NavigableString between the tags. The first method will fail, because wrpr.next_sibling is already the <w:t> tag and the second .next_sibling steps past it, but the second method will still work.
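For comparison, here is a short sketch that runs one lookup against both layouts. It uses find_next_sibling('w:t'), which takes a tag name and ignores the text nodes in between; the dictionary labels are made up for illustration. Like the for loop with break, it returns the first <w:t> among the later siblings, even if other tags sit between it and <w:rpr>.

```python
from bs4 import BeautifulSoup

# The two layouts discussed above: tags on separate lines, and tags on one line.
layouts = {
    'separate lines': '<div>\n  <w:rpr></w:rpr>\n  <w:t>A</w:t>\n</div>',
    'same line': '<div>\n  <w:rpr></w:rpr><w:t>A</w:t>\n</div>',
}

results = {}
for label, markup in layouts.items():
    soup = BeautifulSoup(markup, 'html.parser')
    rpr = soup.find('w:rpr')
    # Search the following siblings by tag name; whitespace strings are ignored.
    wt = rpr.find_next_sibling('w:t')
    results[label] = wt.get_text() if wt is not None else None

print(results)
```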
