Get text between two different html tags python beautifulsoup

I was wondering whether it is possible to get the text between two completely different tags with the beautifulsoup package in Python. I have tried this:

g = soup.find_all(["dtposted"])
for tag in g:
    print(tag)

<dtposted>2020<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>

I want to be able to get the text of dtposted, trnamt, fitid and name separately. When I look for the next sibling, it returns None, and if I look for a specific tag, it doesn't give me the text between the two tags, but the entire string:

for tag in g:
    print(tag.find_all("trnamt"))

<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>

If there is a way to get 2020, 10, 202010 and RESTAURANT all separately, that would be great.

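See below (using XML parsing). Each element's .text holds only the text that comes before its first child element, so the nested values come out separately.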

import xml.etree.ElementTree as ET

xml = '''
<dtposted>
   2020
   <trnamt>
      10
      <fitid>
         202010
         <name>RESTAURANT</name>
      </fitid>
   </trnamt>
</dtposted>'''

root = ET.fromstring(xml)
print(root.text.strip())
print(root.find('.//trnamt').text.strip())
print(root.find('.//fitid').text.strip())
print(root.find('.//name').text.strip())

Output:

2020
10
202010
RESTAURANT
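
Since the question asks about BeautifulSoup specifically, here is a minimal sketch of the same idea without switching to ElementTree. It assumes bs4 4.4.0 or later (for the string= argument) and the built-in "html.parser" backend; the helper name direct_text is made up for illustration. Passing recursive=False to find() limits the search to a tag's immediate children, so the text of nested tags is not picked up.

```python
from bs4 import BeautifulSoup

markup = "<dtposted>2020<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>"
soup = BeautifulSoup(markup, "html.parser")

def direct_text(soup, names):
    """Map each tag name to the first piece of text that is its direct child.

    Hypothetical helper for illustration; not part of the bs4 API.
    """
    result = {}
    for name in names:
        tag = soup.find(name)
        if tag is None:
            continue  # tag not present in the document
        # recursive=False: only look at immediate children of the tag,
        # so text that belongs to nested tags is skipped.
        own = tag.find(string=True, recursive=False)
        result[name] = own.strip() if own else None
    return result

values = direct_text(soup, ["dtposted", "trnamt", "fitid", "name"])
print(values)
```

This only looks at the first occurrence of each tag name; for repeated records you would loop over soup.find_all(name) instead.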

This strips the tags with a regular expression and gives you a list to work with:

import re

variable = "<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>"

def tostring(html):
    # Split on anything that looks like a tag and keep the non-empty pieces.
    # Splitting directly (instead of substituting commas first) keeps values
    # that contain commas intact, and the local list avoids leftover state
    # between calls.
    pieces = re.split(r'<.*?>', html)
    return [piece.strip() for piece in pieces if piece.strip()]

print(tostring(variable))

Output:

['10', '202010', 'RESTAURANT']
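
The regex approach loses track of which value came from which tag. If you need that mapping, a rough sketch with the standard library's html.parser could look like the following. The class and function names (TagTextCollector, text_by_tag) are made up for illustration, and it assumes each tag name occurs once; a repeated tag would overwrite the earlier value.

```python
from html.parser import HTMLParser

class TagTextCollector(HTMLParser):
    """Record the text that directly follows each opening tag."""

    def __init__(self):
        super().__init__()
        self.current_tag = None  # most recently opened tag, if any
        self.values = {}         # tag name -> text right after it

    def handle_starttag(self, tag, attrs):
        self.current_tag = tag

    def handle_endtag(self, tag):
        # Text after a closing tag is not attributed to any tag
        self.current_tag = None

    def handle_data(self, data):
        text = data.strip()
        if self.current_tag and text:
            self.values[self.current_tag] = text

def text_by_tag(markup):
    collector = TagTextCollector()
    collector.feed(markup)
    collector.close()
    return collector.values

markup = "<dtposted>2020<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>"
print(text_by_tag(markup))
```

Because it reacts only to opening tags and the text after them, this also copes with input where the closing tags are missing.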
