Get text between two different HTML tags with Python BeautifulSoup
I was wondering whether it is possible to get the text between two different tags using the BeautifulSoup package in Python. I have tried this:
g = soup.find_all(["dtposted"])
for tag in g:
    print(tag)
<dtposted>2020<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>
I want to be able to get the text for dtposted, trnamt, fitid and name separately. When I look for the next sibling, it returns None, and if I look for a specific tag, it doesn't give me just the text that follows that tag, but the entire nested string:
for tag in g:
    print(tag.find_all("trnamt"))
<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>
If there is a way to get 2020, 10, 202010 and RESTAURANT all separately, that would be great.
See below (using XML parsing):
import xml.etree.ElementTree as ET
xml = '''
<dtposted>
2020
<trnamt>
10
<fitid>
202010
<name>RESTAURANT</name>
</fitid>
</trnamt>
</dtposted>'''
root = ET.fromstring(xml)
print(root.text.strip())
print(root.find('.//trnamt').text.strip())
print(root.find('.//fitid').text.strip())
print(root.find('.//name').text.strip())
Output:
2020
10
202010
RESTAURANT
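If the tag names are not known in advance, the same idea can be generalized by walking every element instead of calling find once per tag. The following is a minimal sketch, assuming the input is well-formed XML as in the snippet above; the names `markup` and `values` are illustrative only and not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Same structure as the question's output, on one line (assumed well-formed XML).
markup = (
    "<dtposted>2020<trnamt>10<fitid>202010"
    "<name>RESTAURANT</name></fitid></trnamt></dtposted>"
)

root = ET.fromstring(markup)

# root.iter() yields the root and all descendants in document order.
# el.text is only the text directly after the opening tag, so nested
# children do not leak into the parent's value.
values = {el.tag: (el.text or "").strip() for el in root.iter()}

print(values)
# {'dtposted': '2020', 'trnamt': '10', 'fitid': '202010', 'name': 'RESTAURANT'}
```

Note that a dict keeps only one value per tag name; if the same tag can appear several times, collecting `(el.tag, text)` tuples in a list would be the safer choice.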
This will give you a list to work with:
import re
variable = "<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>"
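def tostring(html):
    # Replace every tag with a comma, then split on the commas.
    removedhtml = re.compile('<.*?>')
    items = re.sub(removedhtml, ',', html).split(",")
    # Keep only the non-empty pieces; the list is local so repeated calls don't accumulate.
    text = []
    for item in items:
        if item.strip():
            text.append(item)
    return text

print(tostring(variable))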
Output:
['10', '202010', 'RESTAURANT']
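One limitation of splitting on commas is that a value that itself contains a comma would be cut in two, and the tag names are lost. A variation on the same regex idea is to capture each opening tag together with the text that directly follows it. This is only a sketch under stated assumptions: the function name `tag_texts` is hypothetical, and the pattern assumes simple tags without attributes, as in the question's data.

```python
import re

def tag_texts(markup):
    """Return (tag, text) pairs for each opening tag directly followed by text."""
    # <(\w+)> matches an opening tag without attributes (closing tags start
    # with "/" and are skipped); ([^<]+) captures text up to the next tag.
    pairs = re.findall(r"<(\w+)>([^<]+)", markup)
    # Drop captures that are whitespace only, and trim the rest.
    return [(tag, text.strip()) for tag, text in pairs if text.strip()]

markup = (
    "<dtposted>2020<trnamt>10<fitid>202010"
    "<name>RESTAURANT</name></fitid></trnamt></dtposted>"
)

result = tag_texts(markup)
print(result)
# [('dtposted', '2020'), ('trnamt', '10'), ('fitid', '202010'), ('name', 'RESTAURANT')]
```

Regular expressions are fragile for general HTML or XML, so this is best kept for small, predictable inputs like the one shown here.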