Get text between two different HTML tags with Python BeautifulSoup

Question

I was wondering whether it is possible to get the text between two completely different tags with the beautifulsoup package in Python. I have tried this:

g = soup.find_all(["dtposted"])
for tag in g:
    print(tag)

<dtposted>2020<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>

I want to get the text for dtposted, trnamt, fitid and name separately. When I look for the next sibling, it returns None, and when I look for a specific tag, I don't get just the text between the two tags but the entire string:

for tag in g:
    print(tag.find_all("trnamt"))

<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>

If there is a way to get 2020, 10, 202010 and RESTAURANT all separately, that would be great.

Answer 1

See below (using XML parsing):

import xml.etree.ElementTree as ET

xml = '''
<dtposted>
   2020
   <trnamt>
      10
      <fitid>
         202010
         <name>RESTAURANT</name>
      </fitid>
   </trnamt>
</dtposted>'''

root = ET.fromstring(xml)
print(root.text.strip())
print(root.find('.//trnamt').text.strip())
print(root.find('.//fitid').text.strip())
print(root.find('.//name').text.strip())

Output:

2020
10
202010
RESTAURANT
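
The tags in the question are nested rather than siblings, which is why looking for a next sibling finds nothing: each value is the text that sits directly after an opening tag. A minimal sketch, assuming the markup is well-formed XML as in the answer above, that walks every element and collects only its own leading text (the name `own_text` is made up for illustration):

```python
import xml.etree.ElementTree as ET

xml = "<dtposted>2020<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>"

root = ET.fromstring(xml)

# iter() yields the root and every descendant in document order.
# element.text holds only the text before the element's first child,
# so nested values do not leak into the parent's entry.
own_text = {el.tag: (el.text or "").strip() for el in root.iter()}

print(own_text)
```

This avoids writing one `find` call per tag, but a dict keeps only the last value when a tag name repeats, so a list of pairs may fit better for documents with many records.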

Answer 2

This will give you a list to work with:

import re

variable = "<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>"

def tostring(html):
    # Split on every tag and keep the non-empty pieces of text in between.
    # Building the list locally avoids accumulating results across calls.
    pieces = re.split(r'<.*?>', html)
    return [piece.strip() for piece in pieces if piece.strip()]

print(tostring(variable))

Output:

['10', '202010', 'RESTAURANT']
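
The list above drops the tag names. If it helps to know which value belongs to which tag, a small variation can capture both. This is only a sketch, assuming tags have no attributes and each name appears once; `markup`, `pairs` and `values` are illustrative names:

```python
import re

markup = "<dtposted>2020<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>"

# Capture an opening tag name and the text that runs up to the next "<".
# Closing tags such as </name> are skipped because "/" is not a word character.
pairs = re.findall(r"<(\w+)>([^<]+)", markup)
values = {tag: text.strip() for tag, text in pairs}

print(values)
```

Regular expressions are fragile on real-world markup, so this is best kept for small, predictable strings like the one in the question.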

Answer 1
1 accepted 2020-11-08 20:04:45

Answer 2
0 2020-11-08 20:35:34
