beautifulsoup - Fetching text either side of a br tag
I have unfortunately become stuck with the following problem:
<a href="someurl">
"TEXT ONE"
<br>
"TEXT TWO"
</a>
I need text one and text two separately. I can only obtain them as a whole, as "TEXT ONE TEXT TWO", by using text = container.a.text, where container is a parent of the a tag. I have tried as many ways as I could find with no success. I can't manage to use the br tag properly.
Thank you for any help.
I would avoid relying on the presence of the br element and would instead locate all the text nodes inside the a element:
In [1]: from bs4 import BeautifulSoup
In [2]: html = """ <a href="someurl">
...: "TEXT ONE"
...: <br>
...: "TEXT TWO"
...: </a>"""
In [3]: soup = BeautifulSoup(html, "html.parser")
In [4]: [item.strip() for item in soup.a(text=True)]
Out[4]: ['"TEXT ONE"', '"TEXT TWO"']
Note that a(text=True) is a short version of a.find_all(text=True).
You can, of course, unpack it into separate variables if needed:
In [5]: text_one, text_two = [item.strip() for item in soup.a(text=True)]
In [6]: text_one
Out[6]: '"TEXT ONE"'
In [7]: text_two
Out[7]: '"TEXT TWO"'
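The same idea as a standalone script, as a rough sketch: it assumes the markup from the question and uses the string=True spelling, which is the newer name for the text argument in Beautiful Soup 4.4.0 and later. The helper name text_nodes is made up for illustration.

```python
from bs4 import BeautifulSoup

html = """<a href="someurl">
"TEXT ONE"
<br>
"TEXT TWO"
</a>"""


def text_nodes(tag):
    """Return the non-empty, stripped text nodes found inside `tag`.

    Hypothetical helper, not part of Beautiful Soup.
    """
    # string=True matches every text node under the tag
    return [s.strip() for s in tag.find_all(string=True) if s.strip()]


soup = BeautifulSoup(html, "html.parser")
text_one, text_two = text_nodes(soup.a)
print(text_one)
print(text_two)
```

Filtering out empty strings keeps the unpacking stable if whitespace-only text nodes appear between tags.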
You could use the .previousSibling and .nextSibling attributes after finding the br tag:
>>> container.a.find("br").previousSibling
' \n"TEXT ONE"\n '
>>> container.a.find("br").nextSibling
'\n "TEXT TWO"\n '
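The sibling values still carry the surrounding whitespace. A minimal sketch, assuming the question's markup wrapped in a div that stands in for container, and using previous_sibling and next_sibling (the snake_case spellings of the same attributes in bs4), strips it off:

```python
from bs4 import BeautifulSoup

html = """<div><a href="someurl">
"TEXT ONE"
<br>
"TEXT TWO"
</a></div>"""

# `container` stands in for the parent element mentioned in the question;
# the wrapping div is an assumption made for this example
container = BeautifulSoup(html, "html.parser").div

br = container.a.find("br")
# Both siblings are text nodes here, so str.strip() removes the newlines
text_one = br.previous_sibling.strip()
text_two = br.next_sibling.strip()
print(text_one)
print(text_two)
```

This only works as written when the br is present and its immediate neighbours are text nodes rather than tags.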
You can do the same in several ways. Here is another way:
from bs4 import BeautifulSoup
content='''
<a href="someurl">
"TEXT ONE"
<br>
"TEXT TWO"
</a>
'''
soup = BeautifulSoup(content,'lxml')
for items in soup.select('a'):
    # collapse runs of whitespace inside each text node
    elem = [' '.join(item.split()) for item in items.strings]
    print(elem)
Output:
['"TEXT ONE"', '"TEXT TWO"']
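Beautiful Soup also provides stripped_strings, which yields each text node with leading and trailing whitespace removed and skips whitespace-only nodes. A short sketch, assuming the same markup and the built-in html.parser so that lxml is not required:

```python
from bs4 import BeautifulSoup

content = '''
<a href="someurl">
"TEXT ONE"
<br>
"TEXT TWO"
</a>
'''

soup = BeautifulSoup(content, "html.parser")

# One list of cleaned strings per <a> element
results = [list(link.stripped_strings) for link in soup.find_all("a")]
for parts in results:
    print(parts)
```

Unlike the split/join version, stripped_strings does not collapse whitespace in the middle of a string; it only trims the ends.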