Extract 2nd element after <br/>
I want to extract United States from the following:
address = '1 McDonalds Dr<br/>OAK BROOK, IL 60523-1911<br/>United States'
I tried:
for br in address.find_all('br', limit=2):
    result = br.next_sibling
    print(result)
And it returned:
OAK BROOK, IL 60523-1911
United States
How do you extract only United States from it? Thanks.
You are almost there!
for br in address.find_all('br', limit=2):
    result = br.next_sibling
    print(result)
This loops over all <br> tags and executes the loop body once for each. Since your address contains two <br> tags, you get two results. You only want the second one, so instead of looping, access the second element directly:
second_br = address.find_all('br', limit=2)[1]
result = second_br.next_sibling
print(result) # United States
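For completeness, here is a minimal self-contained sketch of the same idea. It assumes the markup is wrapped in an <address> tag and parsed with the built-in html.parser (the question only shows the inner markup, so the wrapper tag and the parser choice are assumptions), and it adds a length check so a missing second <br> does not raise an IndexError:

```python
from bs4 import BeautifulSoup

# Assumption: the address lives inside an <address> tag; the question
# only shows the inner markup, so this wrapper is illustrative.
html = (
    "<address>1 McDonalds Dr<br/>"
    "OAK BROOK, IL 60523-1911<br/>"
    "United States</address>"
)
address = BeautifulSoup(html, "html.parser").find("address")

# Collect at most two <br> tags, then take the node right after the second.
brs = address.find_all("br", limit=2)
result = brs[1].next_sibling if len(brs) >= 2 else None

print(result)
```

If the markup might contain fewer than two <br> tags, result is None here instead of an exception, which is usually easier to handle in calling code.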
You can use the .contents property of bs4. If you check the contents of address, you'll see that the text you want is at the last index. You can access it with contents[-1].
>>> address.contents
['1 McDonalds Dr', <br/>, 'OAK BROOK, IL 60523-1911', <br/>, 'United States']
>>> result = address.contents[-1]
>>> result
'United States'
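The .contents approach can be sketched end to end as follows. As before, the <address> wrapper and the html.parser choice are assumptions, since the question does not show how address was created. The sketch also shows stripped_strings, which is a different technique from the one in the answer: it yields only the text nodes with surrounding whitespace removed, which may be more forgiving if the real markup has trailing newlines after the last line.

```python
from bs4 import BeautifulSoup

# Assumption: same illustrative <address> wrapper as in the question's markup.
html = (
    "<address>1 McDonalds Dr<br/>"
    "OAK BROOK, IL 60523-1911<br/>"
    "United States</address>"
)
address = BeautifulSoup(html, "html.parser").find("address")

# The last child of the tag is the text node after the final <br/>.
country = str(address.contents[-1]).strip()

# Alternative: gather every text node, whitespace-stripped, and take the last.
lines = list(address.stripped_strings)

print(country)
print(lines[-1])
```

Both lookups give the same value for this input. contents[-1] depends on the wanted text being the very last child, while stripped_strings skips tags and whitespace-only nodes.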