
Extract 2nd element after <br/>

I want to extract United States from the following:

address = '1 McDonalds Dr<br/>OAK BROOK, IL 60523-1911<br/>United States'

I tried:

for br in address.find_all('br', limit=2):
    result = br.next_sibling
    print(result)

And it returned:

OAK BROOK, IL 60523-1911
United States

How do I extract only United States from it? Thanks.

You are almost there!

for br in address.find_all('br', limit=2):
    result = br.next_sibling
    print(result)

This loops over the <br> tags returned by find_all (at most two, because of limit=2) and executes the loop body once for each. Since your address contains two <br> tags, you get two results. Because you only want the text after the second <br>, skip the loop and access the second element directly:

second_br = address.find_all('br', limit=2)[1]
result = second_br.next_sibling
print(result) # United States
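To try this end to end, here is a minimal sketch. It assumes address is a BeautifulSoup Tag; the question shows only its inner HTML, so the wrapping <div> and the choice of html.parser are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Assumption: the address markup lives inside some container tag; a <div> is used here.
html = "<div>1 McDonalds Dr<br/>OAK BROOK, IL 60523-1911<br/>United States</div>"
address = BeautifulSoup(html, "html.parser").div

br_tags = address.find_all("br", limit=2)
# Guard against markup with fewer than two <br> tags before indexing.
if len(br_tags) >= 2:
    result = br_tags[1].next_sibling  # the text node right after the second <br>
else:
    result = None
print(result)
```

The length check is optional; without it, indexing [1] raises IndexError when the markup has fewer than two <br> tags.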

You can use the .contents attribute of the bs4 tag. If you check the contents of address, you'll see that the text you want is at the last index. You can access it with contents[-1].

>>> address.contents
['1 McDonalds Dr', <br/>, 'OAK BROOK, IL 60523-1911', <br/>, 'United States']
>>> result = address.contents[-1]
>>> result
'United States'
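Note that contents[-1] may return a tag or a whitespace-only string if the markup ends with something other than the country text. One possible variation, not part of the original answers, is to collect only the text nodes with stripped_strings and take the last one. The sketch below again assumes address is a Tag wrapped in a hypothetical <div>.

```python
from bs4 import BeautifulSoup

# Assumption: same illustrative <div> wrapper as the address in the question.
html = "<div>1 McDonalds Dr<br/>OAK BROOK, IL 60523-1911<br/>United States</div>"
address = BeautifulSoup(html, "html.parser").div

# stripped_strings yields each text node with surrounding whitespace removed,
# skipping the <br/> tags themselves.
lines = list(address.stripped_strings)
country = lines[-1] if lines else None
print(lines)
print(country)
```

This gives the address as a list of lines, so the street and city are available too if you need them later.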
