Extract the second element after <br/>

Question

I want to extract United States from the following:

address = '1 McDonalds Dr<br/>OAK BROOK, IL 60523-1911<br/>United States'

I tried:

for br in address.find_all('br', limit=2):
    result = br.next_sibling
    print(result)

And it returned:

OAK BROOK, IL 60523-1911
United States

How do you extract only United States from it? Thanks.

Answer 1

You are almost there!

for br in address.find_all('br', limit=2):
    result = br.next_sibling
    print(result)

This loops over the <br/> tags that find_all returns (at most two, because of limit=2) and executes the loop body for each one. Since your address contains two <br/> tags, you get two results. Since you only want the second one, skip the loop and access the second element directly:

second_br = address.find_all('br', limit=2)[1]
result = second_br.next_sibling
print(result) # United States
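
For reference, here is a minimal self-contained sketch of the same idea. It assumes that address is a bs4 Tag rather than a plain string; the wrapping <div> and the html.parser backend are assumptions made only so the snippet runs on its own. It also guards against markup with fewer than two <br/> tags, where indexing with [1] would raise an IndexError.

```python
from bs4 import BeautifulSoup

# Assumption: the address markup sits inside some wrapper tag (a <div> here).
html = "<div>1 McDonalds Dr<br/>OAK BROOK, IL 60523-1911<br/>United States</div>"
address = BeautifulSoup(html, "html.parser").div

# Fetch at most two <br/> tags, then index the second one directly.
br_tags = address.find_all("br", limit=2)

# Guard: only index [1] when a second <br/> was actually found.
result = br_tags[1].next_sibling if len(br_tags) > 1 else None
print(result)
```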

Answer 2

You can use the .contents attribute of a bs4 tag. If you check the contents of address, you'll see that the text you want is at the last index. You can access it with contents[-1].

>>> address.contents
['1 McDonalds Dr', <br/>, 'OAK BROOK, IL 60523-1911', <br/>, 'United States']
>>> result = address.contents[-1]
>>> result
'United States'

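As a runnable sketch of the .contents approach, again assuming address is a bs4 Tag (the wrapping <div> is an assumption for illustration). The snippet also shows stripped_strings, an alternative that is not part of the answer itself: it yields only the text pieces with surrounding whitespace removed, which may be more robust if the markup ends with whitespace or a trailing tag.

```python
from bs4 import BeautifulSoup

# Assumption: the address markup sits inside some wrapper tag (a <div> here).
html = "<div>1 McDonalds Dr<br/>OAK BROOK, IL 60523-1911<br/>United States</div>"
address = BeautifulSoup(html, "html.parser").div

# .contents lists every direct child: text nodes and <br/> tags alike.
last_node = address.contents[-1]

# stripped_strings yields only the text pieces, skipping the tags.
lines = list(address.stripped_strings)
country = lines[-1]

print(last_node)
print(country)
```
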

Answer 1: 1 (accepted), 2018-03-11 23:25:54

Answer 2: 0, 2018-03-12 02:54:07
