How do I extract float numbers from the elements of a Python list?

Question

I am using BeautifulSoup4 to build a script that does financial calculations. I have successfully extracted the data into a list, but I only need the float numbers from the elements.

For example:

Volume = soup.find_all('td', {'class':'text-success'})

print(Volume)

This gives me the following list as output:

[<td class="text-success">+1.3 LTC</td>, <td class="text-success">+5.49<span class="muteds">340788</span> LTC</td>, <td class="text-success">+1.3 LTC</td>]

I want it to become:

[1.3, 5.49, 1.3]

How can I do this?

Thank you so much for reading my post. I greatly appreciate any help I can get.

Answer 1

You can find the first text node inside every td, split it on the space, take the first item, and convert it with float(). The leading + is handled automatically:

from bs4 import BeautifulSoup

data = """
<table>
    <tr>
        <td class="text-success">+1.3 LTC</td>
        <td class="text-success">+5.49<span class="muteds">340788</span> LTC</td>
        <td class="text-success">+1.3 LTC</td>
    </tr>
</table>"""

soup = BeautifulSoup(data, "html.parser")

print([
    float(td.find(text=True).split(" ", 1)[0])
    for td in soup.find_all('td', {'class':'text-success'})
])

Prints [1.3, 5.49, 1.3].

Note how find(text=True) avoids extracting the 340788 in the second td: it returns only the first text node, +5.49, which comes before the span.
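To make the string handling in that comprehension easier to follow, here is a minimal sketch of just the split-and-convert step, without BeautifulSoup. The list first_text_nodes and the helper leading_float are hypothetical names; the strings are the first text node of each td in the sample markup, written out by hand.

```python
# The first text node of each <td> in the sample markup, written out by hand
# (an assumption for illustration; in the answer these come from td.find(text=True)).
first_text_nodes = ["+1.3 LTC", "+5.49", "+1.3 LTC"]


def leading_float(text):
    """Return the number that precedes the first space in text."""
    # split(" ", 1) cuts at the first space only; when there is no space,
    # the whole string is returned as the single item.
    first_item = text.split(" ", 1)[0]
    # float() accepts a leading sign, so "+1.3" parses to 1.3.
    return float(first_item)


values = [leading_float(t) for t in first_text_nodes]
print(values)  # [1.3, 5.49, 1.3]
```

The second string has no space because the span follows the number directly, which is why taking item 0 of the split still works there.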

Answer 2

You can use a regular expression to pull out the decimal numbers:

>>> import re
>>> re.findall(r"\d+\.\d+", yourString)
['1.3', '5.49', '1.3']
>>>

Then convert the matches to floats:

>>> [float(x) for x in re.findall(r"\d+\.\d+", yourString)]
[1.3, 5.49, 1.3]
>>>
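The snippets above leave yourString undefined. As a self-contained sketch, assume it is the string form of the list from the question (for example, what str(Volume) might produce); the variable names your_string, DECIMAL, and floats are hypothetical.

```python
import re

# Hypothetical input: the string form of the list shown in the question.
your_string = (
    '[<td class="text-success">+1.3 LTC</td>, '
    '<td class="text-success">+5.49<span class="muteds">340788</span> LTC</td>, '
    '<td class="text-success">+1.3 LTC</td>]'
)

# A raw string keeps the backslashes intact for the regex engine.
# The pattern requires digits, a dot, then digits.
DECIMAL = re.compile(r"\d+\.\d+")

floats = [float(match) for match in DECIMAL.findall(your_string)]
print(floats)  # [1.3, 5.49, 1.3]
```

Because the pattern requires a decimal point, the 340788 inside the span is skipped. The same rule means an integer volume such as "+2 LTC" would also be skipped, so this approach fits only when every wanted value is written with a decimal part.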
