
Remove non-numeric characters using pandas

I've recently got stuck into Python to automate some repetitive tasks.

My script gathers two sets of data through APIs and, using pandas, merges them into one data file, where it runs a series of checks and then manipulates the data based on set criteria. It's taken me a solid 8 hours to figure this out and get something working, but I've stumbled at the final hurdle.

I'm trying to summarise the results of the checks using a simple pivot table and need to sum the values stored in one of the df columns (commissionAmount). The issue is that the values stored in this column look like this:

{'amount': 97.0, 'currency': 'GBP'}

I need it to contain only 97.0 but I can't figure out how.

Any help would be appreciated.

Assuming the numbers always have the same format (two digits before the decimal point and one digit after), and the column holds strings:

df['Col1'].str.extract(r'(\d{2}\.\d)')

This gives the right output for the example given:

import pandas as pd

df3 = pd.DataFrame()
df3['Col1'] = ["{'amount': 97.0, 'currency': 'GBP'}"]
# str.extract returns a DataFrame with one column per capture group
df3['Col1'].str.extract(r'(\d{2}\.\d)')
      0
0  97.0
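The fixed-width pattern above would miss values such as 5.25 or 1200. As a minimal sketch, assuming the column holds strings shaped like the one in the question, a looser pattern anchored on the 'amount' key can capture any number of digits and convert the result to float. The extra sample rows are made up for illustration.

```python
import pandas as pd

# Sample rows are illustrative; only the first comes from the question
df3 = pd.DataFrame({"Col1": [
    "{'amount': 97.0, 'currency': 'GBP'}",
    "{'amount': 5.25, 'currency': 'GBP'}",
    "{'amount': 1200, 'currency': 'GBP'}",
]})

# Capture the number that follows the 'amount' key:
# optional sign, one or more digits, optional decimal part.
pattern = r"'amount':\s*(-?\d+(?:\.\d+)?)"

# expand=False returns a Series for a single capture group;
# the captured text is still a string, so convert it to float.
amounts = df3["Col1"].str.extract(pattern, expand=False).astype(float)
print(amounts.sum())
```

Converting to float matters here because extract returns text, and summing strings would concatenate them rather than add them.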

Given df, where col1 holds actual dicts rather than strings:

                                  col1
0  {'amount': 97.0, 'currency': 'GBP'}

We can extract just the amount by doing:

df.col1 = df.col1.str.get('amount')
print(df)

Output:

   col1
0  97.0

I actually fought for this previously hidden functionality to be added to the docs, and now it is: pandas.Series.str.get :)
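To tie this back to the pivot table in the question, here is a rough sketch under a few assumptions: the grouping column `status` and the sample rows are hypothetical, and the helper `to_dict` is only there in case some values arrive as strings rather than dicts (str.get with a key needs real dicts).

```python
import ast
import pandas as pd

# Hypothetical data: 'status' and the rows are made up for illustration
df = pd.DataFrame({
    "status": ["approved", "approved", "declined"],
    "commissionAmount": [
        {"amount": 97.0, "currency": "GBP"},
        "{'amount': 3.5, 'currency': 'GBP'}",  # a dict stored as text
        {"amount": 10.0, "currency": "GBP"},
    ],
})

def to_dict(value):
    """Parse string representations into dicts; pass real dicts through."""
    return ast.literal_eval(value) if isinstance(value, str) else value

# Normalise every cell to a dict, pull out the 'amount' key, make it numeric
df["commissionAmount"] = (
    df["commissionAmount"].map(to_dict).str.get("amount").astype(float)
)

# Sum the amounts per group
summary = df.pivot_table(index="status", values="commissionAmount", aggfunc="sum")
print(summary)
```

ast.literal_eval is used instead of eval because it only accepts Python literals, which is safer for data coming back from an API.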

