
Remove non-numeric characters using pandas

I've recently got stuck into Python to automate some repetitive tasks.

My script gathers two sets of data using APIs and, using pandas, merges them into one data file, where it runs a series of checks and then manipulates the data based on set criteria. It's taken me a solid 8 hours to figure this out and get something working, but I've stumbled at the final hurdle.

I'm trying to summarise the results of the checks using a simple pivot table and need to sum the values stored in one of the df columns (commissionAmount). The issue is that the values stored in this column look like this:

{'amount': 97.0, 'currency': 'GBP'}

I need it to contain only 97.0 but I can't figure it out.

Any help would be appreciated.

Assuming the numbers always have the same format (two digits before the decimal point and one digit after):

df['Col1'].str.extract(r'(\d{2}\.\d)')

gives the right output for the example given:

import pandas as pd

df3 = pd.DataFrame()
df3['Col1'] = ["{'amount': 97.0, 'currency': 'GBP'}"]
print(df3['Col1'].str.extract(r'(\d{2}\.\d)'))

Output:

      0
0  97.0
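
The fixed-width pattern above only matches amounts like 97.0. As a rough sketch of how it could be loosened, the version below anchors on the 'amount' key and accepts any number of digits, then converts the result to float so it can be summed. It assumes the column really holds strings (the regex approach will not match actual dict objects), and the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the values are strings, as in the example above.
df = pd.DataFrame({
    "Col1": [
        "{'amount': 97.0, 'currency': 'GBP'}",
        "{'amount': 1250.75, 'currency': 'GBP'}",
        "{'amount': 5, 'currency': 'GBP'}",
    ]
})

# Capture the number that follows the 'amount' key, whatever its length.
pattern = r"'amount':\s*(-?\d+(?:\.\d+)?)"

# expand=False returns a Series; the matches are text, so cast to float.
amounts = df["Col1"].str.extract(pattern, expand=False).astype(float)

print(amounts.tolist())
```

Rows that do not match the pattern come back as NaN, so it is worth checking amounts.isna() before summing.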

Given df:

                                  col1
0  {'amount': 97.0, 'currency': 'GBP'}

We can extract just the amount by doing:

df.col1 = df.col1.str.get('amount')
print(df)

Output:

   col1
0  97.0

I actually fought for this previously hidden functionality to be added to the docs, and it's there now: pandas.Series.str.get :)
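
To tie this back to the pivot table in the question, here is a minimal sketch of the whole step: extract the amount, then sum it per group. The 'merchant' column, the to_dict helper, and the sample values are all hypothetical. The helper is only there in case some cells arrive as text rather than as real dicts, which str.get on its own would not handle.

```python
import ast

import pandas as pd

# Hypothetical data; 'merchant' and the amounts are made up for illustration.
df = pd.DataFrame({
    "merchant": ["A", "A", "B"],
    "commissionAmount": [
        {"amount": 97.0, "currency": "GBP"},
        {"amount": 3.0, "currency": "GBP"},
        "{'amount': 10.5, 'currency': 'GBP'}",  # same payload stored as text
    ],
})


def to_dict(value):
    """Parse a dict literal stored as text; pass real dicts through unchanged."""
    return ast.literal_eval(value) if isinstance(value, str) else value


# Normalise every cell to a dict, pull out 'amount', and make it numeric.
df["commissionAmount"] = (
    df["commissionAmount"].map(to_dict).str.get("amount").astype(float)
)

# Sum the commission per merchant.
summary = df.pivot_table(index="merchant", values="commissionAmount", aggfunc="sum")
print(summary)
```

If every cell is already a dict, the map(to_dict) step can be dropped and str.get('amount') is enough.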
