Remove non-numeric characters using pandas

Question

I've recently got stuck in to Python to automate some repetitive tasks.

My script gathers two sets of data using APIs and, using pandas, merges them into one data file, where it runs a series of checks and then manipulates the data based on set criteria. It's taken me a solid 8 hours to figure this out and get something working, but I've stumbled at the final hurdle.

I'm trying to summarise the results of the checks using a simple pivot table and need to sum the values stored in one of the df columns (commissionAmount). The issue is that the values stored in this column look like this:

{'amount': 97.0, 'currency': 'GBP'}

I need it to contain only 97.0 but I can't figure it out.

Any help would be appreciated.

Answer 1

Assuming the values are stored as strings and the number format will always be the same (two digits before the decimal point and one digit after):

df['Col1'].str.extract(r'(\d{2}\.\d)')

This gives the right output for the example given:

import pandas as pd

df3 = pd.DataFrame()
df3['Col1'] = ["{'amount': 97.0, 'currency': 'GBP'}"]
df3['Col1'].str.extract(r'(\d{2}\.\d)')

Output:

      0
0  97.0
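If the amounts are not always two digits plus one decimal, the pattern above will miss or truncate them. Here is a minimal sketch of a looser pattern, assuming the column still holds strings shaped like the example; the sample rows and the names `pattern`, `amounts` and `total` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the dicts are stored as strings, as in the example above.
df3 = pd.DataFrame({
    "Col1": [
        "{'amount': 97.0, 'currency': 'GBP'}",
        "{'amount': 1250.75, 'currency': 'GBP'}",
        "{'amount': 5, 'currency': 'GBP'}",
    ]
})

# Capture the number that follows 'amount': optional sign, digits, optional decimals.
pattern = r"'amount':\s*(-?\d+(?:\.\d+)?)"

# expand=False returns a Series for a single capture group;
# astype(float) turns the extracted text into numbers that can be summed.
amounts = df3["Col1"].str.extract(pattern, expand=False).astype(float)
total = amounts.sum()
```

Note that str.extract returns text, so the conversion to float is needed before summing or pivoting.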

Answer 2

Given df:

                                  col1
0  {'amount': 97.0, 'currency': 'GBP'}

We can extract just the amount by doing:

df.col1 = df.col1.str.get('amount')
print(df)

Output:

   col1
0  97.0

I actually fought for this previously hidden functionality to be added to the docs, and it is now documented under pandas.Series.str.get :)
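To tie this back to the pivot table in the question, here is a rough sketch under a few assumptions: the `partner` column and the sample rows are invented, and the `to_dict` helper is hypothetical, only there in case some values arrive as strings rather than real dicts.

```python
import ast

import pandas as pd

# Hypothetical sample data; only the commissionAmount name comes from the question.
df = pd.DataFrame({
    "partner": ["A", "A", "B"],
    "commissionAmount": [
        {"amount": 97.0, "currency": "GBP"},
        "{'amount': 3.5, 'currency': 'GBP'}",  # same shape, but stored as a string
        {"amount": 10.0, "currency": "GBP"},
    ],
})


def to_dict(value):
    # Parse string-encoded dicts safely; leave real dicts untouched.
    return ast.literal_eval(value) if isinstance(value, str) else value


# Normalise every cell to a dict, pull out 'amount', and make the column numeric.
df["commissionAmount"] = (
    df["commissionAmount"].map(to_dict).str.get("amount").astype(float)
)

# Sum the amounts per group with a simple pivot table.
summary = df.pivot_table(index="partner", values="commissionAmount", aggfunc="sum")
```

If every value is already a dict, the `map(to_dict)` step can be dropped and `str.get('amount')` is enough on its own.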
