Cleaning and Manipulating a column using pandas

Question

I have the following column in my dataset. The data comes in as-is from my data source:

Salary
~£2000
~£2000.15 per week
~£2000.50 per month
~£2000 - ~£5000 range
100000INR
INR

Now I want to create a new column that should look like this:

Salary_clean
2000
104007.8
24006
3500
964
0

The following logic applies (all salaries should be annual once cleaned):

When the column has a standalone number, the salary is already annual and requires no action.
When the salary has "per week" written next to it, multiply the salary by 52.
When the salary has "per month" written next to it, multiply the salary by 12.
When the salary has "xy range" written next to it, calculate the median of the range and use that as the salary.
When the salary has a currency code such as INR written next to it, convert the salary using the current conversion rate of that currency to GBP (pounds).
When the salary is just a currency code like "XXX", set the salary to 0.

How can I achieve this?

Answer 1

Disclaimer: this code can be dangerous, because eval is called on the strings without any validation, so use it only on data you trust. The code is also not optimized, but it has the advantage of being compact.

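import pandas as pd

df = pd.DataFrame({'Salary': ['~£2000', '~£2000.15 per week', '~£2000.50 per month',
                              '~£2000 - ~£5000 range', '100000INR', 'INR']})

# Each regex rewrites part of the string into an arithmetic expression.
d = {r'~[^\d]+': r'',                         # strip the "~£" prefix
     r'per week': r'* 52',                    # weekly -> annual
     r'per month': r'* 12',                   # monthly -> annual
     r'(.*) - (.*) range': r'(\1 + \2) / 2',  # midpoint of the range
     r'(\d)INR': r'\1 * 0.0096',              # INR -> GBP, keeping the last digit
     r'^[^\W\d]*$': r'0'}                     # currency code only -> 0

df['Salary_clean'] = df['Salary'].replace(d, regex=True).apply(eval)

>>> df
                  Salary  Salary_clean
0                 ~£2000        2000.0
1     ~£2000.15 per week      104007.8
2    ~£2000.50 per month       24006.0
3  ~£2000 - ~£5000 range        3500.0
4              100000INR         960.0
5                    INR           0.0

Result of replace method:

>>> df['Salary'].replace(d, regex=True)

0                 2000
1         2000.15 * 52
2         2000.50 * 12
3    (2000 + 5000) / 2
4      100000 * 0.0096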
5                    0
Name: Salary, dtype: object
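
If calling eval on raw data is a concern, the rules from the question can also be applied with a small parsing function. The following is only a sketch, not a drop-in replacement: the function name clean_salary and the RATES_TO_GBP table are hypothetical, the rate of 0.0096 is the same illustrative value used above rather than a live exchange rate, and the function assumes the currency code is three uppercase letters at the end of the string.

```python
import re
from statistics import median

# Hypothetical, illustrative rates to GBP; real rates change over time.
RATES_TO_GBP = {"INR": 0.0096}

NUMBER = re.compile(r"\d+(?:\.\d+)?")          # integers or decimals
CURRENCY_CODE = re.compile(r"([A-Z]{3})\s*$")  # assumed: 3 capitals at the end


def clean_salary(text, rates=RATES_TO_GBP):
    """Return the annual salary in GBP following the rules in the question."""
    numbers = [float(n) for n in NUMBER.findall(text)]
    if not numbers:
        # Only a currency code (or nothing usable): salary is 0.
        return 0.0

    # For "x - y range" take the median of the two bounds, else the first number.
    amount = median(numbers) if "range" in text else numbers[0]

    # Convert weekly or monthly figures to annual ones.
    if "per week" in text:
        amount *= 52
    elif "per month" in text:
        amount *= 12

    # Convert foreign currencies; an unknown code raises KeyError on purpose.
    code = CURRENCY_CODE.search(text)
    if code:
        amount *= rates[code.group(1)]
    return amount


# With pandas, this could be applied as:
# df['Salary_clean'] = df['Salary'].map(clean_salary)
```

Because nothing is evaluated as code, an unexpected string either falls through to the plain-number case or raises an error for an unknown currency code, instead of being executed.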
