I have the following column in my dataset; the data comes in as-is from my data source:
Salary
~£2000
~£2000.15 per week
~£2000.50 per month
~£2000 - ~£5000 range
100000INR
INR
Now I want to create a new column that should look like this:
Salary_clean
2000
104007.8
240006
35000
964
0
The following logic applies (all salaries are annual once cleaned):
How can I achieve this?
Disclaimer: this code can be dangerous, because the eval function is used without any safeguards. It is also far from optimized, but it has the advantage of being compact.
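import pandas as pd

df = pd.DataFrame({'Salary': ['~£2000', '~£2000.15 per week', '~£2000.50 per month',
                              '~£2000 - ~£5000 range', '100000INR', 'INR']})
# Each pattern rewrites part of the string into an arithmetic expression
d = {r'~[^\d]+': r'',                         # strip the "~£" prefix
     r'per week': r'* 52',                    # weekly -> annual
     r'per month': r'* 12',                   # monthly -> annual
     r'(.*) - (.*) range': r'(\1 + \2) / 2',  # midpoint of a range
     r'(\d)INR': r'\1 * 0.0096',              # INR -> GBP; the group keeps the last digit
     r'^[^\W\d]*$': r'0'}                     # no number at all -> 0
df['Salary_clean'] = df['Salary'].replace(d, regex=True).apply(eval)
>>> df
                  Salary  Salary_clean
0                 ~£2000        2000.0
1     ~£2000.15 per week      104007.8
2    ~£2000.50 per month       24006.0
3  ~£2000 - ~£5000 range        3500.0
4              100000INR         960.0
5                    INR           0.0
Result of the replace method:
>>> df['Salary'].replace(d, regex=True)
0                 2000
1         2000.15 * 52
2         2000.50 * 12
3    (2000 + 5000) / 2
4      100000 * 0.0096
5                    0
Name: Salary, dtype: object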
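If the use of eval is a concern, the same rules can be applied without evaluating any strings. The following is only a minimal sketch, not part of the answer above: the function name parse_salary and the constant INR_TO_GBP are hypothetical, the rate 0.0096 is simply the one used in the answer, and the sketch assumes that every number found in a string belongs to the salary (so a range is reduced to the average of its numbers).

```python
import re

NUMBER = re.compile(r'\d+(?:\.\d+)?')
INR_TO_GBP = 0.0096  # assumed rate, copied from the answer; adjust as needed


def parse_salary(text: str) -> float:
    """Convert a raw salary string to an annual GBP amount without eval."""
    values = [float(n) for n in NUMBER.findall(text)]
    if not values:
        return 0.0  # e.g. "INR": no number present
    # One number: the value itself. Two numbers ("a - b range"): their midpoint.
    amount = sum(values) / len(values)
    if 'per week' in text:
        amount *= 52
    elif 'per month' in text:
        amount *= 12
    if 'INR' in text:
        amount *= INR_TO_GBP
    return amount


samples = ['~£2000', '~£2000.15 per week', '~£2000.50 per month',
           '~£2000 - ~£5000 range', '100000INR', 'INR']
cleaned = [parse_salary(s) for s in samples]
```

With pandas, this could be applied as df['Salary_clean'] = df['Salary'].apply(parse_salary). It is less compact than the regex dictionary, but no input string is ever executed as code.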