I have a dataframe df_in, created like so:
import pandas as pd
import numpy as np
dic_in = {'A':['aa','bb','cc','dd','ee','ff','gg','uu','xx','yy','zz'],
'B':['200','200','AA200','AA040',np.nan,'500',np.nan,'0700','900','UKK','200'],
'C':['UNN','400',np.nan,'AA080','AA800','B',np.nan,'400',np.nan,'500','UKK']}
df_in = pd.DataFrame(dic_in)
My goal is to process columns B and C in the following way.
If a value starts with 'AA', that prefix must be removed, leaving only the numeric part ( AA123 ---> 123 ). Any leading zeros must also be removed ( AA001234 ---> 1234 ), and the remaining number must then be multiplied by 100 ( AA200 ---> 20000 ).
Every non-numeric value must be replaced by 0.0 ( NaN ---> 0.0 , UNN ---> 0.0 , UKK ---> 0.0 and so on).
Leading zeros in plain numeric strings must be removed ( 0700 ---> 700 , 00007000 ---> 7000 ).
The final result should look like this:
# BEFORE # # AFTER #
A B C A B C
0 aa 200 UNN 0 aa 200 0.0
1 bb 200 400 1 bb 200 400
2 cc AA200 NaN 2 cc 20000 0.0
3 dd AA040 AA080 3 dd 4000 8000
4 ee NaN AA800 4 ee 0.0 80000
5 ff 500 B 5 ff 500 0.0
6 gg NaN NaN 6 gg 0.0 0.0
7 uu 0700 400 7 uu 700 400
8 xx 900 NaN 8 xx 900 0.0
9 yy UKK 500 9 yy 0.0 500
10 zz 200 UKK 10 zz 200 0.0
Do you know a smart and efficient way to achieve this goal?
Notice: all the numbers are actually strings and they should remain so.
You can use to_numeric to replace non-numeric values with NaN.
Then extract the numbers from the strings, strip the leading zeros with lstrip and append '00' (i.e. multiply by 100).
Finally, use combine_first with fillna and assign back to the columns:
# plain numeric strings parse cleanly; everything else becomes NaN
b = pd.to_numeric(df_in.B, errors='coerce')
c = pd.to_numeric(df_in.C, errors='coerce')
# extract the digits, strip leading zeros and append '00' (multiply by 100)
b1 = df_in.B.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'
c1 = df_in.C.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'
df_in.B = b.combine_first(b1).fillna(0)
df_in.C = c.combine_first(c1).fillna(0)
print (df_in)
print (df_in)
A B C
0 aa 200 0
1 bb 200 400
2 cc 20000 0
3 dd 4000 8000
4 ee 0 80000
5 ff 500 0
6 gg 0 0
7 uu 700 400
8 xx 900 0
9 yy 0 500
10 zz 200 0
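To make the combine_first step concrete, here is the intermediate state for column B in isolation (a minimal, self-contained sketch using the same data):

```python
import pandas as pd
import numpy as np

s = pd.Series(['200', '200', 'AA200', 'AA040', np.nan, '500',
               np.nan, '0700', '900', 'UKK', '200'], name='B')

# numeric strings parse; 'AA200', 'UKK', NaN all become NaN here
b = pd.to_numeric(s, errors='coerce')

# digits extracted from every value, leading zeros stripped, '00' appended;
# values with no digits (UKK, NaN) stay NaN
b1 = s.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'

# b wins where it parsed; b1 fills the AA-prefixed rows;
# the rows that are NaN in both (UKK, NaN) become 0
result = b.combine_first(b1).fillna(0)
print(result)
```

Only the rows where to_numeric failed fall through to the extract/lstrip branch, which is why the plain numbers are not multiplied by 100.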
A slightly modified solution: do the last fillna with the string '0.0' and convert all values to strings (to avoid mixing strings and numeric values):
b = pd.to_numeric(df_in.B, errors='coerce')
c = pd.to_numeric(df_in.C, errors='coerce')
b1 = df_in.B.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'
c1 = df_in.C.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'
df_in.B = b.combine_first(b1)
df_in.C = c.combine_first(c1)
# fill the remaining NaN with the string '0.0' and make everything a string
df_in = df_in.fillna('0.0').astype(str)
print (df_in)
print (df_in)
A B C
0 aa 200.0 0.0
1 bb 200.0 400.0
2 cc 20000 0.0
3 dd 4000 8000
4 ee 0.0 80000
5 ff 500.0 0.0
6 gg 0.0 0.0
7 uu 700.0 400.0
8 xx 900.0 0.0
9 yy 0.0 500.0
10 zz 200.0 0.0
Assuming that all the values in your dataframe are strings (including the NaNs; otherwise you can convert them to an appropriate string with fillna), you can use the following converter function with applymap on the two columns you want to convert.
df = pd.DataFrame(dic_in, dtype=str).fillna('NAN')
# 'AA' values: strip the prefix and multiply by 100; plain digit strings:
# int() drops the leading zeros; everything else: '0.0'
converter = lambda x: str(int(x.replace('AA', ''))*100) if 'AA' in x else str(int(x)) if x.isdigit() else '0.0'
df[['B','C']] = df[['B','C']].applymap(converter)
contents of df
:
A B C
0 aa 200 0.0
1 bb 200 400
2 cc 20000 0.0
3 dd 4000 8000
4 ee 0.0 80000
5 ff 500 0.0
6 gg 0.0 0.0
7 uu 700 400
8 xx 900 0.0
9 yy 0.0 500
10 zz 200 0.0
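Applied to individual values, the converter's three branches behave like this (the same lambda repeated so the snippet runs on its own):

```python
# same converter as in the answer above
converter = lambda x: (str(int(x.replace('AA', '')) * 100) if 'AA' in x
                       else str(int(x)) if x.isdigit() else '0.0')

print(converter('AA040'))  # 'AA' branch: strip prefix, drop zeros, * 100 -> '4000'
print(converter('0700'))   # digit branch: int() drops leading zeros -> '700'
print(converter('UKK'))    # fallback branch -> '0.0'
```

Note that the NaNs must already be strings (hence the fillna('NAN') step), since the lambda calls string methods on every value.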