Replacing dataframe values after removing/replacing character in rows using Pandas
I have a dataframe df_in like so:
import pandas as pd
import numpy as np
dic_in = {'A':['aa','bb','cc','dd','ee','ff','gg','uu','xx','yy','zz'],
          'B':['200','200','AA200','AA040',np.nan,'500',np.nan,'0700','900','UKK','200'],
          'C':['UNN','400',np.nan,'AA080','AA800','B',np.nan,'400',np.nan,'500','UKK']}
df_in = pd.DataFrame(dic_in)
My goal is to process columns B and C as follows:
- If a value contains 'AA', that part of the string must be removed, leaving only the numeric part (AA123 ---> 123).
- If zeros are present before the first non-zero digit, they must be removed (AA001234 ---> 1234).
- If the element is NaN or a non-numeric string, it must be replaced with 0.0 (NaN ---> 0.0, UNN ---> 0.0, UKK ---> 0.0 and so on).
- If a plain number starts with zeros, they must be removed (0700 ---> 700, 00007000 ---> 7000).
- After 'AA' and the leading zeros are removed, the remaining number must be multiplied by 100.
The final result should look like this:
     # BEFORE #                # AFTER #
     A      B      C           A      B      C
 0  aa    200    UNN       0  aa    200    0.0
 1  bb    200    400       1  bb    200    400
 2  cc  AA200    NaN       2  cc  20000    0.0
 3  dd  AA040  AA080       3  dd   4000   8000
 4  ee    NaN  AA800       4  ee    0.0  80000
 5  ff    500      B       5  ff    500    0.0
 6  gg    NaN    NaN       6  gg    0.0    0.0
 7  uu   0700    400       7  uu    700    400
 8  xx    900    NaN       8  xx    900    0.0
 9  yy    UKK    500       9  yy    0.0    500
10  zz    200    UKK      10  zz    200    0.0
Do you know a smart and efficient way to achieve this?
Note: all the numbers are actually strings, and they should remain strings.
You can use to_numeric to replace non-numeric values with NaN.
Then extract the digits from the strings, strip the leading 0s with lstrip, and append 00.
Finally, use combine_first with fillna and assign the result back to the columns:
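# Plain digit strings become numbers; everything else becomes NaN.
b = pd.to_numeric(df_in.B, errors='coerce')
c = pd.to_numeric(df_in.C, errors='coerce')
# First run of digits, leading zeros stripped, '00' appended.
b1 = df_in.B.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'
c1 = df_in.C.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'
# Prefer the parsed number, fall back to the scaled string, then to 0.
df_in.B = b.combine_first(b1).fillna(0)
df_in.C = c.combine_first(c1).fillna(0)
print(df_in)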
     A      B      C
 0  aa    200      0
 1  bb    200    400
 2  cc  20000      0
 3  dd   4000   8000
 4  ee      0  80000
 5  ff    500      0
 6  gg      0      0
 7  uu    700    400
 8  xx    900      0
 9  yy      0    500
10  zz    200      0
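To see why combine_first picks the right value in each row, it can help to look at the two intermediate Series side by side. The following is only a minimal sketch on a made-up five-element sample; the names s, numeric, scaled and steps are illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

# Illustrative sample covering each kind of value from the question.
s = pd.Series(['200', 'AA040', np.nan, '0700', 'UKK'])

# Plain digit strings parse to floats; everything else becomes NaN.
numeric = pd.to_numeric(s, errors='coerce')

# First run of digits, leading zeros stripped, '00' appended.
# Rows without any digits (NaN, 'UKK') stay missing.
scaled = s.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'

# Side-by-side view of the raw values and both intermediate results.
steps = pd.DataFrame({'raw': s, 'numeric': numeric, 'scaled': scaled})
print(steps)
```

Note that scaled also appends '00' to plain numbers such as '200'; those rows are never used, because combine_first only falls back to scaled where numeric is NaN.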
A slightly modified solution: a final fillna with the string 0.0, followed by astype(str), converts all values to strings (this avoids a column that mixes strings and numeric values):
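# Plain digit strings become numbers; everything else becomes NaN.
b = pd.to_numeric(df_in.B, errors='coerce')
c = pd.to_numeric(df_in.C, errors='coerce')
# First run of digits, leading zeros stripped, '00' appended.
b1 = df_in.B.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'
c1 = df_in.C.str.extract(r'(\d+)', expand=False).str.lstrip('0') + '00'
df_in.B = b.combine_first(b1)
df_in.C = c.combine_first(c1)
# Fill what is still missing with the string '0.0', then cast everything to str.
df_in = df_in.fillna('0.0').astype(str)
print(df_in)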
     A      B      C
 0  aa  200.0    0.0
 1  bb  200.0  400.0
 2  cc  20000    0.0
 3  dd   4000   8000
 4  ee    0.0  80000
 5  ff  500.0    0.0
 6  gg    0.0    0.0
 7  uu  700.0  400.0
 8  xx  900.0    0.0
 9  yy    0.0  500.0
10  zz  200.0    0.0
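The output above still shows 200.0 where the question asks for 200, because the values passed through to_numeric became floats before being cast to strings. One possible way to stay with strings throughout is to skip to_numeric and use a single regular expression. This is a minimal sketch, not part of the answer above: the names PATTERN, clean_column and df_out are hypothetical, and the pattern assumes that 'AA' only ever appears as a prefix directly followed by digits.

```python
import numpy as np
import pandas as pd

dic_in = {'A': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff', 'gg', 'uu', 'xx', 'yy', 'zz'],
          'B': ['200', '200', 'AA200', 'AA040', np.nan, '500', np.nan, '0700', '900', 'UKK', '200'],
          'C': ['UNN', '400', np.nan, 'AA080', 'AA800', 'B', np.nan, '400', np.nan, '500', 'UKK']}
df_in = pd.DataFrame(dic_in)

# Optional 'AA' prefix, then leading zeros (dropped), then the digits to keep.
PATTERN = r'^(?P<prefix>AA)?0*(?P<num>\d+)$'

def clean_column(col):
    """Hypothetical helper: return the cleaned column as strings."""
    parts = col.str.extract(PATTERN)
    num = parts['num']
    # Keep the digits as they are when there was no 'AA' prefix;
    # otherwise append '00', i.e. multiply by 100 without leaving strings.
    scaled = num.where(parts['prefix'].isna(), num + '00')
    # Rows that did not match the pattern (NaN, 'UNN', 'B', ...) become '0.0'.
    return scaled.fillna('0.0')

df_out = df_in.copy()
for name in ['B', 'C']:
    df_out[name] = clean_column(df_in[name])
print(df_out)
```

Because nothing is ever converted to a float, plain numbers come out as 200 rather than 200.0, which matches the AFTER table in the question.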
Assuming that all the values in your dataframe are strings (including the NaNs; otherwise you can convert them to an appropriate string with fillna), you can use the following converter function with applymap on the two columns you want to convert.
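# Force string dtype and replace any remaining missing values with the placeholder 'NAN'.
df = pd.DataFrame(dic_in, dtype=str).fillna('NAN')
converter = lambda x: str(int(x.replace('AA', ''))*100) if 'AA' in x else str(int(x)) if x.isdigit() else '0.0'
# Note: applymap is deprecated since pandas 2.1 in favour of DataFrame.map.
df[['B','C']] = df[['B','C']].applymap(converter)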
Contents of df:
     A      B      C
 0  aa    200    0.0
 1  bb    200    400
 2  cc  20000    0.0
 3  dd   4000   8000
 4  ee    0.0  80000
 5  ff    500    0.0
 6  gg    0.0    0.0
 7  uu    700    400
 8  xx    900    0.0
 9  yy    0.0    500
10  zz    200    0.0
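The lambda above packs three cases into one line, and it raises a ValueError for a value such as 'AAx', where nothing numeric is left after removing 'AA'. A sketch of the same idea as a named function, applied per column with Series.map (which exists in both older and newer pandas), might look like this. The name convert and the sample data are illustrative only; treating a digit-less 'AA' value as 0.0 is an assumption, not something the question specifies.

```python
import numpy as np
import pandas as pd

def convert(value):
    """Hypothetical named version of the converter lambda."""
    # Real NaN values (floats) and any other non-strings count as non-numeric.
    if not isinstance(value, str):
        return '0.0'
    if 'AA' in value:
        rest = value.replace('AA', '')
        # int() drops leading zeros; assumption: no digits left means '0.0'.
        return str(int(rest) * 100) if rest.isdigit() else '0.0'
    if value.isdigit():
        return str(int(value))
    return '0.0'

# Illustrative sample, smaller than the dataframe in the question.
df = pd.DataFrame({'B': ['200', 'AA040', np.nan, '0700', 'UKK'],
                   'C': ['UNN', 'AA800', 'B', '400', np.nan]})

for name in ['B', 'C']:
    df[name] = df[name].map(convert)
print(df)
```

With this version there is no need for the fillna('NAN') step, since missing values are handled inside the function.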