pandas: modifying values in dataframe from another column
While splitting data into columns, a glitch left me with some noisy data:
           site   code
0     apple_123     45
1     apple_456  xy_33
2  facebook_123     24
3    google_123    NaN
4    google_123  pq_51
I need to clean the data so that I get the following result:
            site  code
0      apple_123    45
1   apple_456_xy    33
2   facebook_123    24
3     google_123   NaN
4  google_123_pq    51
I have been able to select the rows that need to be modified, but I am unable to progress further:
import numpy as np
import pandas as pd
site = ['apple_123','apple_456','facebook_123','google_123','google_123']
code = [45,'xy_33',24,np.nan,'pq_51']
df = pd.DataFrame(list(zip(site,code)), columns=['site','code'])
# rows whose code is neither all digits nor missing
df[(~df.code.astype(str).str.isdigit())&(~df.code.isna())]
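A small self-contained check of the mask above on the same sample data (the intermediate names as_text, not_numeric, not_missing and noisy are only for illustration). It also shows why the isna() part is needed: the NaN row is not a digit string either, so without that test it would be selected too.

```python
import numpy as np
import pandas as pd

site = ['apple_123', 'apple_456', 'facebook_123', 'google_123', 'google_123']
code = [45, 'xy_33', 24, np.nan, 'pq_51']
df = pd.DataFrame({'site': site, 'code': code})

as_text = df['code'].astype(str)        # 45 -> '45', 'xy_33' stays 'xy_33'
not_numeric = ~as_text.str.isdigit()    # True for 'xy_33', 'pq_51' and the NaN row
not_missing = ~df['code'].isna()        # excludes the NaN row that would otherwise slip through
noisy = df[not_numeric & not_missing]
print(noisy)
```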
Use Series.str.extract to split the values into their non-numeric and numeric parts in a helper DataFrame, then process each column separately. For the first column, remove the _ with Series.str.strip, prepend _ on the left with Series.radd, convert missing values to an empty string, and append the result to the site column. For the second column, use Series.fillna to replace the values that did not match with the original values from the code column:
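# split e.g. 'xy_33' into 'xy_' (column 0) and '33' (column 1); other rows become NaN
df1 = df.code.str.extract(r'(\D+)(\d+)')
# 'xy_' -> 'xy' -> '_xy', appended to site; rows without a match append ''
df['site'] += df1[0].str.strip('_').radd('_').fillna('')
# use the extracted digits, falling back to the original code where nothing matched
df['code'] = df1[1].fillna(df['code'])
print(df)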
            site  code
0      apple_123    45
1   apple_456_xy    33
2   facebook_123    24
3     google_123   NaN
4  google_123_pq    51
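To see what each step of the answer does, here is the same pipeline broken into named intermediate steps on the question's sample data (the name suffix is just for illustration, not part of the original answer). The regex (\D+)(\d+) puts the non-digit prefix in column 0 and the trailing digits in column 1; cells that are not strings (45, 24, NaN) come back as NaN in both columns.

```python
import numpy as np
import pandas as pd

site = ['apple_123', 'apple_456', 'facebook_123', 'google_123', 'google_123']
code = [45, 'xy_33', 24, np.nan, 'pq_51']
df = pd.DataFrame({'site': site, 'code': code})

# Column 0 = leading non-digits ('xy_'), column 1 = trailing digits ('33').
df1 = df['code'].str.extract(r'(\D+)(\d+)')
print(df1)

# 'xy_' -> 'xy' -> '_xy'; rows with no prefix become '' so += leaves site unchanged.
suffix = df1[0].str.strip('_').radd('_').fillna('')
print(suffix.tolist())   # ['', '_xy', '', '', '_pq']

df['site'] += suffix
# Extracted digits where there was a match, otherwise the original value (45, 24, NaN).
df['code'] = df1[1].fillna(df['code'])
print(df)
```

Note that the cleaned code column ends up mixing integers (45, 24) with extracted strings ('33', '51'); if a purely numeric column is needed, pd.to_numeric(df['code']) can convert it afterwards.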