How to pick out all non-null values from multiple columns in a pandas DataFrame
I have a DataFrame like the one below:
column-a column-b column-c
0 Nan A B
1 A Nan C
2 Nan Nan C
3 A B C
I would like to create a new column, column-d, that captures all non-null values from column-a through column-c:
column-d
0 A,B
1 A,C
2 C
3 A,B,C
Thanks!
You need to change the 'Nan' strings to np.nan, then use stack with groupby and join:
import numpy as np
df = df.replace('Nan', np.nan)
df.stack().groupby(level=0).agg(','.join)
Out[570]:
0 A,B
1 A,C
2 C
3 A,B,C
dtype: object
# To store the result as a new column:
# df['column-d'] = df.stack().groupby(level=0).agg(','.join)
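For anyone who wants to reproduce this end to end, here is a minimal sketch of the stack/groupby approach. The DataFrame construction is reconstructed from the table in the question, and the explicit dropna() call is my own addition, on the assumption that some pandas versions may keep missing values when stacking; where stack() already drops them, the call is harmless.

```python
import numpy as np
import pandas as pd

# Rebuild the example from the question; 'Nan' is a plain string here.
df = pd.DataFrame({
    "column-a": ["Nan", "A", "Nan", "A"],
    "column-b": ["A", "Nan", "Nan", "B"],
    "column-c": ["B", "C", "C", "C"],
})

# Turn the 'Nan' strings into real missing values.
cleaned = df.replace("Nan", np.nan)

# stack() reshapes to one long Series indexed by (row, column).
# dropna() is an added safeguard in case missing values survive the stack.
stacked = cleaned.stack().dropna()

# Group by the original row label (index level 0) and join the values.
column_d = stacked.groupby(level=0).agg(",".join)
print(column_d)
```

Note that a row containing only missing values would not appear in column_d at all, so assigning the result back to the DataFrame would leave NaN in that row.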
After fixing the NaNs:
df = df.replace('Nan', np.nan)
Collect all non-null values in each row and join them with commas:
df['column-d'] = df.apply(lambda x: ','.join(x[x.notnull()]), axis=1)
#0 A,B
#1 A,C
#2 C
#3 A,B,C
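A self-contained sketch of the same row-wise idea is below. It is a slight variation rather than the exact code above: it uses dropna() instead of boolean indexing, and the astype(str) call is an assumption I added so that the join would not fail if a column held non-string values such as numbers.

```python
import numpy as np
import pandas as pd

# Rebuild the example from the question; 'Nan' is a plain string here.
df = pd.DataFrame({
    "column-a": ["Nan", "A", "Nan", "A"],
    "column-b": ["A", "Nan", "Nan", "B"],
    "column-c": ["B", "C", "C", "C"],
})
df = df.replace("Nan", np.nan)

def join_non_null(row):
    # Drop missing values, cast the rest to str (added assumption),
    # and join them with commas.
    return ",".join(row.dropna().astype(str))

# axis=1 passes each row to the function as a Series.
df["column-d"] = df.apply(join_non_null, axis=1)
print(df["column-d"])
```

Unlike the stack/groupby version, a row with no non-null values yields an empty string here rather than a missing value.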
Surprisingly, this solution is somewhat faster than Wen's stack/groupby solution, at least for the posted dataset.
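If you want to check the timing yourself, something like the following rough sketch should do. The helper names with_stack and with_apply and the run count of 200 are my own choices for illustration, and the result will depend on your pandas version and machine, so treat it as a quick comparison on this four-row frame, not a benchmark.

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the example from the question and fix the 'Nan' strings.
df = pd.DataFrame({
    "column-a": ["Nan", "A", "Nan", "A"],
    "column-b": ["A", "Nan", "Nan", "B"],
    "column-c": ["B", "C", "C", "C"],
}).replace("Nan", np.nan)

def with_stack(frame):
    # stack/groupby approach; dropna() is an added safeguard.
    return frame.stack().dropna().groupby(level=0).agg(",".join)

def with_apply(frame):
    # Row-wise apply approach.
    return frame.apply(lambda row: ",".join(row[row.notnull()]), axis=1)

runs = 200  # arbitrary repeat count chosen for this sketch
stack_seconds = timeit.timeit(lambda: with_stack(df), number=runs)
apply_seconds = timeit.timeit(lambda: with_apply(df), number=runs)

print(f"stack/groupby: {stack_seconds:.4f} s for {runs} runs")
print(f"apply:         {apply_seconds:.4f} s for {runs} runs")
```

On much larger frames the ranking may well change, since apply with axis=1 calls a Python function once per row.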