How to select all non-NULL values from multiple columns of a Python DataFrame

Question

I have a DataFrame like the one below:

       column-a         column-b      column-c
0          Nan             A              B
1           A              Nan            C
2           Nan            Nan            C
3           A              B              C

I would like to create a new column, column-d, that collects all non-NULL values from column-a through column-c:

        column d
0        A,B
1        A,C
2        C
3        A,B,C

Thanks!

Answer 1

You need to replace the string 'Nan' with np.nan, then use stack followed by groupby and join:

import numpy as np

df = df.replace('Nan', np.nan)
df.stack().groupby(level=0).agg(','.join)
Out[570]: 
0      A,B
1      A,C
2        C
3    A,B,C
dtype: object

# To store the result as a new column:
# df['column-d'] = df.stack().groupby(level=0).agg(','.join)
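
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt from the table in the question, assuming the missing cells hold the literal string 'Nan'. The extra dropna() call is my addition, not part of the answer above: older pandas versions drop missing values inside stack() by default, but newer versions may keep them, and the explicit call should make the result the same either way.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question; missing cells are
# assumed to hold the literal string 'Nan'.
df = pd.DataFrame({
    'column-a': ['Nan', 'A', 'Nan', 'A'],
    'column-b': ['A', 'Nan', 'Nan', 'B'],
    'column-c': ['B', 'C', 'C', 'C'],
})

# Turn the placeholder strings into real missing values.
df = df.replace('Nan', np.nan)

# stack() reshapes the frame to one value per (row, column) pair.
# dropna() removes missing values explicitly (an added safeguard),
# then the remaining values are joined per original row label.
joined = df.stack().dropna().groupby(level=0).agg(','.join)

# Assignment aligns on the index; a row with no non-null values at all
# would have no entry in `joined` and would end up as NaN here.
df['column-d'] = joined
print(df['column-d'].tolist())
```

Note that a row consisting only of missing values does not appear in the stacked result, so it would receive NaN in column-d rather than an empty string.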

Answer 2

After fixing the NaNs:

df = df.replace('Nan', np.nan)

Collect all non-null values in each row and join them:

df['column-d'] = df.apply(lambda x: ','.join(x[x.notnull()]), axis=1)
#0      A,B
#1      A,C
#2        C
#3    A,B,C

Surprisingly, this solution is somewhat faster than the stack/groupby solution by Wen, at least for the posted dataset.
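
The speed comparison can be checked with a rough timing sketch like the one below. The helper names with_stack and with_apply, the repeat count, and the explicit dropna() are assumptions for illustration; they are not from either answer. Absolute numbers depend on the machine and the pandas version, and the ranking may well change for larger frames, so treat the output as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Example frame from the question, with 'Nan' strings converted to NaN.
df = pd.DataFrame({
    'column-a': ['Nan', 'A', 'Nan', 'A'],
    'column-b': ['A', 'Nan', 'Nan', 'B'],
    'column-c': ['B', 'C', 'C', 'C'],
}).replace('Nan', np.nan)


def with_stack(frame):
    # Answer 1: reshape to long form, drop missing values, join per row.
    return frame.stack().dropna().groupby(level=0).agg(','.join)


def with_apply(frame):
    # Answer 2: filter and join the non-null values row by row.
    return frame.apply(lambda row: ','.join(row[row.notnull()]), axis=1)


# Both approaches should agree on this dataset before being timed.
assert with_stack(df).tolist() == with_apply(df).tolist()

n = 200  # arbitrary repeat count, chosen only to smooth out noise
t_stack = timeit.timeit(lambda: with_stack(df), number=n)
t_apply = timeit.timeit(lambda: with_apply(df), number=n)
print(f'stack/groupby: {t_stack / n * 1e3:.3f} ms per call')
print(f'apply:         {t_apply / n * 1e3:.3f} ms per call')
```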

2 answers

Answer 1 | 2 | accepted | 2018-09-08 03:42:38
Answer 2 | 1 | 2018-09-08 03:45:26