Python Pandas: Create new column out of other columns where value is not null
I have a data frame like this:
----------------
RecID| A |B
----------------
1 |NaN | x
2 |y | NaN
3 |z | NaN
4 |NaN | a
5 |NaN | b
I want to create a new column, C, from A and B, such that C takes the value of B where A is null and the value of A where B is null:
----------------------
RecID|A |B |C
----------------------
1 |NaN | x |x
2 |y | NaN |y
3 |z | NaN |z
4 |NaN | a |a
5 |NaN | b |b
Lastly, is there an efficient way to do this if I have more than two columns? For example, I have columns A through Z and want to create a new column A1 out of them in the same way.
pandas
lookup
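This is the generalizable solution the OP was looking for; it works across an arbitrary number of columns.
# For each row, find the label of the first non-null column between 'A' and 'B'
lookup = df.loc[:, 'A':'B'].notnull().idxmax(1)
# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0
df.assign(A1=df.lookup(lookup.index, lookup.values))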
RecID A B A1
0 1 NaN x x
1 2 y NaN y
2 3 z NaN z
3 4 NaN a a
4 5 NaN b b
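On pandas 2.0 and later, where `DataFrame.lookup` is no longer available, the same row-by-row pick can be approximated with NumPy integer indexing. This is only a sketch: the sample frame is rebuilt from the question, and the names `cols`, `first_valid`, and `col_pos` are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question
df = pd.DataFrame({
    "RecID": [1, 2, 3, 4, 5],
    "A": [np.nan, "y", "z", np.nan, np.nan],
    "B": ["x", np.nan, np.nan, "a", "b"],
})

cols = df.loc[:, "A":"B"]

# Label of the first non-null column in each row
first_valid = cols.notnull().idxmax(axis=1)

# Convert those labels to integer column positions
col_pos = cols.columns.get_indexer(first_valid)

# Pick one value per row: (row i, column col_pos[i])
values = cols.to_numpy()[np.arange(len(cols)), col_pos]

result = df.assign(A1=values)
print(result)
```

If every column in a row is null, `idxmax` falls back to the first column, so the picked value for that row is simply NaN.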
fillna
df.assign(C=df.A.fillna(df.B))
RecID A B C
0 1 NaN x x
1 2 y NaN y
2 3 z NaN z
3 4 NaN a a
4 5 NaN b b
mask
df.assign(C=df.A.mask(df.A.isnull(), df.B))
RecID A B C
0 1 NaN x x
1 2 y NaN y
2 3 z NaN z
3 4 NaN a a
4 5 NaN b b
combine_first
df.assign(C=df.A.combine_first(df.B))
RecID A B C
0 1 NaN x x
1 2 y NaN y
2 3 z NaN z
3 4 NaN a a
4 5 NaN b b
numpy
np.where
df.assign(C=np.where(df.A.notnull(), df.A, df.B))
RecID A B C
0 1 NaN x x
1 2 y NaN y
2 3 z NaN z
3 4 NaN a a
4 5 NaN b b
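To try the two-column variants above side by side, the frame from the question can be rebuilt and each expression evaluated against it. This is a small self-check rather than part of the original answer; the dictionary `candidates` is a hypothetical helper, and the imports are included because `np.where` needs NumPy.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question
df = pd.DataFrame({
    "RecID": [1, 2, 3, 4, 5],
    "A": [np.nan, "y", "z", np.nan, np.nan],
    "B": ["x", np.nan, np.nan, "a", "b"],
})

# Each entry builds column C in a different way
candidates = {
    "fillna": df.A.fillna(df.B),
    "mask": df.A.mask(df.A.isnull(), df.B),
    "combine_first": df.A.combine_first(df.B),
    "np.where": pd.Series(np.where(df.A.notnull(), df.A, df.B), index=df.index),
}

# Collect the results as plain lists for easy comparison
results = {name: series.tolist() for name, series in candidates.items()}
for name, values in results.items():
    print(name, values)
```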
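In the case of multiple columns, you can use forward fill. Forward fill along each row carries the last non-null value into the final column, so selecting that column afterwards gives the combined result. This example assumes that you want to build a combination of all columns 'A' through 'Z':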
df['AZ'] = df.loc[:, 'A':'Z'].ffill(axis=1)['Z']
This method works for two columns, too:
df['C'] = df.loc[:, 'A':'B'].ffill(axis=1)['B']
#0 x
#1 y
#2 z
#3 a
#4 b
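When a row can hold more than one non-null value, the fill direction decides which one wins: forward fill keeps the last value, while a back fill followed by taking the first column keeps the first. A minimal sketch of the difference, using a made-up three-column frame (the data and the names `subset`, `last_valid`, and `first_valid` are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame; row 0 has two non-null values
df = pd.DataFrame({
    "A": [np.nan, "y", np.nan, np.nan],
    "B": ["x", np.nan, np.nan, "q"],
    "C": ["p", np.nan, "z", np.nan],
})

subset = df.loc[:, "A":"C"]

# Forward fill, then take the last column: last non-null value per row
last_valid = subset.ffill(axis=1).iloc[:, -1]

# Back fill, then take the first column: first non-null value per row
first_valid = subset.bfill(axis=1).iloc[:, 0]

print(pd.DataFrame({"last": last_valid, "first": first_valid}))
```

The two results differ only in row 0, where `B` and `C` are both set. For data like the question's, with exactly one non-null value per row, either direction gives the same column.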