I have a pandas dataframe with many columns. Most values are null, but in each row exactly one column holds a string value.
I am creating a new column in the dataframe that selects that single non-null value:
data[label] = data.iloc[:, 0]
for col in range(1, 100):
    data[label] = data[label].fillna(data.iloc[:, col])
This works fine; however, I would also like to keep track of which column held the non-null value for each row, so that the label column carries that information as well. How do I find out which column was non-empty?
Ex.
col0    col1    col2
        "red"
"blue"
                "yellow"
The new column label is:
label
"red"/col1
"blue"/col0
"yellow"/col2
You can first convert df to a boolean mask (True where values are present) with notnull, get the column names with idxmax, and fetch the values with lookup:
cols = df.notnull().idxmax(axis=1)
df['a'] = df.lookup(df.index, cols) + '/' + cols
print (df)
   col0 col1    col2            a
0   NaN  red     NaN     red/col1
1  blue  NaN     NaN    blue/col0
2   NaN  NaN  yellow  yellow/col2
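Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so the snippet above fails on recent versions. One possible replacement is plain NumPy positional indexing. This is a minimal sketch, not the original answer's method; the sample frame and the names row_pos, col_pos and values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Sample frame matching the example in the question (assumed data).
df = pd.DataFrame({
    "col0": [np.nan, "blue", np.nan],
    "col1": ["red", np.nan, np.nan],
    "col2": [np.nan, np.nan, "yellow"],
})

# Name of the first non-null column in each row.
cols = df.notnull().idxmax(axis=1)

# Translate column names to integer positions, then pick one cell per row.
col_pos = df.columns.get_indexer(cols)
row_pos = np.arange(len(df))
values = pd.Series(df.to_numpy()[row_pos, col_pos], index=df.index)

# Combine the value and the name of the column it came from.
df["a"] = values + "/" + cols
print(df)
```

The positional lookup reads each row's value directly from the underlying array, so it does not depend on lookup being available.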
Another solution with fillna and sum:
cols = df.notnull().idxmax(axis=1)
df['a'] = df.fillna('').sum(axis=1) + '/' + cols
print (df)
   col0 col1    col2            a
0   NaN  red     NaN     red/col1
1  blue  NaN     NaN    blue/col0
2   NaN  NaN  yellow  yellow/col2
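Both idxmax and the fillna/sum trick rely on the question's guarantee that each row has exactly one non-null value: with several values, sum concatenates them, and with none, idxmax still returns the first column. A hedged sketch of a sanity check that could be run first; the sample frame and the names non_null_per_row and bad_rows are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Sample frame matching the example in the question (assumed data).
df = pd.DataFrame({
    "col0": [np.nan, "blue", np.nan],
    "col1": ["red", np.nan, np.nan],
    "col2": [np.nan, np.nan, "yellow"],
})

# Count the non-null cells in every row.
non_null_per_row = df.notnull().sum(axis=1)

# Rows that break the "exactly one value" assumption.
bad_rows = non_null_per_row[non_null_per_row != 1]
if not bad_rows.empty:
    raise ValueError(f"rows without exactly one value: {list(bad_rows.index)}")
```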
Another solution, thanks to Jon Clements, is to use first_valid_index:
cols = df.apply(pd.Series.first_valid_index, axis=1)
df['a'] = df.lookup(cols.index, cols) + '/' + cols
print (df)
   col0 col1    col2            a
0   NaN  red     NaN     red/col1
1  blue  NaN     NaN    blue/col0
2   NaN  NaN  yellow  yellow/col2