
Saving only non-null entry value and column number from pandas df with only one non-null value per row

I have a pandas DataFrame with many columns, most of which are null, but each row always has exactly one column whose value is a string.

I am creating a new column in the dataframe that selects the only non-null value:

# start with the first column, then fill its gaps from columns 1..99
data[label] = data.iloc[:, 0]
for col in range(1, 100):
    data[label] = data[label].fillna(data.iloc[:, col])
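The loop above can likely be collapsed into one call. The following is a minimal sketch, assuming the frame holds only the candidate columns (here a small hypothetical `data` frame with three columns instead of 100): a backfill across columns moves each row's first non-null value into the first column, so the column count no longer has to be hard-coded.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: exactly one non-null string per row
data = pd.DataFrame({
    "col0": [np.nan, "blue", np.nan],
    "col1": ["red", np.nan, np.nan],
    "col2": [np.nan, np.nan, "yellow"],
})

label = "label"  # hypothetical name for the new column

# bfill(axis=1) pulls values leftwards along each row, so column 0 ends up
# holding the row's first non-null value, whatever the number of columns
data[label] = data.bfill(axis=1).iloc[:, 0]
print(data[label].tolist())  # ['red', 'blue', 'yellow']
```

The backfill is computed before the new column is added, so the label column itself never takes part in the fill.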

This works fine. However, I would also like to keep track of which column held the non-null value in each row, so that the label column records that information as well. How do I find out which column was non-empty?

Example input:

col0      col1     col2
NaN       "red"    NaN
"blue"    NaN      NaN
NaN       NaN      "yellow"

The new column label should be:

label
"red"/col1
"blue"/col0
"yellow"/col2
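For experimenting with the answers, here is a sketch that rebuilds the example above as a DataFrame (NaN marks the empty cells, an assumption about how the blanks are stored) and checks the premise that every row has exactly one non-null value.

```python
import numpy as np
import pandas as pd

# The example from the question, with NaN marking the empty cells
df = pd.DataFrame({
    "col0": [np.nan, "blue", np.nan],
    "col1": ["red", np.nan, np.nan],
    "col2": [np.nan, np.nan, "yellow"],
})

# The solutions rely on exactly one non-null value per row:
# count the non-null cells in each row and require every count to be 1
non_null_per_row = df.notna().sum(axis=1)
assert non_null_per_row.eq(1).all(), "some rows do not have exactly one value"
print(non_null_per_row.tolist())  # [1, 1, 1]
```

If this check fails on real data, idxmax and first_valid_index silently pick the first non-null column, and the fillna/sum approach concatenates several strings.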

You can first convert df to a boolean mask with notnull (True wherever there is a value), get each row's column name with idxmax (the first True in the row), and then look up the values:

cols = df.notnull().idxmax(axis=1)
df['a'] = df.lookup(df.index, cols) + '/' + cols
print (df)
   col0 col1    col2            a
0   NaN  red     NaN     red/col1
1  blue  NaN     NaN    blue/col0
2   NaN  NaN  yellow  yellow/col2
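DataFrame.lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0, so the solution above fails on current versions. A minimal sketch of an equivalent, assuming the same three-column example: translate each row's column label to a position with Index.get_indexer, then pick one cell per row with NumPy fancy indexing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col0": [np.nan, "blue", np.nan],
    "col1": ["red", np.nan, np.nan],
    "col2": [np.nan, np.nan, "yellow"],
})

# Label of the first (here: only) non-null column in each row
cols = df.notnull().idxmax(axis=1)

# Positional replacement for the removed df.lookup(df.index, cols):
# one row position and one column position per row
row_pos = np.arange(len(df))
col_pos = df.columns.get_indexer(cols)
values = pd.Series(df.to_numpy()[row_pos, col_pos], index=df.index)

df["a"] = values + "/" + cols
print(df)
```

Wrapping the NumPy result in a Series with the original index keeps the string concatenation aligned with cols.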

Another solution uses fillna and sum (the empty strings add nothing to the concatenation, so only the single value remains):

cols = df.notnull().idxmax(axis=1)
df['a'] = df.fillna('').sum(axis=1) + '/' + cols
print (df)
   col0 col1    col2            a
0   NaN  red     NaN     red/col1
1  blue  NaN     NaN    blue/col0
2   NaN  NaN  yellow  yellow/col2

Another solution, thanks to Jon Clements, uses first_valid_index:

cols = df.apply(pd.Series.first_valid_index, axis=1)
df['a'] = df.lookup(cols.index, cols) + '/' + cols
print (df)
   col0 col1    col2            a
0   NaN  red     NaN     red/col1
1  blue  NaN     NaN    blue/col0
2   NaN  NaN  yellow  yellow/col2
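This variant also depends on df.lookup, which no longer exists in pandas 2.0 and later. A sketch of one possible replacement, assuming the same example frame and that no row is entirely null (first_valid_index would then return None and the .at lookup would fail): read one cell per row with DataFrame.at.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col0": [np.nan, "blue", np.nan],
    "col1": ["red", np.nan, np.nan],
    "col2": [np.nan, np.nan, "yellow"],
})

# Label of the first non-null column in each row
cols = df.apply(pd.Series.first_valid_index, axis=1)

# Replacement for df.lookup(cols.index, cols): one scalar .at access per row
values = pd.Series(
    [df.at[row, col] for row, col in cols.items()],
    index=cols.index,
)

df["a"] = values + "/" + cols
print(df)
```

The per-row Python loop is slower than the NumPy indexing variant on large frames, but it reads closer to the original lookup call.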
