
Pandas groupby: fill missing values from other group members

I think this is best shown with an example. What I'm trying to do is find the non-null number from a group and propagate it to the rest of the group.

In [52]: df = pd.DataFrame.from_dict({1:{'i_id': 2, 'i_num':1}, 2: {'i_id': 2, 'i_num': np.nan}, 3: {'i_id': 2, 'i_num': np.nan}, 4: {'i_id': 3, 'i_num': np.nan}, 5: {'i_id': 3, 'i_num': 5}}, orient='index')

In [53]: df
Out[53]:
   i_num  i_id
1      1     2
2    NaN     2
3    NaN     2
4    NaN     3
5      5     3

The DataFrame looks something like this. I want every row with i_id == 2 to get i_num == 1, and every row with i_id == 3 to get i_num == 5, so that each row matches the non-null value of its group neighbors.

So the end result would be this:

   i_num  i_id
1      1     2
2      1     2
3      1     2
4      5     3
5      5     3

The groupby method first returns the first non-null value in each group. Passing it to transform broadcasts that value to every row of the group, so you can fill in the other values like this:

df['i_num'] = df.groupby('i_id')['i_num'].transform('first')

This produces the column as required:

   i_num  i_id
1      1     2
2      1     2
3      1     2
4      5     3
5      5     3
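
To make the difference between the per-group result and the broadcast result concrete, here is a minimal sketch. It rebuilds the question's data with a plain dict constructor (an assumption made for brevity; the original uses from_dict), and the names per_group and broadcast are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built with a plain dict for brevity.
df = pd.DataFrame(
    {"i_id": [2, 2, 2, 3, 3], "i_num": [1, np.nan, np.nan, np.nan, 5]},
    index=[1, 2, 3, 4, 5],
)

# One value per group: the first non-null i_num for each i_id.
# Group 3 starts with NaN, which first() skips.
per_group = df.groupby("i_id")["i_num"].first()

# The same values, broadcast back onto the index of the original column,
# so the result can be assigned straight to df["i_num"].
broadcast = df.groupby("i_id")["i_num"].transform("first")

print(per_group)
print(broadcast)
```

Here per_group is indexed by i_id, while broadcast keeps the original row index, which is why it can be assigned back as a column.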

Bear in mind that this replaces every value in the group with the group's first non-null value, not just the NaN values (though that seems to be what you're looking for here).
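
A small sketch of that caveat, using hypothetical data that is not in the question: group 2 is given a second non-null value (7), and transform('first') overwrites it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 2 holds two different non-null values (1 and 7).
df2 = pd.DataFrame(
    {"i_id": [2, 2, 2, 3, 3], "i_num": [1, np.nan, 7, np.nan, 5]}
)

# Every row in a group receives the group's first non-null value,
# so the 7 in group 2 is replaced by 1.
overwritten = df2.groupby("i_id")["i_num"].transform("first")

print(overwritten.tolist())
```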

Alternatively, to preserve any other non-null values in the group, you can use fillna as follows:

# make a column of first values for each group
x = df['i_id'].map(df.groupby('i_id')['i_num'].first())
# fill only NaN values using new column x
df['i_num'] = df['i_num'].fillna(x)
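
As a rough check that only the NaN entries change, the sketch below runs the fillna approach on hypothetical data in which group 2 has a second non-null value (7). It also tries a variant that uses transform('first') in place of map; that variant is an alternative spelling suggested here, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 2 holds two different non-null values (1 and 7).
df3 = pd.DataFrame(
    {"i_id": [2, 2, 2, 3, 3], "i_num": [1, np.nan, 7, np.nan, 5]}
)

# Approach from the answer: map each row's i_id to its group's first value.
x = df3["i_id"].map(df3.groupby("i_id")["i_num"].first())
filled_map = df3["i_num"].fillna(x)

# Variant (an assumption of this sketch): transform already returns a
# row-aligned Series, so it can be passed to fillna directly.
firsts = df3.groupby("i_id")["i_num"].transform("first")
filled_transform = df3["i_num"].fillna(firsts)

# Only the NaN rows are filled; the 7 in group 2 is left alone.
print(filled_map.tolist())
print(filled_map.equals(filled_transform))
```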
