
Find the first non-NaN value in Pandas

I have a Pandas DataFrame like this:

|user_id|value|No|
|:-:|:-:|:-:|
|id1|100|1|
|id1|200|2|
|id1|250|3|
|id2|NaN|1|
|id2|100|2|
|id3|400|1|
|id3|NaN|2|
|id3|200|3|
|id4|NaN|1|
|id4|NaN|2|
|id4|300|3|
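
For anyone who wants to reproduce this, the table above can be built roughly as follows. This is a minimal sketch: the variable name `df` is an assumption chosen to match the answers, and the default integer index (0 to 10) is assumed.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN marks the missing values.
df = pd.DataFrame({
    "user_id": ["id1", "id1", "id1", "id2", "id2",
                "id3", "id3", "id3", "id4", "id4", "id4"],
    "value": [100, 200, 250, np.nan, 100,
              400, np.nan, 200, np.nan, np.nan, 300],
    "No": [1, 2, 3, 1, 2, 1, 2, 3, 1, 2, 3],
})

print(df)
```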

Then I want the following dataset:

|user_id|value|No|NewNo|
|:-:|:-:|:-:|:-:|
|id1|100|1|1|
|id1|200|2|2|
|id1|250|3|3|
|id2|100|2|1|
|id3|400|1|1|
|id3|NaN|2|2|
|id3|200|3|3|
|id4|300|3|1|

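That is, for each user_id I want to drop the leading rows whose value is NaN, so that the first remaining row of each user_id has a non-NaN value. NaN values that come after that row are kept, and NewNo renumbers the remaining rows within each user_id. Thank you.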

You can group by user_id and forward-fill the value column. After the forward fill, the only values that are still null are the leading nulls of each group, because there is no earlier value to fill them from. Filter out those rows:

df2 = df[df.groupby('user_id').value.ffill().apply(pd.notnull)].copy()
# .copy() creates a new DataFrame, so values can be assigned to the
# result (df2) safely. This is needed to create the `NewNo` column
# in the next and final step.
# df2 outputs:
   user_id  value  No
0    'id1'  100.0   1
1    'id1'  200.0   2
2    'id1'  250.0   3
4    'id2'  100.0   2
5    'id3'  400.0   1
6    'id3'    NaN   2
7    'id3'  200.0   3
10   'id4'  300.0   3
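
To see why the forward fill isolates the leading NaNs, it can help to look at the intermediate result. The following is a sketch under the assumption that the question's data is loaded as `df` with a default integer index; the names `filled` and `mask` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": ["id1", "id1", "id1", "id2", "id2",
                "id3", "id3", "id3", "id4", "id4", "id4"],
    "value": [100, 200, 250, np.nan, 100,
              400, np.nan, 200, np.nan, np.nan, 300],
    "No": [1, 2, 3, 1, 2, 1, 2, 3, 1, 2, 3],
})

# Forward-fill within each user_id. A NaN in the middle of a group is
# filled from the row above it; a NaN at the start of a group has
# nothing to copy from and stays NaN.
filled = df.groupby("user_id")["value"].ffill()

# Rows that are still NaN after the fill are exactly the leading NaNs.
mask = filled.notna()

# Filter the ORIGINAL frame, so the NaN in the middle of id3 is kept.
df2 = df[mask].copy()
print(df2)
```

Note that the mask is computed from the filled values but applied to the unfilled frame, which is why the NaN at index 6 survives.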

Then generate the NewNo column by ranking No within each group (rank returns floats, hence 1.0, 2.0, ... in the output):

df2['NewNo'] = df2.groupby('user_id').No.rank()

# df2 outputs:

   user_id  value  No  NewNo
0    'id1'  100.0   1    1.0
1    'id1'  200.0   2    2.0
2    'id1'  250.0   3    3.0
4    'id2'  100.0   2    1.0
5    'id3'  400.0   1    1.0
6    'id3'    NaN   2    2.0
7    'id3'  200.0   3    3.0
10   'id4'  300.0   3    1.0
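
If integer labels are preferred, as in the table the question asks for, one option is to cast the ranks. This is a sketch rather than part of the original answer: it assumes `No` has no missing values and no ties within a group, and it rebuilds the filtered frame `df2` by hand so the block runs on its own.

```python
import numpy as np
import pandas as pd

# The filtered frame from the previous step, rebuilt by hand here.
df2 = pd.DataFrame(
    {
        "user_id": ["id1", "id1", "id1", "id2", "id3", "id3", "id3", "id4"],
        "value": [100, 200, 250, 100, 400, np.nan, 200, 300],
        "No": [1, 2, 3, 2, 1, 2, 3, 3],
    },
    index=[0, 1, 2, 4, 5, 6, 7, 10],
)

# method="first" breaks any ties by order of appearance, so the ranks
# are whole numbers and can be cast to int.
df2["NewNo"] = (
    df2.groupby("user_id")["No"].rank(method="first").astype(int)
)
print(df2)
```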

groupby + first_valid_index + cumcount

You can calculate indices for first non-null values by group, then use Boolean indexing:

# use transform to align groupwise first_valid_index with dataframe
firsts = df.groupby('user_id')['value'].transform(pd.Series.first_valid_index)

# apply Boolean filter; .copy() avoids a SettingWithCopyWarning
# when the NewNo column is assigned below
res = df[df.index >= firsts].copy()

# use groupby + cumcount to add groupwise labels
res['NewNo'] = res.groupby('user_id').cumcount() + 1

print(res)

   user_id  value  No  NewNo
0      id1  100.0   1      1
1      id1  200.0   2      2
2      id1  250.0   3      3
4      id2  100.0   2      1
5      id3  400.0   1      1
6      id3    NaN   2      2
7      id3  200.0   3      3
10     id4  300.0   3      1
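
The comparison `df.index >= firsts` compares index labels, so it relies on the index being ordered the same way as the rows. As an alternative that depends only on row order, here is a sketch using a groupwise cumulative maximum. This is a different technique from the one above, not part of the original answer; `seen` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": ["id1", "id1", "id1", "id2", "id2",
                "id3", "id3", "id3", "id4", "id4", "id4"],
    "value": [100, 200, 250, np.nan, 100,
              400, np.nan, 200, np.nan, np.nan, 300],
    "No": [1, 2, 3, 1, 2, 1, 2, 3, 1, 2, 3],
})

# 1 where value is present, 0 where it is NaN. The running maximum
# within each user_id becomes 1 at the first non-NaN row and stays 1
# afterwards.
seen = (
    df["value"].notna().astype(int)
    .groupby(df["user_id"]).cummax()
    .astype(bool)
)

res = df[seen].copy()

# Number the surviving rows within each group, starting from 1.
res["NewNo"] = res.groupby("user_id").cumcount() + 1
print(res)
```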
