I have a DataFrame that looks like this:
ID DESCRIPTION TYPE1 TYPE2
12345678 EXAMPLENAME1 874.4 NaN
12345678 EXAMPLENAME2 854.4 NaN
12345678 EXAMPLENAME3 874.4 B-5
78978999 EXAMPLENAME2 788.8 B-9
78978999 EXAMPLENAME4 978.2 NaN
78978999 EXAMPLENAME1 288.3 NaN
92124566 EXAMPLENAME3 369.1 NaN
92124566 EXAMPLENAME3 289.1 B-3
92124566 EXAMPLENAME3 959.1 NaN
For every ID, I want to fill the TYPE2 column with the non-NaN value that appears for that ID. We can assume that each ID has a row with a non-null TYPE2 value, and that TYPE2 is unique per ID. The final product should look like this:
ID DESCRIPTION TYPE1 TYPE2
12345678 EXAMPLENAME1 874.4 B-5
12345678 EXAMPLENAME2 854.4 B-5
12345678 EXAMPLENAME3 874.4 B-5
78978999 EXAMPLENAME2 788.8 B-9
78978999 EXAMPLENAME4 978.2 B-9
78978999 EXAMPLENAME1 288.3 B-9
92124566 EXAMPLENAME3 369.1 B-3
92124566 EXAMPLENAME3 289.1 B-3
92124566 EXAMPLENAME3 959.1 B-3
I've tried ffill, but I can't establish the condition to fill only when the ID is the same. There are about 1,500,000 different TYPE2 and ID values, so setting them manually with something like df.loc[df["ID"] == "12345678", "TYPE2"] = "B-5" wouldn't work.
How can I have df.loc check whether the ID is the same, then grab the non-null value from TYPE2 and assign it to the rest of that ID's rows? Are there any other methods to get the same outcome?
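To fill null values within each ID, group by ID with .groupby and apply both .bfill() and .ffill() inside each group. Chaining .ffill() directly after the grouped .bfill() would run the forward fill over the whole column, which can carry a value across IDs when the rows of an ID are not contiguous, so each fill is applied per group:
df['TYPE2'] = df.groupby('ID')['TYPE2'].bfill()
df['TYPE2'] = df.groupby('ID')['TYPE2'].ffill()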
#result
ID DESCRIPTION TYPE1 TYPE2
0 12345678 EXAMPLENAME1 874.4 B-5
1 12345678 EXAMPLENAME2 854.4 B-5
2 12345678 EXAMPLENAME3 874.4 B-5
3 78978999 EXAMPLENAME2 788.8 B-9
4 78978999 EXAMPLENAME4 978.2 B-9
5 78978999 EXAMPLENAME1 288.3 B-9
6 92124566 EXAMPLENAME3 369.1 B-3
7 92124566 EXAMPLENAME3 289.1 B-3
8 92124566 EXAMPLENAME3 959.1 B-3
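As for other methods, here is a minimal sketch of two alternatives, assuming (as the question states) that every ID has a non-null TYPE2 and that the value is unique per ID. The sample frame is rebuilt from the question's data; the names filled_first, lookup and filled_map are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [12345678] * 3 + [78978999] * 3 + [92124566] * 3,
    "DESCRIPTION": ["EXAMPLENAME1", "EXAMPLENAME2", "EXAMPLENAME3",
                    "EXAMPLENAME2", "EXAMPLENAME4", "EXAMPLENAME1",
                    "EXAMPLENAME3", "EXAMPLENAME3", "EXAMPLENAME3"],
    "TYPE1": [874.4, 854.4, 874.4, 788.8, 978.2, 288.3, 369.1, 289.1, 959.1],
    "TYPE2": [np.nan, np.nan, "B-5", "B-9", np.nan, np.nan,
              np.nan, "B-3", np.nan],
})

# Alternative 1: broadcast the first non-null TYPE2 of each ID to all its rows.
# The "first" aggregation skips nulls, so row order inside a group is irrelevant.
filled_first = df.groupby("ID")["TYPE2"].transform("first")

# Alternative 2: build an ID -> TYPE2 lookup from the non-null rows, then map it.
lookup = (
    df.dropna(subset=["TYPE2"])
      .drop_duplicates("ID")
      .set_index("ID")["TYPE2"]
)
filled_map = df["ID"].map(lookup)

df["TYPE2"] = filled_first
```

Both alternatives rely on there being a single TYPE2 value per ID. If an ID could carry several different non-null values, they would silently keep only the first one, so that assumption is worth checking on the real data.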