I have a large pandas DataFrame with two columns, ride_ID and person_ID, given as:
ride_ID person_ID
ride_1 person1
ride_1 person2
ride_1 person3
ride_2 person1
ride_2 person4
ride_3 person1
ride_3 person5
ride_3 person2
ride_3 person3
..... ......
..... ......
For each unique ride_ID, the number of person_ID values can be anything: 2, 20, or 100. I want to group by the ride_ID column so that the person_ID values are spread across multiple columns named person_ID1 through person_IDn. Expected output:
ride_ID person_ID1 person_ID2 person_ID3 person_ID4 person_ID5 ....... person_IDn
ride_1 person1 person2 person3 NaN NaN ......
ride_2 person1 NaN NaN person4 NaN ......
ride_3 person1 person2 person3 NaN person5
You can use pivot(). To do that, first create a column "person_IDx" whose values run serially ("person_ID1", "person_ID2", ..., "person_IDn") within each "ride_ID" group:
import pandas as pd
df = pd.DataFrame(data=[["ride_1","person1"],["ride_1","person2"],["ride_1","person3"],["ride_2","person1"],["ride_2","person4"],["ride_3","person1"],["ride_3","person5"],["ride_3","person2"],["ride_3","person3"]], columns=["ride_ID","person_ID"])
# Start with a column of ones; its running total within each ride gives 1, 2, 3, ...
df["person_IDx"] = 1
df["person_IDx"] = df.groupby("ride_ID")["person_IDx"].transform("cumsum").apply(lambda x: f"person_ID{x}")
# Pivot so each "person_IDx" label becomes its own column, then blank out the columns' axis name.
df = df.pivot(index="ride_ID", columns="person_IDx", values="person_ID").reset_index().rename_axis(columns={"person_IDx":""})
[Out]:
ride_ID person_ID1 person_ID2 person_ID3 person_ID4
0 ride_1 person1 person2 person3 NaN
1 ride_2 person1 person4 NaN NaN
2 ride_3 person1 person5 person2 person3
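One thing to watch with the approach above: pivot() returns the new columns in sorted order, and string labels sort lexicographically, so with ten or more people per ride "person_ID10" would land before "person_ID2". A minimal sketch of a variant that avoids this, assuming the same sample data, keeps the within-ride position as an integer and only adds the "person_ID" prefix after pivoting. The names pos and wide are illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ride_ID": ["ride_1"] * 3 + ["ride_2"] * 2 + ["ride_3"] * 4,
        "person_ID": ["person1", "person2", "person3",
                      "person1", "person4",
                      "person1", "person5", "person2", "person3"],
    }
)

# Position of each row within its ride, starting at 1 (kept as an integer
# so that the pivoted columns sort numerically rather than as strings).
pos = df.groupby("ride_ID").cumcount() + 1

wide = (
    df.assign(pos=pos)
      .pivot(index="ride_ID", columns="pos", values="person_ID")
      .add_prefix("person_ID")      # 1 -> "person_ID1", applied after the sort
      .rename_axis(None, axis=1)    # drop the leftover "pos" name on the columns
      .reset_index()
)
print(wide)
```

On the sample data this should give the same table as the output above; the difference only shows up once a ride has ten or more people.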