I have a simple dataframe (df) as shown below:
index  Job  Person
1      j1   Cathy
2      j2   Mark
3      j3   Cathy
4      j4   Steve
5      j5   Cathy
6      NaN  Cathy
7      j1   Cathy
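For reference, a sketch of this frame can be built as follows (assuming the blank Job entry in row 6 is a missing value):

```python
import numpy as np
import pandas as pd

# Reconstruct the sample frame; the blank Job in row 6 is assumed to be NaN
df = pd.DataFrame(
    {
        "Job": ["j1", "j2", "j3", "j4", "j5", np.nan, "j1"],
        "Person": ["Cathy", "Mark", "Cathy", "Steve", "Cathy", "Cathy", "Cathy"],
    },
    index=range(1, 8),
)
```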
I would like to convert the above dataframe to:
Person  CountJob  JobDetails
Cathy   4         j1;j3;j5
Mark    1         j2
Steve   1         j4
i.e., drop the nulls and join the unique jobs for each person.
I can partially solve this using groupby:

df.groupby('Person').agg(
    CountJob=('Job', 'count'),
    JobDetails=('Job', lambda x: ';'.join(x.dropna()))
)
I am able to drop the null values but can't seem to get unique values; in the above example, 'JobDetails' for 'Cathy' becomes j1;j3;j5;j1.
Chain unique() after dropna():

df.groupby('Person').agg(
    CountJob=('Job', 'count'),
    JobDetails=('Job', lambda x: ';'.join(x.dropna().unique()))
)
Out[143]:
        CountJob JobDetails
Person
Cathy          4   j1;j3;j5
Mark           1         j2
Steve          1         j4
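Note that 'count' ignores NaN but still counts duplicates, which is why Cathy gets 4 even though she has only three distinct jobs. If you ever want the count of distinct jobs instead, 'nunique' is a drop-in replacement; a sketch on the sample data (the frame construction is an assumption matching the question):

```python
import numpy as np
import pandas as pd

# Assumed sample frame; the blank Job is NaN
df = pd.DataFrame({
    "Job": ["j1", "j2", "j3", "j4", "j5", np.nan, "j1"],
    "Person": ["Cathy", "Mark", "Cathy", "Steve", "Cathy", "Cathy", "Cathy"],
})

out = df.groupby("Person").agg(
    CountJob=("Job", "count"),      # non-null rows: 4 for Cathy (j1 counted twice)
    UniqueJobs=("Job", "nunique"),  # distinct non-null jobs: 3 for Cathy
    JobDetails=("Job", lambda x: ";".join(x.dropna().unique())),
)
```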
You could consider dropping the nulls before aggregating:
(
df.dropna()
.groupby("Person")
.agg(CountJob=("Job", "count"), JobDetails=("Job", set))
.assign(JobDetails=lambda df: df.JobDetails.str.join(","))
)
        CountJob JobDetails
Person
Cathy          4   j1,j3,j5
Mark           1         j2
Steve          1         j4
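One caveat with set above: a set does not preserve insertion order, so the joined string may come out as j3,j1,j5 on some runs. A sketch that keeps first-occurrence order swaps in pd.unique (the sample frame here is an assumption matching the question):

```python
import numpy as np
import pandas as pd

# Assumed sample frame; the blank Job is NaN
df = pd.DataFrame({
    "Job": ["j1", "j2", "j3", "j4", "j5", np.nan, "j1"],
    "Person": ["Cathy", "Mark", "Cathy", "Steve", "Cathy", "Cathy", "Cathy"],
})

out = (
    df.dropna()
    .groupby("Person")
    .agg(
        CountJob=("Job", "count"),
        # pd.unique keeps first-occurrence order, unlike set
        JobDetails=("Job", lambda s: ",".join(pd.unique(s))),
    )
)
```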
Another method is to set 'Person' as the index and stack to drop any nulls; then we can apply any groupby aggregations without issue.
df.set_index('Person').stack().groupby('Person')\
  .agg(JobDetails='unique', Job='count')
          JobDetails  Job
Person
Cathy   [j1, j3, j5]    4
Mark            [j2]    1
Steve           [j4]    1
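To get the same semicolon-joined strings as the other answers from this approach, the unique values can be joined inside the aggregation. A runnable sketch on the assumed sample frame (the explicit dropna() is added because stack() in newer pandas versions may no longer drop NaN by default):

```python
import numpy as np
import pandas as pd

# Assumed sample frame; the blank Job is NaN
df = pd.DataFrame({
    "Job": ["j1", "j2", "j3", "j4", "j5", np.nan, "j1"],
    "Person": ["Cathy", "Mark", "Cathy", "Steve", "Cathy", "Cathy", "Cathy"],
})

out = (
    df.set_index("Person")
    .stack()
    .dropna()  # explicit: stack() in newer pandas may keep NaN
    .groupby("Person")
    .agg(CountJob="count", JobDetails=lambda s: ";".join(s.unique()))
)
```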