I have a simple dataframe (df) as shown below:
index  Job  Person
1      j1   Cathy
2      j2   Mark
3      j3   Cathy
4      j4   Steve
5      j5   Cathy
6      NaN  Cathy
7      j1   Cathy
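For reference, a sketch of this frame can be built as follows (assuming the blank Job entry in row 6 is a missing value):

```python
import numpy as np
import pandas as pd

# Reconstruct the sample frame; the blank Job in row 6 is assumed to be NaN
df = pd.DataFrame(
    {
        "Job": ["j1", "j2", "j3", "j4", "j5", np.nan, "j1"],
        "Person": ["Cathy", "Mark", "Cathy", "Steve", "Cathy", "Cathy", "Cathy"],
    },
    index=range(1, 8),
)
```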
I would like to convert the above dataframe to:
Person  CountJob  JobDetails
Cathy   4         j1;j3;j5
Mark    1         j2
Steve   1         j4
i.e., drop the nulls and join the unique jobs for each person.
I can partially solve this using groupby:

df.groupby('Person').agg(
    CountJob=('Job', 'count'),
    JobDetails=('Job', lambda x: ';'.join(x.dropna()))
)
I am able to drop the null values but can't seem to get unique values; in the above example, 'JobDetails' for 'Cathy' becomes j1;j3;j5;j1.
Chain unique() after dropna():

df.groupby('Person').agg(
    CountJob=('Job', 'count'),
    JobDetails=('Job', lambda x: ';'.join(x.dropna().unique()))
)
Out[143]:
        CountJob JobDetails
Person
Cathy          4   j1;j3;j5
Mark           1         j2
Steve          1         j4
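Note that 'count' ignores NaN but still counts duplicates, which is why Cathy gets 4 even though she has only three distinct jobs. If you ever want the count of distinct jobs instead, 'nunique' is a drop-in replacement; a sketch on the sample data (the frame construction is an assumption matching the question):

```python
import numpy as np
import pandas as pd

# Assumed sample frame; the blank Job is NaN
df = pd.DataFrame({
    "Job": ["j1", "j2", "j3", "j4", "j5", np.nan, "j1"],
    "Person": ["Cathy", "Mark", "Cathy", "Steve", "Cathy", "Cathy", "Cathy"],
})

out = df.groupby("Person").agg(
    CountJob=("Job", "count"),      # non-null rows: 4 for Cathy (j1 counted twice)
    UniqueJobs=("Job", "nunique"),  # distinct non-null jobs: 3 for Cathy
    JobDetails=("Job", lambda x: ";".join(x.dropna().unique())),
)
```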
You could consider dropping the nulls before aggregating:
(
df.dropna()
.groupby("Person")
.agg(CountJob=("Job", "count"), JobDetails=("Job", set))
.assign(JobDetails=lambda df: df.JobDetails.str.join(","))
)
        CountJob JobDetails
Person
Cathy          4   j1,j3,j5
Mark           1         j2
Steve          1         j4
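One caveat with set above: a set does not preserve insertion order, so the joined string may come out as j3,j1,j5 on some runs. A sketch that keeps first-occurrence order swaps in pd.unique (the sample frame here is an assumption matching the question):

```python
import numpy as np
import pandas as pd

# Assumed sample frame; the blank Job is NaN
df = pd.DataFrame({
    "Job": ["j1", "j2", "j3", "j4", "j5", np.nan, "j1"],
    "Person": ["Cathy", "Mark", "Cathy", "Steve", "Cathy", "Cathy", "Cathy"],
})

out = (
    df.dropna()
    .groupby("Person")
    .agg(
        CountJob=("Job", "count"),
        # pd.unique keeps first-occurrence order, unlike set
        JobDetails=("Job", lambda s: ",".join(pd.unique(s))),
    )
)
```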
Another method is to set 'Person' as the index and stack to drop any nulls; then we can apply any groupby aggregations without issue.
df.set_index('Person').stack().groupby('Person')\
  .agg(JobDetails='unique', Job='count')
          JobDetails  Job
Person
Cathy   [j1, j3, j5]    4
Mark            [j2]    1
Steve           [j4]    1
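To get the same semicolon-joined strings as the other answers from this approach, the unique values can be joined inside the aggregation. A runnable sketch on the assumed sample frame (the explicit dropna() is added because stack() in newer pandas versions may no longer drop NaN by default):

```python
import numpy as np
import pandas as pd

# Assumed sample frame; the blank Job is NaN
df = pd.DataFrame({
    "Job": ["j1", "j2", "j3", "j4", "j5", np.nan, "j1"],
    "Person": ["Cathy", "Mark", "Cathy", "Steve", "Cathy", "Cathy", "Cathy"],
})

out = (
    df.set_index("Person")
    .stack()
    .dropna()  # explicit: stack() in newer pandas may keep NaN
    .groupby("Person")
    .agg(CountJob="count", JobDetails=lambda s: ";".join(s.unique()))
)
```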