简体   繁体   中英

For each value in a column, how to count the number of unique values in its row?

Suppose I have a data frame:

ID       person_1     person_2
ID_001   Aaron        Ben
ID_003   Kate         Ben
ID_001   Aaron        Lou
ID_005   Lee          Ben
ID_006   Aaron        Cassie
ID_001   Tim          Ben
ID_003   Ben          Mal

For every ID in the column "ID", I want to count the number of unique names that were associated with the ID

My desired output:

ID       Count
ID_001   4
ID_003   3
ID_005   2
ID_006   2

Code for reproduction:

df = pd.DataFrame({
'ID': ["ID_001", "ID_003", "ID_001", "ID_005", "ID_006", "ID_001", "ID_003"],
'person1': ["Aaron","Kate","Aaron","Lee","Aaron","Tim","Ben"],
'person2': ["Ben","Ben","Lou","Ben","Cassie","Ben","Mal"]
})

Flat your columns person1 and person2 then remove duplicated names and finally count unique value per ID :

out = df.melt('ID').drop_duplicates(['ID', 'value']) \
        .value_counts('ID').rename('Count').reset_index()
print(out)

# Output
       ID  Count
0  ID_001      4
1  ID_003      3
2  ID_005      2
3  ID_006      2

Use df.melt('ID').groupby('ID')['value'].nunique() .

>>> df.melt('ID').groupby('ID')['value'].nunique()
ID
ID_001    4
ID_003    3
ID_005    2
ID_006    2
Name: value, dtype: int64

edit: df.set_index('ID').stack().groupby(level=0).nunique() works too.

Melt your dataframe together, drop duplicates , then group by ID and aggregate over the count of the variables. At least, rename the column variable to Count .

df.melt(["ID"]).drop_duplicates(["ID","value"]).groupby(
    ["ID"]).agg({"variable":"count"}).reset_index().rename(
    columns={"variable":"Count"})

   ID          Count
   ID_001      4
   ID_003      3
   ID_005      2
   ID_006      2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM