Suppose I have a data frame:
ID person1 person2
ID_001 Aaron Ben
ID_003 Kate Ben
ID_001 Aaron Lou
ID_005 Lee Ben
ID_006 Aaron Cassie
ID_001 Tim Ben
ID_003 Ben Mal
For every ID in the column "ID", I want to count the number of unique names associated with that ID, across both person columns.
My desired output:
ID Count
ID_001 4
ID_003 3
ID_005 2
ID_006 2
Code for reproduction:
import pandas as pd

df = pd.DataFrame({
    'ID': ["ID_001", "ID_003", "ID_001", "ID_005", "ID_006", "ID_001", "ID_003"],
    'person1': ["Aaron","Kate","Aaron","Lee","Aaron","Tim","Ben"],
    'person2': ["Ben","Ben","Lou","Ben","Cassie","Ben","Mal"]
})
Flatten your columns person1 and person2, then remove duplicated names, and finally count the unique values per ID:
out = df.melt('ID').drop_duplicates(['ID', 'value']) \
.value_counts('ID').rename('Count').reset_index()
print(out)
# Output
ID Count
0 ID_001 4
1 ID_003 3
2 ID_005 2
3 ID_006 2
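To see what each step of that chain contributes, it can be split into separate statements. This is only a sketch on the question's sample data; the names long_df, unique_pairs and counts are illustrative, and groupby('ID').size() is swapped in for value_counts so that the result comes back sorted by ID.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ["ID_001", "ID_003", "ID_001", "ID_005", "ID_006", "ID_001", "ID_003"],
    'person1': ["Aaron", "Kate", "Aaron", "Lee", "Aaron", "Tim", "Ben"],
    'person2': ["Ben", "Ben", "Lou", "Ben", "Cassie", "Ben", "Mal"],
})

# Step 1: reshape to long format, one row per (ID, name) occurrence.
# melt keeps 'ID' and stacks person1/person2 into 'variable' and 'value'.
long_df = df.melt('ID')

# Step 2: keep each name only once per ID.
unique_pairs = long_df.drop_duplicates(['ID', 'value'])

# Step 3: count the remaining rows per ID (illustrative alternative to value_counts).
counts = unique_pairs.groupby('ID').size()
print(counts)
```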
Use df.melt('ID').groupby('ID')['value'].nunique().
>>> df.melt('ID').groupby('ID')['value'].nunique()
ID
ID_001 4
ID_003 3
ID_005 2
ID_006 2
Name: value, dtype: int64
Edit: df.set_index('ID').stack().groupby(level=0).nunique() works too.
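Both variants return a Series indexed by ID rather than the two-column layout asked for in the question. A minimal sketch of getting that layout, assuming the count column should be called Count as in the desired output (out and alt are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ["ID_001", "ID_003", "ID_001", "ID_005", "ID_006", "ID_001", "ID_003"],
    'person1': ["Aaron", "Kate", "Aaron", "Lee", "Aaron", "Tim", "Ben"],
    'person2': ["Ben", "Ben", "Lou", "Ben", "Cassie", "Ben", "Mal"],
})

# melt + nunique, then name the Series and move ID back into a column.
out = (
    df.melt('ID')
      .groupby('ID')['value']
      .nunique()
      .rename('Count')
      .reset_index()
)
print(out)

# The stack-based variant gives the same counts per ID.
alt = df.set_index('ID').stack().groupby(level=0).nunique()
```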
Melt your dataframe, drop duplicates, then group by ID and aggregate with a count over the variable column. Finally, rename the column variable to Count.
df.melt(["ID"]).drop_duplicates(["ID","value"]).groupby(
["ID"]).agg({"variable":"count"}).reset_index().rename(
columns={"variable":"Count"})
ID Count
ID_001 4
ID_003 3
ID_005 2
ID_006 2
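If the separate drop_duplicates and rename steps feel heavy, pandas named aggregation can produce the Count column directly. This is a sketch of an alternative, not the answer's own method; it assumes a pandas version that supports the agg(new_name=(column, function)) form.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ["ID_001", "ID_003", "ID_001", "ID_005", "ID_006", "ID_001", "ID_003"],
    'person1': ["Aaron", "Kate", "Aaron", "Lee", "Aaron", "Tim", "Ben"],
    'person2': ["Ben", "Ben", "Lou", "Ben", "Cassie", "Ben", "Mal"],
})

# as_index=False keeps ID as a regular column;
# Count=('value', 'nunique') counts distinct names per ID and names the result.
out = (
    df.melt('ID')
      .groupby('ID', as_index=False)
      .agg(Count=('value', 'nunique'))
)
print(out)
```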