I'm trying to count the unique values per group in a dataset and add them as a new column to a table. My first approach works, but the second produces wrong values. When I use the following code:
unique_id_per_column = source_table.groupby("disease").some_id.nunique()
I get:
| | disease | some_id |
|---:|:------------------------|--------:|
| 0 | disease1 | 121 |
| 1 | disease2 | 1 |
| 2 | disease3 | 5 |
| 3 | disease4 | 9 |
| 4 | disease5 | 77 |
These numbers seem to check out, but I want to add them to another table that already has a column with the count of all values per group. So I used the following code:
table["unique_ids"] = source_table.groupby("disease").some_id.transform("nunique")
and I get the following table, with wrong numbers for every row except the first:
| | disease |some_id | unique_ids |
|---:|:------------------------|-------:|------------------:|
| 0 | disease1 | 151 | 121 |
| 1 | disease2 | 1 | 121 |
| 2 | disease3 | 5 | 121 |
| 3 | disease4 | 9 | 121 |
| 4 | disease5 | 91 | 121 |
I expected to get the same results as in the first table. Does anyone know why the number for the first row is repeated instead of the correct numbers?
Use Series.map if you need to create the column in another DataFrame:
s = source_table.groupby("disease").some_id.nunique()
table["unique_ids"] = table["disease"].map(s)
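The likely reason the transform version fails is index alignment: transform returns one value per row of source_table, indexed like source_table, and assigning that Series to table matches on row index rather than on disease. If the first rows of source_table all belong to disease1, every row of table receives the disease1 count. The following is a minimal sketch with made-up data (the values and the name misaligned are assumptions for illustration, not the asker's data) that reproduces the effect and contrasts it with map:

```python
import pandas as pd

# Hypothetical data: one row per record in the source table.
source_table = pd.DataFrame({
    "disease": ["disease1", "disease1", "disease1",
                "disease2", "disease3", "disease3"],
    "some_id": [10, 10, 11, 20, 30, 31],
})

# Hypothetical target table: one row per disease.
table = pd.DataFrame({"disease": ["disease1", "disease2", "disease3"]})

# transform keeps the shape and index of source_table (rows 0..5).
per_row = source_table.groupby("disease")["some_id"].transform("nunique")

# Assignment aligns on the row index, so rows 0..2 of table receive the
# values of rows 0..2 of source_table, which all belong to disease1.
misaligned = table.assign(unique_ids=per_row)
print(misaligned)

# nunique without transform is indexed by disease, so map can look up
# each disease name in table and return the matching count.
s = source_table.groupby("disease")["some_id"].nunique()
table["unique_ids"] = table["disease"].map(s)
print(table)
```

In this sketch the misaligned column repeats the disease1 count in every row, while the mapped column carries one count per disease.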