
Getting different values from groupby(column)["id"].nunique() and from transform("nunique") when adding a column

I'm trying to count the distinct values per group in a dataset and add them as a new column to a table. Computing the counts works, but adding them as a column produces wrong values. When I run unique_id_per_column = source_table.groupby("disease").some_id.nunique() I get:

|    | disease                 | some_id |
|---:|:------------------------|--------:|
|  0 | disease1                |   121   |
|  1 | disease2                |     1   |
|  2 | disease3                |     5   |
|  3 | disease4                |     9   |
|  4 | disease5                |    77   |

These numbers seem to check out, but I want to add them to another table that already has a column with the count of all values per group. So I ran table["unique_ids"] = source_table.groupby("disease").some_id.transform("nunique") and got the following table, where every row except the first has the wrong number:

|    | disease                 |some_id |   unique_ids      |
|---:|:------------------------|-------:|------------------:|
|  0 | disease1                |   151  |               121 |
|  1 | disease2                |     1  |               121 |
|  2 | disease3                |     5  |               121 |
|  3 | disease4                |     9  |               121 |
|  4 | disease5                |    91  |               121 |

I expected to get the same results as in the first table. Does anyone know why the number for the first row is repeated instead of the correct number for each group?

Use Series.map if you need to create the column in another DataFrame:

s = source_table.groupby("disease").some_id.nunique()

table["unique_ids"] = table["disease"].map(s) 
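A likely explanation for the repeated 121, assuming the first rows of source_table all belong to disease1: transform("nunique") returns one value per row of source_table, indexed like source_table, and column assignment aligns on the index rather than on the disease name. The sketch below reproduces this on a small made-up dataset (the sample values and the via_transform column name are hypothetical, not from the question) and contrasts it with the map approach.

```python
import pandas as pd

# Hypothetical long-format data: one row per (disease, some_id) observation.
source_table = pd.DataFrame({
    "disease": ["disease1", "disease1", "disease1", "disease2", "disease3", "disease3"],
    "some_id": [10, 10, 11, 20, 30, 31],
})

# Hypothetical summary table: one row per disease.
table = pd.DataFrame({"disease": ["disease1", "disease2", "disease3"]})

# transform returns one value per row of source_table (index 0..5),
# not one value per group.
per_row = source_table.groupby("disease")["some_id"].transform("nunique")

# Assignment aligns on the index, so rows 0..2 of table receive
# rows 0..2 of per_row, which all belong to disease1 here.
table["via_transform"] = per_row

# nunique returns one value per group, indexed by disease name;
# map then looks up each disease in that Series.
s = source_table.groupby("disease")["some_id"].nunique()
table["unique_ids"] = table["disease"].map(s)

print(table)
```

In this sketch the via_transform column repeats the disease1 count in every row, while unique_ids carries the per-disease count, because map matches on the values of the disease column instead of on row positions.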
