How can I select rows randomly in proportion to the number of unique values for each group in Python?

I would like to randomly select rows in proportion to the number of unique values in column "ID", grouped by column "Team". Further, I would like to retrieve only 8 rows in total. I have:

|  ID   |  Team |  Color       |
| ----- | ----- | ------------ |
|  1    |  A    |  Blue        |
|  2    |  B    |  Red         |
|  2    |  B    |  Green       |
|  3    |  A    |  Blue        |
|  6    |  C    |  Red         |
|  1    |  B    |  Yellow      |
|  2    |  B    |  Green       |
|  9    |  A    |  Blue        |
|  6    |  C    |  Red         |
|  1    |  B    |  Yellow      |
|  9    |  A    |  Blue        |
|  1    |  A    |  Purple      |

Only the proportions are based on unique values. The rows pulled do not need to be unique in any way. Using the above table, the proportions would be:

|  Team  | Unique IDs |  Proportion |  Number selected |
| ------ | ---------- | ----------- | ---------------- |
|  A     |    3       |  0.500      |       4          |
|  B     |    2       |  0.333      |       3          |
|  C     |    1       |  0.167      |       1          |
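
The table above can be reproduced with a few lines of pandas. This is only a sketch: the DataFrame is typed in by hand from the question's data, the names `unique_ids` and `summary` are chosen here for illustration, and rounding to the nearest integer is assumed because it matches the "Number selected" column. Rounding will not add up to the requested total for every dataset.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "ID":    [1, 2, 2, 3, 6, 1, 2, 9, 6, 1, 9, 1],
    "Team":  ["A", "B", "B", "A", "C", "B", "B", "A", "C", "B", "A", "A"],
    "Color": ["Blue", "Red", "Green", "Blue", "Red", "Yellow",
              "Green", "Blue", "Red", "Yellow", "Blue", "Purple"],
})

n_total = 8  # total number of rows wanted

# Number of distinct IDs per team.
unique_ids = df.groupby("Team")["ID"].nunique()

summary = pd.DataFrame({
    "Unique IDs": unique_ids,
    "Proportion": unique_ids / unique_ids.sum(),
})
# Assumption: round each team's share of n_total to the nearest integer.
summary["Number selected"] = (summary["Proportion"] * n_total).round().astype(int)

print(summary.round(3))
```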

So, since I want 8 total rows selected proportionately, I should end up with something like the following:

|  ID   |  Team |  Color       |
| ----- | ----- | ------------ |
|  1    |  A    |  Blue        |
|  3    |  A    |  Blue        |
|  9    |  A    |  Blue        |
|  1    |  A    |  Purple      |
|  2    |  B    |  Green       |
|  2    |  B    |  Red         |
|  1    |  B    |  Yellow      |
|  6    |  C    |  Red         |

  • Calculate unique_counts, the number of unique 'ID' values in each 'Team' group.
  • Convert unique_counts into nums_selected, the number of rows that have to be selected from each group.
  • Use nums_selected to .sample that many rows from each 'Team' group:

import numpy as np

# df is the DataFrame from the question
n_total = 8
unique_counts = df.groupby('Team')['ID'].agg('nunique')
nums_selected = np.floor(unique_counts / unique_counts.sum() * n_total).astype(int)  # rounded down

df.groupby('Team', group_keys=False).apply(      # for each 'Team' group:
    lambda x: x.sample(n=nums_selected[x.name],  # sample this many rows
                       replace=True)             # (with replacement)
    )

Note:

The result can contain fewer rows than n_total because nums_selected is rounded down when converting from float to int. With the example data, the shares 4.0, 2.67 and 1.33 become 4, 2 and 1, so only 7 rows are returned. You can use a different conversion instead, such as np.ceil or pd.Series.round, but none of these guarantees that the counts add up to exactly n_total for every dataset.
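
If the total must be exactly n_total, one option is the largest-remainder method: round every share down, then hand the leftover rows to the groups with the largest fractional parts. The sketch below is not part of the original answer. The helper `allocate` is a hypothetical name, the DataFrame is typed in from the question, and `random_state=0` is there only to make the run repeatable.

```python
import numpy as np
import pandas as pd


def allocate(unique_counts: pd.Series, n_total: int) -> pd.Series:
    """Hypothetical helper: split n_total across groups (largest-remainder method)."""
    exact = unique_counts / unique_counts.sum() * n_total  # fractional shares
    base = np.floor(exact).astype(int)                     # round every share down
    leftover = n_total - int(base.sum())                   # rows still unassigned
    # Give one extra row to each of the groups with the largest fractional parts.
    remainders = (exact - base).sort_values(ascending=False, kind="stable")
    winners = remainders.index[:leftover]
    base.loc[winners] += 1
    return base


# Sample data copied from the question's table.
df = pd.DataFrame({
    "ID":    [1, 2, 2, 3, 6, 1, 2, 9, 6, 1, 9, 1],
    "Team":  ["A", "B", "B", "A", "C", "B", "B", "A", "C", "B", "A", "A"],
    "Color": ["Blue", "Red", "Green", "Blue", "Red", "Yellow",
              "Green", "Blue", "Red", "Yellow", "Blue", "Purple"],
})

unique_counts = df.groupby("Team")["ID"].nunique()
nums_selected = allocate(unique_counts, n_total=8)

# Sample each team separately (with replacement), then stitch the pieces together.
sampled = pd.concat([
    group.sample(n=int(nums_selected[team]), replace=True, random_state=0)
    for team, group in df.groupby("Team")
])

print(nums_selected)
print(sampled)
```

Looping over the groups and concatenating the pieces is used here instead of `groupby(...).apply(...)` only to keep each step visible. Sampling with `replace=True` follows the original answer and means a group can contribute more rows than it has.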
