How to limit DataFrame rows to the Xth unique value in a specific column?

Question

Say, for example, we have the following DataFrame:

And suppose we know we want x (say 3) unique values in column A. Then the desired output would be:

I thought about looping through the column in question, counting the unique values as I go, and taking the subset of the DataFrame with the right index. I am still new to Python and I believe there is a more efficient way to do this, so please share your solutions. Appreciated!

Answer 1

You can try Series.factorize, which encodes the unique values as integer codes starting at 0 in order of first appearance, and then select the rows whose code is <= n-1 (because the codes start at 0). This preserves the original row order too:

n=3
df[df['A'].factorize()[0]<=n-1]
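As a minimal sketch of how this works: the DataFrame from the question is not shown, so the sample data below (columns A and B and their values) is purely hypothetical. The example prints the codes that factorize assigns and then applies the same filter.

```python
import pandas as pd

# Hypothetical sample data; the question's actual DataFrame is not available.
df = pd.DataFrame({
    "A": ["x", "x", "y", "z", "y", "w", "w"],
    "B": [1, 2, 3, 4, 5, 6, 7],
})

n = 3

# factorize returns (codes, uniques); codes number the distinct values
# in order of first appearance: x -> 0, y -> 1, z -> 2, w -> 3.
codes, uniques = df["A"].factorize()
print(codes)

# Keep every row whose value is among the first n distinct values.
result = df[codes < n]
print(result)
```

With this data, the rows containing "w" are dropped and the remaining rows keep their original order. One caveat: by default factorize assigns the code -1 to missing values, so rows with NaN in A would also pass this comparison.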

Answer 2

You can use np.random.choice to select the unique ids, then isin to select the rows with those ids. Note that this picks the ids at random rather than taking the first ones to appear:

import numpy as np

selected_ids = np.random.choice(df['A'].unique(), replace=False, size=3)

df[df['A'].isin(selected_ids)]
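To make the difference between a random pick and the first ids concrete, here is a small sketch on hypothetical data (the question's DataFrame is not shown, so the values are assumptions). It uses a seeded NumPy Generator instead of np.random.choice so the random pick is repeatable, and adds an order-based variant that relies on Series.unique returning values in order of first appearance.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question's actual DataFrame is not available.
df = pd.DataFrame({
    "A": [10, 10, 20, 30, 20, 40, 40],
    "B": [1, 2, 3, 4, 5, 6, 7],
})

n = 3

# Random pick of n distinct ids (seeded so repeated runs give the same pick).
rng = np.random.default_rng(0)
random_ids = rng.choice(df["A"].unique(), size=n, replace=False)
random_subset = df[df["A"].isin(random_ids)]

# Order-based pick: unique() lists values in order of first appearance,
# so slicing takes the first n ids seen in the column.
first_ids = df["A"].unique()[:n]
first_subset = df[df["A"].isin(first_ids)]

print(random_subset)
print(first_subset)
```

Both subsets contain exactly n distinct ids, but only the second is guaranteed to match the "first n unique values" reading of the question.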

Answer 1
2 2021-03-02 16:24:45

Answer 2
1 2021-03-02 16:18:19
