
How to restrict a DataFrame's rows to the first X unique values in a certain column?

Say, for example, we have the following DataFrame:

A B
1 2
1 2
2 3
3 4
4 5 
4 2

Suppose we want to keep only the rows covering the first x (say, 3) unique values in column A. The desired output would then be:

A B
1 2
1 2
2 3
3 4

I thought about looping through the column in question, counting the unique values seen so far, and taking the subset of the DataFrame up to the right index. I am still a newbie to Python and I believe there is a more efficient way to do this, so please share your solutions. Appreciated!
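For comparison with the answers, the looping idea described above might look like the following minimal sketch. The helper name head_until_nth_unique is hypothetical, and it assumes "first x unique values" means truncating the DataFrame at the row where the (x+1)-th distinct value first appears.

```python
import pandas as pd


def head_until_nth_unique(df: pd.DataFrame, col: str, x: int) -> pd.DataFrame:
    """Hypothetical helper: keep rows up to (not including) the first row
    whose value in `col` would be the (x+1)-th distinct value seen."""
    seen = set()
    for pos, value in enumerate(df[col]):
        if value not in seen:
            if len(seen) == x:
                # This row introduces the (x+1)-th unique value: cut here.
                return df.iloc[:pos]
            seen.add(value)
    # The column has at most x unique values, so keep every row.
    return df


df = pd.DataFrame({"A": [1, 1, 2, 3, 4, 4], "B": [2, 2, 3, 4, 5, 2]})
result = head_until_nth_unique(df, "A", 3)
print(result)
```

This works, but it runs a Python-level loop over every row, which is why vectorized pandas approaches are usually preferred for large frames.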

You can try Series.factorize, which assigns each unique value an integer code starting at 0, in order of first appearance. Then select the rows whose code is <= n-1 (because the codes start at 0). This also preserves the original row order:

n = 3
# factorize()[0] holds each row's 0-based code, numbered by first appearance
df[df['A'].factorize()[0] <= n - 1]

   A  B
0  1  2
1  1  2
2  2  3
3  3  4
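A self-contained version of the factorize answer, rebuilding the example DataFrame from the question, makes the intermediate codes visible. Unpacking the (codes, uniques) tuple is just a readability choice; the filtering is the same as above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 4, 4], "B": [2, 2, 3, 4, 5, 2]})
n = 3

# codes[i] is the 0-based id of df["A"][i] among the unique values,
# numbered in order of first appearance (sort=False is the default).
codes, uniques = df["A"].factorize()

# Keep rows whose value is among the first n unique values.
result = df[codes < n]
print(result)
```

Note that this keeps every row whose value is among the first n unique values, even if that value reappears further down after a later value. Also, by default factorize codes missing values as -1, so rows with NaN in column A would pass the codes < n test.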

You can use np.random.choice to select unique ids, then isin to select the rows with those ids. Note that this picks x unique values at random rather than the first x:

import numpy as np

# pick 3 distinct values of column A at random (replace=False avoids repeats)
selected_ids = np.random.choice(df['A'].unique(), replace=False, size=3)
df[df['A'].isin(selected_ids)]
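A sketch of the isin approach in both flavours, assuming the example DataFrame from the question: a seeded NumPy Generator (the seed value is arbitrary) makes the random pick reproducible, and because Series.unique() returns values in order of appearance, slicing it gives the first n unique values deterministically.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 4, 4], "B": [2, 2, 3, 4, 5, 2]})
n = 3

# Random variant: a seeded Generator makes the selection reproducible.
rng = np.random.default_rng(seed=0)
random_ids = rng.choice(df["A"].unique(), size=n, replace=False)
random_rows = df[df["A"].isin(random_ids)]

# Deterministic variant: unique() preserves first-appearance order,
# so the first n entries are the first n unique values.
first_ids = df["A"].unique()[:n]
first_rows = df[df["A"].isin(first_ids)]

print(random_rows)
print(first_rows)
```

Which rows the random variant returns depends on the seed, so only its shape (exactly n distinct values of A) is predictable.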



 