Keep a percentage of DataFrame rows based on a column value

Question

Suppose I have a DataFrame like this:

import pandas as pd
df = pd.DataFrame({'id':['A','A', 'A', 'B','B'], 'value':[2, 4, 6, 3, 4]})

I want to filter only the rows with id = A and keep x percent of them, while leaving all other rows as they are.

For example, if x = 60%, the DataFrame should look like this:

  id  value
0  A      2
1  A      4
3  B      3
4  B      4

How can I do this efficiently in pandas?

To clarify, the rows with id = A are not necessarily adjacent to one another.

Answer 1

One way is to use iloc[] with pd.concat:

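x = 0.6
cond = df['id'].eq('A')
# Number of 'A' rows to keep, rounded to the nearest integer
n_keep = int(round(cond.sum() * x))
# Keep the first n_keep 'A' rows plus every non-'A' row, then restore the original order
out = pd.concat((df[cond].iloc[:n_keep], df[~cond]), sort=False).sort_index()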

  id  value
0  A      2
1  A      4
3  B      3
4  B      4
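
The same idea can be wrapped in a small helper. This is only a sketch: the function name keep_first_fraction and its parameters are made up for illustration and are not part of pandas. The sample data interleaves the A and B rows to show that the mask-based selection does not depend on the A rows being adjacent.

```python
import pandas as pd


def keep_first_fraction(df, column, value, frac):
    """Keep the first round(n * frac) rows where df[column] == value,
    plus every other row, in the original row order.

    Hypothetical helper for illustration; not a pandas function.
    """
    cond = df[column].eq(value)
    n_keep = int(round(cond.sum() * frac))
    kept = df[cond].iloc[:n_keep]
    # sort_index restores the original order (assumes a sortable, unique index)
    return pd.concat([kept, df[~cond]]).sort_index()


# 'A' rows are interleaved with 'B' rows here
df = pd.DataFrame({'id': ['A', 'B', 'A', 'B', 'A'],
                   'value': [2, 3, 4, 4, 6]})
out = keep_first_fraction(df, 'id', 'A', 0.6)
print(out)
```

Note that this always keeps the first matching rows, so it is deterministic rather than a random sample.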

Answer 2

You can do this easily with df.sample, which picks the rows at random:

ids = ['A']
frac = 0.6
df.groupby('id', group_keys=False).apply(lambda x: x.sample(frac=frac) 
                                                   if x.name in ids else x)

Out:

    id  value
1   A   4
0   A   2
3   B   3
4   B   4
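
If the groupby is not needed, a variant is to sample only the masked rows and concatenate the rest back. The sketch below assumes a single target id and passes random_state (an arbitrary seed chosen here) so the draw is repeatable; which A rows survive still depends on that seed.

```python
import pandas as pd

df = pd.DataFrame({'id': ['A', 'A', 'A', 'B', 'B'],
                   'value': [2, 4, 6, 3, 4]})

frac = 0.6
cond = df['id'].eq('A')

# Randomly pick a fraction of the 'A' rows; random_state makes the draw repeatable
sampled = df[cond].sample(frac=frac, random_state=0)

# Add back every non-'A' row and restore the original row order
out = pd.concat([sampled, df[~cond]]).sort_index()
print(out)
```

Unlike the groupby version, the result here is returned in the original index order because of the final sort_index.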
