Split a DataFrame by matching a portion of its rows

Question

I would like to select only the predominant part of a DataFrame. For example, given:

id_B, supportProgress
id1, A
id1, A 
id1, A
id1, A
id1, A
id1, B
id1, B

The expected output is:

id_B, supportProgress
id1, A
id1, A 
id1, A
id1, A
id1, A

I cannot apply a simple filter because I don't know the values of supportProgress in advance. In another DataFrame they could be supportProgress = C,C,C,C,C,D,D, and I would want to select only the part corresponding to C,C,C,C,C.

My idea is to do a df.groupby(['supportProgress']) and select the portion that covers more than 80% of len(df).

Answer 1

I'm not sure about the 80%, but to get the rows with the most frequent supportProgress value, you can use the following:

df[df['supportProgress'] == df['supportProgress'].value_counts().index[0]]
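
For anyone who wants to try this on the sample data from the question, here is a minimal runnable sketch. The DataFrame construction is an assumption based on the table in the question, and idxmax() is used as an equivalent way to read off the most frequent label instead of index[0].

```python
import pandas as pd

# Sample data rebuilt from the question (assumed layout: two string columns).
df = pd.DataFrame({
    "id_B": ["id1"] * 7,
    "supportProgress": ["A"] * 5 + ["B"] * 2,
})

# Count each value and take the label with the highest count.
top_value = df["supportProgress"].value_counts().idxmax()

# Keep only the rows carrying that label.
dominant = df[df["supportProgress"] == top_value]
print(dominant)
```

Note that this always keeps exactly one value, however small its lead is, and if two values tie only one of them is kept.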

Answer 2

You need value_counts first:

a = df['supportProgress'].value_counts(normalize=True)
print(a)
A    0.714286
B    0.285714
Name: supportProgress, dtype: float64

# keep the values whose share of rows exceeds 80%
b = a.index[a > .8]
# if no value qualifies, fall back to all values
# (here A covers only about 71%, so nothing exceeds 0.8)
b = a.index if b.empty else b
print(b)
Index(['A', 'B'], dtype='object')

# finally, filter the rows by the selected values
df = df[df['supportProgress'].isin(b)]
print(df)
  id_B supportProgress
0  id1               A
1  id1               A
2  id1               A
3  id1               A
4  id1               A
5  id1               B
6  id1               B
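
The steps above can be wrapped in a small helper so the threshold is easy to change. This is only a sketch: the function name keep_dominant and its parameters are hypothetical, not part of pandas, and the sample DataFrame is rebuilt from the question.

```python
import pandas as pd

def keep_dominant(df, column, threshold=0.8):
    """Keep rows whose value in `column` covers more than `threshold` of all rows.

    Returns the unfiltered frame when no value is that dominant.
    (Hypothetical helper, written for illustration only.)
    """
    # Share of rows per distinct value, e.g. A -> 0.714, B -> 0.286.
    shares = df[column].value_counts(normalize=True)
    dominant = shares.index[shares > threshold]
    if dominant.empty:
        return df
    return df[df[column].isin(dominant)]

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id_B": ["id1"] * 7,
    "supportProgress": ["A"] * 5 + ["B"] * 2,
})

strict = keep_dominant(df, "supportProgress")        # A is about 71%, below 0.8
relaxed = keep_dominant(df, "supportProgress", 0.7)  # A passes a 0.7 threshold
print(len(strict), len(relaxed))
```

With the question's data, the default 0.8 threshold keeps all seven rows, while lowering it to 0.7 keeps only the five A rows, which matches the output the question asks for.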

2 answers

Answer 1
1 2017-08-30 10:40:51

Answer 2
1 accepted 2017-08-30 10:41:43
