Let's say I have a pandas DataFrame with a column of measurement parameters and a column of the corresponding measurement values.
ID Parameter Value
0 'A' 4.3
1 'B' 3.1
2 'C' 8.9
3 'A' 2.1
4 'A' 3.9
. . .
. . .
. . .
100 'B' 3.8
How can I filter this DataFrame to keep only the parameters that appear more than X times? For example, for this DataFrame I want to get all rows whose parameter has more than 5 measurements (let's say only parameters 'A' and 'B' appear more than 5 times), giving a DataFrame like the one below.
ID Parameter Value
0 'A' 4.3
1 'B' 3.1
3 'A' 2.1
. . .
. . .
. . .
100 'B' 3.8
You can use value_counts + isin:
v = df.Parameter.value_counts()
df[df.Parameter.isin(v.index[v.gt(5)])]
For example, where K = 2 (get all items which have more than 2 readings):
df
ID Parameter Value
0 0 A 4.3
1 1 B 3.1
2 2 C 8.9
3 3 A 2.1
4 4 A 3.9
5 5 B 4.5
v = df.Parameter.value_counts()
v
A 3
B 2
C 1
Name: Parameter, dtype: int64
df[df.Parameter.isin(v.index[v.gt(2)])]
ID Parameter Value
0 0 A 4.3
3 3 A 2.1
4 4 A 3.9
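The same steps can be wrapped in a small reusable function. This is only a sketch: the name filter_by_min_count and its column and threshold arguments are hypothetical and not part of pandas, and the sample data is the six-row frame shown above.

```python
import pandas as pd


def filter_by_min_count(df, column, threshold):
    """Keep rows whose value in `column` occurs more than `threshold` times.

    Hypothetical helper for illustration; not a pandas API.
    """
    counts = df[column].value_counts()
    # Index labels (here: parameter names) whose count exceeds the threshold
    frequent = counts.index[counts.gt(threshold)]
    return df[df[column].isin(frequent)]


df = pd.DataFrame({
    "ID": range(6),
    "Parameter": ["A", "B", "C", "A", "A", "B"],
    "Value": [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})

# Only 'A' occurs more than twice, so rows 0, 3 and 4 remain
result = filter_by_min_count(df, "Parameter", 2)
print(result)
```

The original row index is preserved because the function only applies a boolean mask and never resets or reorders the frame.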
Use boolean indexing with transform + size:
df[df.groupby('Parameter')['Parameter'].transform('size') > 5]
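To see why this works, it can help to look at the intermediate Series. The sketch below assumes the six-row sample frame from the earlier example and a threshold of 2 instead of 5; the variable names sizes, mask and filtered are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": range(6),
    "Parameter": ["A", "B", "C", "A", "A", "B"],
    "Value": [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})

# transform('size') broadcasts each group's size back onto its rows,
# so the result has the same index as df: 3, 2, 1, 3, 3, 2
sizes = df.groupby("Parameter")["Parameter"].transform("size")

# Because sizes is aligned with df, the comparison is a valid row mask
mask = sizes > 2
filtered = df[mask]
print(filtered)
```

Since the mask lines up with the original index, no intermediate list of "frequent" labels is needed.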
Using filter:
df.groupby('Parameter').filter(lambda x: len(x) > 5)
loc with a group count also works:
counts = df.groupby('Parameter').size()
df.loc[df.Parameter.isin(counts[counts > 5].index)]
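As a quick sanity check, the groupby-based variants can be run side by side on the small sample frame. This is a sketch under assumptions: the threshold is 2 rather than 5 so that the six-row example produces a non-trivial result, and the names by_transform, by_filter, by_loc and same are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": range(6),
    "Parameter": ["A", "B", "C", "A", "A", "B"],
    "Value": [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})
threshold = 2

# 1. Row-aligned group sizes used as a boolean mask
by_transform = df[df.groupby("Parameter")["Parameter"].transform("size") > threshold]

# 2. Keep whole groups that satisfy the predicate
by_filter = df.groupby("Parameter").filter(lambda g: len(g) > threshold)

# 3. Look up the labels of sufficiently large groups, then select with loc
counts = df.groupby("Parameter").size()
by_loc = df.loc[df["Parameter"].isin(counts[counts > threshold].index)]

# All three should select the same rows with the same index
same = by_transform.equals(by_filter) and by_transform.equals(by_loc)
print(same)
```

Which variant to prefer is mostly a matter of taste. The filter form calls a Python function once per group, so the mask-based forms are often suggested when there are many groups, though that is worth measuring on your own data.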
You can use value_counts() with some Series manipulation to get the rows of a DataFrame, with their original indexes, where the value in a particular column appears more than once:
more_than_1 = DF['col1'].value_counts()
more_than_1 = list(more_than_1[more_than_1>1].index)
more_than_1_rows = DF[DF['col1'].isin(more_than_1)]
more_than_1_rows