Get only the events that occur two or more times within a given list of years in pandas
Below is the raw data.
Event Year Month
Event1 2011 January
Event1 2012 January
Event1 2013 February
Event1 2014 January
Event1 2015 March
Event2 2011 January
Event2 2014 April
Event3 2012 January
Event3 2015 March
Event4 2013 February
I want only the data for events that occur two or more times within a given list of years, i.e. [2011,2012,2013,2014].
So the output should be:
Event Year Month
Event1 2011 January
Event1 2012 January
Event1 2013 February
Event1 2014 January
Event1 2015 March
Event2 2011 January
Event2 2014 April
First filter rows by the list using Series.isin with boolean indexing, then find the repeated events with DataFrame.duplicated, and finally filter the original DataFrame on the Event column:
L = [2011,2012,2013,2014]
df1 = df.loc[df['Year'].isin(L)]
df = df[df['Event'].isin(df1.loc[df1.duplicated(['Event']),'Event'])]
print (df)
Event Year Month
0 Event1 2011 January
1 Event1 2012 January
2 Event1 2013 February
3 Event1 2014 January
4 Event1 2015 March
5 Event2 2011 January
6 Event2 2014 April
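To reproduce this end to end, the sample data first has to be loaded into a DataFrame. The following is a minimal sketch of the same three steps; the way `df` is constructed and the names `repeated` and `result` are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question (this construction of df is assumed).
df = pd.DataFrame({
    'Event': ['Event1'] * 5 + ['Event2'] * 2 + ['Event3'] * 2 + ['Event4'],
    'Year': [2011, 2012, 2013, 2014, 2015, 2011, 2014, 2012, 2015, 2013],
    'Month': ['January', 'January', 'February', 'January', 'March',
              'January', 'April', 'January', 'March', 'February'],
})

L = [2011, 2012, 2013, 2014]

# Step 1: keep only rows whose Year is in the list.
df1 = df.loc[df['Year'].isin(L)]

# Step 2: events that appear more than once among those rows
# (duplicated marks the second and later occurrences).
repeated = df1.loc[df1.duplicated(['Event']), 'Event']

# Step 3: keep every row of the original frame for those events,
# including rows whose Year is outside the list.
result = df[df['Event'].isin(repeated)]
print(result)
```

Because the last step filters the original `df` rather than `df1`, the 2015 row of Event1 is kept even though 2015 is not in the list.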
Alternatively, test whether the number of filtered rows per event is greater than or equal to 2:
L = [2011,2012,2013,2014]
df1 = df.loc[df['Year'].isin(L)]
s = df1['Event'].value_counts()
df = df[df['Event'].isin(s.index[s.ge(2)])]
print (df)
Event Year Month
0 Event1 2011 January
1 Event1 2012 January
2 Event1 2013 February
3 Event1 2014 January
4 Event1 2015 March
5 Event2 2011 January
6 Event2 2014 April
Use isin to filter the years in the list, then group by Event, count, and keep the rows whose count is greater than or equal to 2:
lst = ['2011', '2012', '2013', '2014']  # assumed: years given as strings, to match astype(str)
s = df[df['Year'].astype(str).isin(lst)]
s[s.groupby('Event')['Month'].transform('count').ge(2)]
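Note that the groupby version filters the already-restricted frame, so rows outside the listed years (here, Event1 in 2015) do not appear in its result. If those rows should be kept, as in the expected output of the question, one possible variant is to count the in-list rows per event on the full frame. This is only a sketch: the construction of `df` and the names `years`, `within`, `counts` and `result` are assumptions for illustration.

```python
import pandas as pd

# Sample data from the question (this construction of df is assumed).
df = pd.DataFrame({
    'Event': ['Event1'] * 5 + ['Event2'] * 2 + ['Event3'] * 2 + ['Event4'],
    'Year': [2011, 2012, 2013, 2014, 2015, 2011, 2014, 2012, 2015, 2013],
    'Month': ['January', 'January', 'February', 'January', 'March',
              'January', 'April', 'January', 'March', 'February'],
})

years = [2011, 2012, 2013, 2014]
in_list = df['Year'].isin(years)

# Groupby on the restricted frame: only rows inside the listed years survive.
s = df[in_list]
within = s[s.groupby('Event')['Month'].transform('count').ge(2)]

# Variant: count in-list rows per event, broadcast the count to every row
# of the full frame, and keep all rows of events with at least 2.
counts = in_list.astype(int).groupby(df['Event']).transform('sum')
result = df[counts.ge(2)]

print(within)
print(result)
```

Here `within` omits the 2015 row of Event1, while `result` keeps it.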