Pandas calculate the number of times there is X sec difference between consecutive rows
I am trying to count, for each id and date, the number of times the datetime differs from the previous row by exactly 10 seconds.
Data
id timestamp datetime date
1 1496660340 2019-06-05 10:59:00 2019-06-05
1 1496660340 2019-06-05 10:59:10 2019-06-05
1 1496660355 2019-06-05 10:59:40 2019-06-05 <- 30 sec diff from above, so not counted
1 1496655555 2019-06-06 11:58:00 2019-06-06
1 1496666666 2019-06-06 11:58:10 2019-06-06
1 1496666677 2019-06-06 11:58:20 2019-06-06
2 1496655555 2019-06-05 11:58:00 2019-06-05
2 1496666666 2019-06-05 11:58:10 2019-06-05
2 1496666677 2019-06-05 11:58:20 2019-06-05
Data columns (total 4 columns):
id int64
timestamp int64
datetime datetime64[ns]
date object
Desired output
id date num_count
1 2019-06-05 1
1 2019-06-06 2
2 2019-06-05 2
What I tried
import numpy as np
import pandas as pd
# Get the time difference (in seconds) from the previous row within each (id, date) group
df['timediff'] = df.groupby(['id','date'])['datetime'].diff() / np.timedelta64(1, 's')
# Count the number of 10-second differences per group
x = pd.DataFrame(df[df['timediff']==10].groupby(['id','date'],as_index=False)['timediff'].count())
I'm not sure whether this is the right approach. Can someone point me in the right direction?
You can use a custom function with groupby:
def difference_condition(x):
    # Count the rows whose gap to the previous row is exactly 10 seconds
    return x.diff().dt.total_seconds().eq(10).sum()
res = df.groupby(['id', 'date'])['datetime'].apply(difference_condition)
print(res.reset_index(name='count'))
id date count
0 1 2019-06-05 1
1 1 2019-06-06 2
2 2 2019-06-05 2
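If you would rather avoid a Python-level function in apply, the same count can probably be obtained with grouped diff followed by a boolean sum. This is only a sketch, not a benchmarked replacement: the frame below is rebuilt inline from the question's datetime values, the date column is derived from datetime as an assumption, and the names diffs, is_10s and counts are made up for illustration.

```python
import pandas as pd

# Rebuild the question's data inline (the timestamp column is omitted; it is not needed here).
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "datetime": pd.to_datetime([
        "2019-06-05 10:59:00", "2019-06-05 10:59:10", "2019-06-05 10:59:40",
        "2019-06-06 11:58:00", "2019-06-06 11:58:10", "2019-06-06 11:58:20",
        "2019-06-05 11:58:00", "2019-06-05 11:58:10", "2019-06-05 11:58:20",
    ]),
})
# Assumption: date is simply the calendar day of datetime.
df["date"] = df["datetime"].dt.date

# Difference from the previous row within each (id, date) group;
# the first row of every group has no predecessor and becomes NaT.
diffs = df.groupby(["id", "date"])["datetime"].diff()

# True where the gap is exactly 10 seconds (NaT compares as False).
is_10s = diffs.eq(pd.Timedelta(seconds=10))

# Summing booleans counts the True values per group.
counts = (
    is_10s.groupby([df["id"], df["date"]]).sum()
    .reset_index(name="num_count")
)
print(counts)
```

Comparing against pd.Timedelta keeps the test in timedelta space, so no conversion to float seconds is needed.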
Setup
import pandas as pd
from io import StringIO
x = """id|timestamp|datetime|date
1 |1496660340 |2019-06-05 10:59:00 |2019-06-05
1 |1496660340 |2019-06-05 10:59:10 |2019-06-05
1 |1496660355 |2019-06-05 10:59:40 |2019-06-05
1 |1496655555 |2019-06-06 11:58:00 |2019-06-06
1 |1496666666 |2019-06-06 11:58:10 |2019-06-06
1 |1496666677 |2019-06-06 11:58:20 |2019-06-06
2 |1496655555 |2019-06-05 11:58:00 |2019-06-05
2 |1496666666 |2019-06-05 11:58:10 |2019-06-05
2 |1496666677 |2019-06-05 11:58:20 |2019-06-05"""
df = pd.read_csv(StringIO(x), sep='|')
df[['datetime', 'date']] = df[['datetime', 'date']].apply(pd.to_datetime)
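As for the attempt in the question: filtering on timediff == 10 and then counting gives the same numbers for the sample data, but it leaves out any group that has no 10-second gap at all, whereas flagging and summing keeps such a group with a count of 0. The sketch below illustrates the difference on a small hypothetical frame (the ids "a" and "b" and their times are invented, and it groups by id only to keep things short).

```python
import pandas as pd

# Hypothetical data: group "a" has one 10-second gap, group "b" has none.
df = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "datetime": pd.to_datetime([
        "2019-06-05 10:00:00", "2019-06-05 10:00:10",
        "2019-06-05 11:00:00", "2019-06-05 11:00:30",
    ]),
})

# Gap to the previous row within each id, in seconds (NaN for the first row).
df["timediff"] = df.groupby("id")["datetime"].diff().dt.total_seconds()

# Filter first, then count: group "b" has no rows left, so it disappears.
filtered = df[df["timediff"] == 10].groupby("id")["timediff"].count()

# Flag first, then sum: group "b" is kept with a count of 0.
flagged = df["timediff"].eq(10).groupby(df["id"]).sum()

print(filtered.to_dict())
print(flagged.to_dict())
```

Which behaviour you want depends on whether groups with zero matches should appear in the final table.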