Separating timestamped data into bins

I have a text file containing timestamps with associated button presses. I loaded it into R using RStudio. The button presses are formatted as strings.

52 right 08:16:23
53     a 08:16:23
54    up 08:16:24
55     a 08:16:24
56     b 08:16:24
57     a 08:16:24
58     a 08:16:24
59 right 08:16:24
60     a 08:16:24

The timestamps have been converted to POSIXct, but they came as separate date and time fields in my text file.

I want to break the data into equally spaced bins based on time and count the frequency of each button within each bin.

There are only a handful of buttons, but there are many different, non-unique timestamps.

Ideally I'd like intervals as small as one minute, and a solution that lets me change the granularity would be great.

These functions may be of interest to you:

The answer depends on whether the time is recognized by R. If not, you can use

chron( ... ) 

on your time variable. Please see: http://www.stat.berkeley.edu/~s133/dates.html
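If you would rather stay in base R than use chron, here is a minimal sketch of one way to build a POSIXct column from separate date and time fields. The data frame `raw`, its column names, the date format, and the UTC time zone are all assumptions for illustration, not taken from the question's file.

```r
# Hypothetical input: date and clock time arrive as separate strings
raw <- data.frame(
  date  = c("2016-04-17", "2016-04-17"),
  clock = c("08:16:23", "08:16:24"),
  stringsAsFactors = FALSE
)

# Paste the two fields together and parse them in one step.
# The format string and time zone are assumptions; adjust to your file.
raw$time <- as.POSIXct(paste(raw$date, raw$clock),
                       format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

inherits(raw$time, "POSIXct")  # TRUE once R recognizes the times
```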

bins <- cut(time_variable, number_of_bins)

This takes the min and max of the time variable, divides that range into the requested number of equal-width bins, and assigns each time to the appropriate bin.

table(bins)

This returns the frequency in each bin.
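As a rough sketch of how this could look with button data, the example below cuts a few made-up timestamps into three equal-width bins and cross-tabulates bin against button. The data frame and its column names are hypothetical, and `labels = FALSE` is my own choice so that the bins come back as plain integer codes.

```r
# Hypothetical sample modeled on the question's format
dat <- data.frame(
  button = c("right", "a", "up", "a", "b", "a"),
  time = as.POSIXct(c("2016-04-17 08:16:23", "2016-04-17 08:16:23",
                      "2016-04-17 08:17:59", "2016-04-17 08:18:45",
                      "2016-04-17 08:20:53", "2016-04-17 08:14:24"),
                    tz = "UTC"),
  stringsAsFactors = FALSE
)

# Three equal-width bins spanning min(time) to max(time);
# labels = FALSE returns the bin number (1, 2, 3) for each row
bins <- cut(dat$time, breaks = 3, labels = FALSE)

# Total presses per bin
table(bins)

# Presses per bin, split by button
counts <- table(bin = bins, button = dat$button)
counts
```

Passing a second vector to table gives one column per button, which is closer to what the question asks for than a single count per bin.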

Let's assume you have a data.frame named "dat" with the time string in a column named "V3", as in the one I created from your text, and the POSIXct version in a column named "time". With your sample data, using seq.POSIXct with an interval of one minute creates only a single break point, and cut cannot handle that, so I changed the times to spread them out. In the process I discovered that my initial attempt with seq.POSIXct returned NA for the upper values, because the sequence ends before the maximum whenever the seconds part of the max time is higher than that of the min time, so I added 60 seconds to the max. I used one minute as the interval for this demonstration. You should be able to generalize the code in the obvious locations.

# Initial failed attempt with your data
> grp <- cut(dat$time, breaks=seq(min(dat$time), max(dat$time), by="1 min"), include.lowest=TRUE) 
Error in cut.default(unclass(x), unclass(breaks), labels = labels, right = right,  : 
  'breaks' are not unique

# Better data, more challenging, allows better testing
dat$grp <- cut(dat$time,
               breaks = seq(min(dat$time), max(dat$time) + 60, by = "1 min"),
               include.lowest = TRUE, right = TRUE)

> dat
  V1    V2       V3                time                 grp
1 52 right 08:16:23 2016-04-17 08:16:23 2016-04-17 08:15:24
2 53     a 08:16:23 2016-04-17 08:16:23 2016-04-17 08:15:24
3 54    up 08:17:59 2016-04-17 08:17:59 2016-04-17 08:17:24
4 55     a 08:18:45 2016-04-17 08:18:45 2016-04-17 08:18:24
5 56     b 08:20:53 2016-04-17 08:20:53 2016-04-17 08:20:24
6 57     a 08:20:01 2016-04-17 08:20:01 2016-04-17 08:19:24
7 58     a  08:17:5 2016-04-17 08:17:05 2016-04-17 08:16:24
8 59 right 08:18:24 2016-04-17 08:18:24 2016-04-17 08:17:24
9 60     a 08:14:24 2016-04-17 08:14:24 2016-04-17 08:14:24
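To illustrate why the upper end of the sequence is padded, here is a minimal sketch with three made-up times. The vector `tm` and the names `short` and `padded` are hypothetical; the cut arguments mirror the call above.

```r
# Hypothetical times; the latest one has a larger seconds part than the earliest
tm <- as.POSIXct(c("2016-04-17 08:14:24", "2016-04-17 08:17:05",
                   "2016-04-17 08:20:53"), tz = "UTC")

# Without padding, the last break is 08:20:24, which falls short of max(tm)
short <- seq(min(tm), max(tm), by = "1 min")

# With 60 seconds of padding, the last break is 08:21:24, past max(tm)
padded <- seq(min(tm), max(tm) + 60, by = "1 min")

g_short  <- cut(tm, breaks = short,  include.lowest = TRUE, right = TRUE)
g_padded <- cut(tm, breaks = padded, include.lowest = TRUE, right = TRUE)

is.na(g_short)   # FALSE FALSE TRUE: the latest time has no bin
anyNA(g_padded)  # FALSE: every time is assigned to a bin
```

For a different interval width, the padding would need to match the interval in seconds rather than the fixed 60 used here.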

You can get the counts by group with table:

> table(dat$grp)

2016-04-17 08:14:24 2016-04-17 08:15:24 2016-04-17 08:16:24 2016-04-17 08:17:24 
                  1                   2                   1                   2 
2016-04-17 08:18:24 2016-04-17 08:19:24 2016-04-17 08:20:24 
                  1                   1                   1 

See ?table for additional options for handling missing values.
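To make the granularity adjustable and get counts per button, one possible sketch is a small wrapper that passes an interval string such as "1 min" or "5 min" to cut. The function name `count_presses` and the sample data frame `presses` are hypothetical. As far as I understand cut.POSIXt, an interval string makes the bins start at the beginning of the unit containing the earliest time (here, the whole minute) rather than at the earliest time itself, and the bins are closed on the left by default, so the boundaries differ from the seq-based version above.

```r
# Hypothetical helper: count button presses per time bin
count_presses <- function(time, button, width = "1 min") {
  bin <- cut(time, breaks = width)  # interval string handled by cut.POSIXt
  table(bin = bin, button = button)
}

# Sample data using the same times and buttons as the example above
presses <- data.frame(
  button = c("right", "a", "up", "a", "b", "a", "a", "right", "a"),
  time = as.POSIXct(paste("2016-04-17",
                          c("08:16:23", "08:16:23", "08:17:59", "08:18:45",
                            "08:20:53", "08:20:01", "08:17:05", "08:18:24",
                            "08:14:24")),
                    tz = "UTC"),
  stringsAsFactors = FALSE
)

per_minute <- count_presses(presses$time, presses$button, width = "1 min")
per_five   <- count_presses(presses$time, presses$button, width = "5 min")

per_minute
rowSums(per_five)  # total presses in each five-minute bin
```

Empty bins are kept as rows of zeros because cut returns a factor whose levels cover the whole range.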
