Index grouped columns in a data frame

I have a data frame as follows:

                     time  site   val

  2014-09-01 00:00:00  2001     1
  2014-09-01 00:15:00  2001     0
  2014-09-01 00:30:00  2001     2
  2014-09-01 00:45:00  2001     0
  2014-09-01 00:00:00  2002     1
  2014-09-01 00:15:00  2002     0
  2014-09-01 00:30:00  2002     2
  2014-09-02 00:45:00  2001     0
  2014-09-02 00:00:00  2001     1
  2014-09-02 00:15:00  2001     0
  2014-09-02 00:30:00  2001     2
  2014-09-02 00:45:00  2001     0
  2014-09-02 00:00:00  2002     1
  2014-09-02 00:15:00  2002     0
  2014-09-02 00:30:00  2002     2
  2014-09-02 00:45:00  2001     0

I'd like to group it by time and site, then add a new variable holding the occurrence index within each group:

                     time  site   val   h

  2014-09-01 00:00:00  2001     1   1
  2014-09-01 00:15:00  2001     0   2
  2014-09-01 00:30:00  2001     2   3
  2014-09-01 00:45:00  2001     0   4
  2014-09-01 00:00:00  2002     1   1
  2014-09-01 00:15:00  2002     0   2
  2014-09-01 00:30:00  2002     2   3
  2014-09-02 00:45:00  2002     0   4
  2014-09-02 00:00:00  2001     1   1
  2014-09-02 00:15:00  2001     0   2
  2014-09-02 00:30:00  2001     2   3
  2014-09-02 00:45:00  2001     0   4
  2014-09-02 00:00:00  2002     1   1
  2014-09-02 00:15:00  2002     0   2
  2014-09-02 00:30:00  2002     2   3
  2014-09-02 00:45:00  2001     0   4

df <- structure(list(time = structure(c(1409522400, 1409523300, 1409524200, 
1409525100, 1409522400, 1409523300, 1409524200, 1409611500, 1409608800, 
1409609700, 1409610600, 1409611500, 1409608800, 1409609700, 1409610600, 
1409611500), class = c("POSIXct", "POSIXt"), tzone = ""), site = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("2001", 
"2002"), class = "factor"), val = c(1L, 0L, 2L, 0L, 1L, 0L, 2L, 
0L, 1L, 0L, 2L, 0L, 1L, 0L, 2L, 0L)), .Names = c("time", "site", 
"val"), row.names = c(NA, -16L), class = "data.frame")

What are my options in R to achieve this?

Thanks.

Using dplyr. First we create a column id by extracting the day of the month from the date (column time). Then we group by site and id, and add a new variable counter that numbers the rows within each of those groups.

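library(dplyr)
# Day of the month taken from the timestamp, used as the second grouping key
df$id <- as.factor(format(df$time, '%d'))
# Number the rows within each site/day group
df %>% group_by(site, id) %>% mutate(counter = row_number())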

Output:

                  time   site   val     id counter
                (time) (fctr) (int) (fctr)   (int)
1  2014-09-01 00:00:00   2001     1     01       1
2  2014-09-01 00:15:00   2001     0     01       2
3  2014-09-01 00:30:00   2001     2     01       3
4  2014-09-01 00:45:00   2001     0     01       4
5  2014-09-01 00:00:00   2002     1     01       1
6  2014-09-01 00:15:00   2002     0     01       2
7  2014-09-01 00:30:00   2002     2     01       3
8  2014-09-02 00:45:00   2001     0     02       1
9  2014-09-02 00:00:00   2001     1     02       2
10 2014-09-02 00:15:00   2001     0     02       3
11 2014-09-02 00:30:00   2001     2     02       4
12 2014-09-02 00:45:00   2001     0     02       5
13 2014-09-02 00:00:00   2002     1     02       1
14 2014-09-02 00:15:00   2002     0     02       2
15 2014-09-02 00:30:00   2002     2     02       3
16 2014-09-02 00:45:00   2001     0     02       6
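
Note that counter reaches 5 and 6 for site 2001 on day 02, because that site has six rows on that day; this differs from the expected output in the question. For comparison, the same site-and-day numbering can be sketched in base R with ave, without dplyr. This is only an illustration: the data frame is rebuilt compactly with an explicit UTC time zone (an assumption; the original dput uses the local zone), and the column name day is hypothetical.

```r
# Rebuild the question's 16 rows; tz = "UTC" is an assumption made here
# so that the day extraction does not depend on the local time zone.
base <- as.POSIXct("2014-09-01 00:00:00", tz = "UTC")
df <- data.frame(
  time = base + 900 * rep(0:3, 4) + 86400 * rep(0:1, c(7, 9)),
  site = factor(rep(c(2001, 2002, 2001, 2002, 2001), c(4, 3, 5, 3, 1))),
  val  = rep(c(1L, 0L, 2L, 0L), 4)
)

# Day of the month as a grouping key (hypothetical column name)
df$day <- format(df$time, "%d", tz = "UTC")

# ave() applies FUN within each site/day combination;
# seq_along() numbers the rows of each group from 1
df$counter <- ave(seq_len(nrow(df)), df$site, df$day, FUN = seq_along)
df
```

The counter column should match the dplyr result shown above, including the values 5 and 6 for the last rows of site 2001.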

We can use ave:

df$h <- with(df, ave(val, cumsum(c(TRUE,diff(time)< 0)), FUN= seq_along))
df
#                  time site val h
#1  2014-09-01 03:30:00 2001   1 1
#2  2014-09-01 03:45:00 2001   0 2
#3  2014-09-01 04:00:00 2001   2 3
#4  2014-09-01 04:15:00 2001   0 4
#5  2014-09-01 03:30:00 2002   1 1
#6  2014-09-01 03:45:00 2002   0 2
#7  2014-09-01 04:00:00 2002   2 3
#8  2014-09-02 04:15:00 2001   0 4
#9  2014-09-02 03:30:00 2001   1 1
#10 2014-09-02 03:45:00 2001   0 2
#11 2014-09-02 04:00:00 2001   2 3
#12 2014-09-02 04:15:00 2001   0 4
#13 2014-09-02 03:30:00 2002   1 1
#14 2014-09-02 03:45:00 2002   0 2
#15 2014-09-02 04:00:00 2002   2 3
#16 2014-09-02 04:15:00 2001   0 4
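
The grouping key in the ave call is compact, so here is a step-by-step sketch of what it computes. The intermediate names new_run and run_id are hypothetical, and the data frame is rebuilt with an explicit UTC time zone (an assumption; the original uses the local zone, which is why the printed hours above differ from the question).

```r
# Rebuild the question's 16 rows (tz = "UTC" is an assumption)
base <- as.POSIXct("2014-09-01 00:00:00", tz = "UTC")
df <- data.frame(
  time = base + 900 * rep(0:3, 4) + 86400 * rep(0:1, c(7, 9)),
  site = factor(rep(c(2001, 2002, 2001, 2002, 2001), c(4, 3, 5, 3, 1))),
  val  = rep(c(1L, 0L, 2L, 0L), 4)
)

# TRUE on the first row and wherever the timestamp goes backwards,
# i.e. wherever a new block of rows starts
new_run <- c(TRUE, diff(as.numeric(df$time)) < 0)

# The running count of block starts gives every row a block id
run_id <- cumsum(new_run)

# Number the rows within each block
df$h <- ave(df$val, run_id, FUN = seq_along)
df
```

Row 8 stays in the second block because its timestamp is later than row 7's, which is why it gets h = 4 even though its site is 2001.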

NOTE: This is based on the expected output shown in the OP's post. I understand that 'site' is also described as a grouping variable, but in that case the expected output would have to be different.
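
To illustrate the note, here is a sketch of what happens if site is added as a second grouping key alongside the time-reset blocks. It is one possible reading of "group by time and site", not necessarily what the OP intended; the names run_id and h_site are hypothetical, and the data frame is rebuilt with an explicit UTC time zone (an assumption).

```r
# Rebuild the question's 16 rows (tz = "UTC" is an assumption)
base <- as.POSIXct("2014-09-01 00:00:00", tz = "UTC")
df <- data.frame(
  time = base + 900 * rep(0:3, 4) + 86400 * rep(0:1, c(7, 9)),
  site = factor(rep(c(2001, 2002, 2001, 2002, 2001), c(4, 3, 5, 3, 1))),
  val  = rep(c(1L, 0L, 2L, 0L), 4)
)

# Block id: increases by one each time the timestamp goes backwards
run_id <- cumsum(c(TRUE, diff(as.numeric(df$time)) < 0))

# Number the rows within each combination of site and block
df$h_site <- ave(df$val, df$site, run_id, FUN = seq_along)
df
```

With site in the key, rows 8 and 16 (site 2001 inside a block of site 2002 rows) each start their own group and get index 1 instead of 4, so the result no longer matches the expected output in the question.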
