Adding new columns to an R data frame based on multiple columns within the data frame

I have tourism-interaction data for individually identified whales, with the whale ID, the encounter date, and the encounter time:

Id    Date     Time  
A   20110527    10:42
A   20110527    11:24
A   20110527    11:52
A   20110603    10:29
A   20110603    10:59
B   20110503    11:23
B   20110503    11:45
B   20110503    12:05
B   20110503    12:17

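I now want to add two more columns: one numbering each individual's encounter days, and one numbering the encounters within each day, like this: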

Id     Date     Time  Day   Encounter
A   20110527    10:42   1   1
A   20110527    11:24   1   2
A   20110527    11:52   1   3
A   20110603    10:29   2   1
A   20110603    10:59   2   2
B   20110503    11:23   1   1
B   20110503    11:45   1   2
B   20110503    12:05   1   3
B   20110503    12:17   1   4

Is this possible? Any help would be greatly appreciated!

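We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)); then, grouped by 'Id', match 'Date' against the unique values of 'Date' to create the 'Day' column. After that, group by 'Id' and 'Date' and assign (:=) the row sequence within each group to 'Encounter'.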

library(data.table)
setDT(df1)[, Day:= match(Date, unique(Date)), by = Id
         ][, Encounter := seq_len(.N), by = .(Id, Date)]
df1
#    Id     Date  Time Day Encounter
#1:  A 20110527 10:42   1         1
#2:  A 20110527 11:24   1         2
#3:  A 20110527 11:52   1         3
#4:  A 20110603 10:29   2         1
#5:  A 20110603 10:59   2         2
#6:  B 20110503 11:23   1         1
#7:  B 20110503 11:45   1         2
#8:  B 20110503 12:05   1         3
#9:  B 20110503 12:17   1         4
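
To make the two building blocks above more concrete, here is a minimal base R sketch of what match(Date, unique(Date)) and seq_len(.N) compute inside a single group. The vectors `dates`, `day`, and `encounter` are illustrative names that do not appear in the original answer.

```r
# Dates for one whale, in the order the rows appear
dates <- c(20110527L, 20110527L, 20110527L, 20110603L, 20110603L)

# unique() keeps values in order of first appearance
unique(dates)
# [1] 20110527 20110603

# match() returns each date's position in that unique vector,
# which is exactly the per-individual day index
day <- match(dates, unique(dates))
day
# [1] 1 1 1 2 2

# seq_len(n) numbers the n rows of one (Id, Date) group,
# which is what seq_len(.N) does inside data.table
encounter <- seq_len(sum(dates == 20110527L))
encounter
# [1] 1 2 3
```

Note that the day index follows the order in which dates first appear, so it is chronological only if the rows are already sorted by date within each Id.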

Data

df1 <- structure(list(Id = c("A", "A", "A", "A", "A", 
 "B", "B", "B", 
"B"), Date = c(20110527L, 20110527L, 20110527L, 
 20110603L, 20110603L, 
 20110503L, 20110503L, 20110503L, 20110503L), 
 Time = c("10:42", 
 "11:24", "11:52", "10:29", "10:59", "11:23", "11:45", "12:05", 
 "12:17")), .Names = c("Id", "Date", "Time"),
  class = "data.frame", row.names = c(NA, -9L))

Here is a reproducible example:

df <- structure(list(
  Id = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                 .Label = c("A", "B"), class = "factor"),
  Date = c(20110527L, 20110527L, 20110527L, 20110603L,
           20110603L, 20110503L, 20110503L, 
           20110503L, 20110503L),
  Time = structure(c(2L, 5L, 7L, 1L, 3L, 4L, 6L, 8L, 9L),
                   .Label = c("10:29", "10:42", "10:59", "11:23", "11:24", "11:45", "11:52", "12:05", "12:17"), class = "factor")),
  .Names = c("Id",  "Date", "Time"), class = "data.frame", row.names = c(NA, -9L))

Then you can use dplyr:

library(dplyr)
group_by(df, Id, Date) %>% mutate(Encounter=1:n()) %>% ungroup()

Source: local data frame [9 x 4]

Id     Date   Time Encounter
(fctr)    (int) (fctr)     (int)
1      A 20110527  10:42         1
2      A 20110527  11:24         2
3      A 20110527  11:52         3
4      A 20110603  10:29         1
5      A 20110603  10:59         2
6      B 20110503  11:23         1
7      B 20110503  11:45         2
8      B 20110503  12:05         3
9      B 20110503  12:17         4
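
The dplyr call above adds only 'Encounter'. As a sketch, and not part of the original answer, the 'Day' column could be added in the same pipeline by borrowing the match()/unique() idea from the data.table answer. The data frame is rebuilt here with plain character columns so the block runs on its own; `res` is an illustrative name.

```r
library(dplyr)

# Same data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  Id   = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Date = c(20110527L, 20110527L, 20110527L, 20110603L, 20110603L,
           20110503L, 20110503L, 20110503L, 20110503L),
  Time = c("10:42", "11:24", "11:52", "10:29", "10:59",
           "11:23", "11:45", "12:05", "12:17"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Id) %>%                           # one group per whale
  mutate(Day = match(Date, unique(Date))) %>%  # day index by first appearance
  group_by(Id, Date) %>%                     # regroup: one group per whale and day
  mutate(Encounter = row_number()) %>%       # running count within the day
  ungroup()

res
```

The second group_by() replaces the first grouping rather than adding to it, which is why 'Id' is listed again.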

Or use ave and by from base R.

I used the data posted by Vincent Bonhomme (the data should be sorted by Id and Date):

# Function to count the days per individual using factor levels 
foo <- function(x){as.numeric(as.character(factor(x,labels = 1:nlevels(factor(x)))))}

# Add the columns Day & Encounter
df$Day <-unlist(by(df$Date,list(df$Id),FUN=foo))
df$Encounter <- ave(1:nrow(df),list(df$Id,df$Date),FUN=seq_along)
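
As a possible variation, not taken from the answer above, both columns can be computed with ave() alone. ave() writes each group's result back to the rows it came from, so this sketch does not rely on the rows being grouped by Id the way unlist(by(...)) does. The data frame is rebuilt here so the block is self-contained.

```r
# Same data as in the question
df <- data.frame(
  Id   = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Date = c(20110527L, 20110527L, 20110527L, 20110603L, 20110603L,
           20110503L, 20110503L, 20110503L, 20110503L),
  Time = c("10:42", "11:24", "11:52", "10:29", "10:59",
           "11:23", "11:45", "12:05", "12:17"),
  stringsAsFactors = FALSE
)

# Day: position of each date among the whale's unique dates
df$Day <- ave(df$Date, df$Id, FUN = function(x) match(x, unique(x)))

# Encounter: running row count within each (Id, Date) group
df$Encounter <- ave(seq_len(nrow(df)), df$Id, df$Date, FUN = seq_along)

df
```

Here the day index follows the order in which dates first appear for each whale. If the rows might not be in chronological order, match(x, sort(unique(x))) would number the days by date instead.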
