
R loop over levels of a factor to create a sequence of numbers for each level

I'm working on a dataframe with GPS data from beavers. The dataframe includes one column with the animal's id (see $id below), which is a factor with 26 levels. For each beaver, we have several GPS values - the number differs from animal to animal.

I now want to create a separate column with "Time after capture" per individual in 15 min intervals, starting at 0 min. For the 15 min interval I tried to create a sequence

TimePostRel <- seq(from = 0, along = x, by = 15)

Now I'm not sure how to define x so that it refers to each individual. Should I use the split function to split up the dataframe? We do have a date/time column too, but the problem is that we have no GPS points during daytime (when the animals are sleeping), resulting in breaks that we want to exclude from the TimePostRel calculations (we just want to refer to "active time" after capture).

This is the dataframe:

'data.frame':   6425 obs. of  22 variables:
 $ nb              : int  1 2 3 4 5 6 7 8 9 10 ...
 $ x               : num  517710 517680 NA 517625 517624 ...
 $ y               : num  6587730 6587759 NA 6587929 6588014 ...
 $ date            : POSIXct, format: "2010-04-10 05:15:00" "2010-04-10 05:30:00" "2010-04-10 05:45:00" "2010-04-10 06:00:00" ...
 $ dx              : num  -30.2 NA NA -0.4 -39.2 ...
 $ dy              : num  28.8 NA NA 85.7 126.8 ...
 $ dist            : num  41.7 NA NA 85.7 132.7 ...
 $ dt              : num  900 900 900 900 900 900 900 900 NA 900 ...
 $ R2n             : num  0 1743 NA 46880 88416 ...
 $ abs.angle       : num  2.38 NA NA 1.58 1.87 ...
 $ rel.angle       : num  NA NA NA NA 0.295 ...
 $ id              : Factor w/ 26 levels "Andreas","Apple",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ burst           : Factor w/ 329 levels "Andreas.1","Andreas.2",..: 1 1 1 1 1 1 1 1 1 2 ...
 $ sex             : int  2 2 NA 2 2 2 NA 2 2 2 ...
 $ season          : int  2 2 NA 2 2 2 NA 2 2 2 ...
 $ try             : int  33 34 NA 36 37 38 NA 39 40 41 ...
 $ x.sats          : int  5 5 NA 5 5 5 NA 6 5 6 ...
 $ hdop            : num  2.1 4.2 NA 2.7 3.3 2.1 NA 2.5 2.8 2.2 ...
 $ lodge.x         : num  517595 517595 NA 517595 517595 ...
 $ lodge.y         : num  6587806 6587806 NA 6587806 6587806 ...
 $ NSD_lodge       : num  19039 9440 NA 15909 44268 ...
 $ nsd_1stGPSpoint : num  0 1743 NA 46880 88416 ...

Does anybody know how to solve this? Thanks in advance!

Cheers, Patricia

You can do this very quickly in data.table. I assume your data is called dta:

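library(data.table)
setDT(dta)   ## convert the data frame to a data.table by reference
## within each animal (id), number the fixes 0, 15, 30, ... minutes
dta[, TimePostRel := seq(from = 0, by = 15, length.out = .N), by = id]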
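To see what the grouped assignment does, here is a minimal sketch on a small made-up table. The name toy and the ids "A", "B", "C" are hypothetical and not part of the beaver data. It assumes the sequence should restart at 0 for each level of id, with .N (data.table's per-group row count) giving the sequence length:

```r
library(data.table)

# Hypothetical toy data: three animals with different numbers of GPS fixes
toy <- data.table(id = factor(c("A", "A", "A", "B", "B", "C")))

# Within each id, number the rows 0, 15, 30, ... (minutes after capture)
toy[, TimePostRel := seq(from = 0, by = 15, length.out = .N), by = id]

print(toy)
```

Because the count is taken per row, rows with missing coordinates (NA in x and y) still advance the clock by 15 minutes, whereas the daytime gaps, which have no rows at all, do not.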

The plyr package can also accomplish this task. For a data frame that has a factor column, pass transform to ddply:

library(plyr)
# create a data frame where column x is a factor
df <- data.frame(x=factor(c(rep("b",6),rep("a",3),rep("c",4))))
# apply sequence to each level within x
df <- ddply(df,"x",transform,t=seq(from=0,by=15,length.out=length(x)))

Note that the rows of the new data frame are ordered to match the factor levels of column x, not the original row order:

print(df)
   x  t
1  a  0
2  a 15
3  a 30
4  b  0
5  b 15
6  b 30
7  b 45
8  b 60
9  b 75
10 c  0
11 c 15
12 c 30
13 c 45
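If neither package is available, base R's ave() can do the same thing without calling split by hand. This is only a sketch on a made-up data frame (toy and its ids are hypothetical); unlike ddply, ave() leaves the rows in their original order:

```r
# Hypothetical toy data frame; id plays the role of the animal id factor
toy <- data.frame(id = factor(c("b", "b", "b", "a", "a", "c")))

# ave() applies FUN to the row indices within each level of id and
# returns the results in the original row order
toy$TimePostRel <- ave(seq_along(toy$id), toy$id,
                       FUN = function(i) (seq_along(i) - 1) * 15)

print(toy)
```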
