
R programming: subsetting a data frame multiple times for each value of a vector and a data frame column

I have a vector with values 1:6, a data frame with 15-minute bins, and a data frame of scan data. The data frames are shown below.

bins

   idMin5Bin            BinStart              BinEnd
22        22 2015-08-13 10:15:00 2015-08-13 10:19:59
23        23 2015-08-13 10:20:00 2015-08-13 10:24:59
24        24 2015-08-13 10:25:00 2015-08-13 10:29:59
25        25 2015-08-13 10:30:00 2015-08-13 10:34:59
26        26 2015-08-13 10:35:00 2015-08-13 10:39:59
27        27 2015-08-13 10:40:00 2015-08-13 10:44:59

cars

   idTrip Link_IDLink StartCluster_id   Speed           firstScan
10     10           5              19  47.961 2015-08-13 10:11:49
11     11           5              14 118.800 2015-08-13 10:12:33
12     11           5              14 118.800 2015-08-13 10:13:16
13     12           5              22  47.793 2015-08-13 10:11:21
15     14           5              28  56.321 2015-08-13 10:13:09
24     22           5              52  45.692 2015-08-13 10:14:50

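For each value in the vector, I want to look up in the cars table all cars whose Link_IDLink value matches the vector value.

Then I want to subset those matches by comparing each car's firstScan against the BinStart and BinEnd columns of the bins table.

Finally, I want to plot the values in each subset.

The only strategy I can think of is nested loops (which I know is a no-no). Even with nested loops, the example code below gives me the following error.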

for (i in 1:length(vector)){
  tempcars<-cars[cars[,2]==i,]
  for (k in 1:nrow(bins)){
    tempcars1<-subset(tempcars, firstScan<bins[k,3] & firstScan>bins[k,2])
    hist(tempcars1[,5], breaks =200)
  }
}

    Error in hist.default(unclass(x), unclass(breaks), plot = FALSE, warn.unused = FALSE,  : 
  character(0) In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

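I would of course like to get away from using loops, but any help with the loop would be appreciated.

Here is the start of an answer... hope it helps...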

# Generate the data: 300 one-second timestamps and 20 bins of 15 seconds each
theVec <- 1:6
someTimes <- seq(as.POSIXlt(Sys.time()), by = "sec", length.out = 300)
bins <- data.frame(idMin5Bin = 1:20, BinStart = someTimes[1+(15*(0:19))], BinEnd = someTimes[(15*(1:20))])
cars <- data.frame(Link_IDLink = rep(theVec, each = 20), 
  firstScan = sample(someTimes, 120, replace = TRUE), Speed = runif(120, 30, 100))


# First split by Link_IDLink
subCars <- subset(cars, Link_IDLink %in% theVec)
carList <- split(subCars, subCars$Link_IDLink)

# Now "cut" the times for each element of the list
outList <- lapply(carList, function(df, binData) {
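  # Break points: every BinStart plus the final BinEnd
  theBins <- c(binData$BinStart, binData$BinEnd[nrow(binData)])
  # Label each scan with the id of the bin it falls into
  df$idMin5Bin <- cut(df$firstScan, theBins, labels = binData$idMin5Bin)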
  df
}, binData = bins)

You end up with this...

> head(outList[[1]])
  Link_IDLink           firstScan    Speed idMin5Bin
1           1 2015-09-10 22:42:33 33.85446        17
2           1 2015-09-10 22:41:06 81.43807        11
3           1 2015-09-10 22:40:53 90.59927        10
4           1 2015-09-10 22:39:38 56.89429         5
5           1 2015-09-10 22:40:20 70.44760         8
6           1 2015-09-10 22:42:08 88.93505        15
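
To see how the scans are spread out, the per-link list can be stacked back into one data frame and tabulated by link and bin. This is only a minimal sketch and not part of the original answer: the fixed start time, the `set.seed()` call, and the names `allCars`, `binCounts`, and `emptyPairs` are assumptions added so the example is reproducible.

```r
# Reproducible stand-in data with the same shape as the example above
# (the fixed start time and seed are assumptions for illustration)
set.seed(1)
theVec <- 1:6
someTimes <- seq(as.POSIXct("2015-08-13 10:15:00", tz = "UTC"),
                 by = "sec", length.out = 300)
bins <- data.frame(idMin5Bin = 1:20,
                   BinStart  = someTimes[1 + 15 * (0:19)],
                   BinEnd    = someTimes[15 * (1:20)])
cars <- data.frame(Link_IDLink = rep(theVec, each = 20),
                   firstScan   = sample(someTimes, 120, replace = TRUE),
                   Speed       = runif(120, 30, 100))

# Same split-and-cut steps as in the answer
carList <- split(cars, cars$Link_IDLink)
outList <- lapply(carList, function(df, binData) {
  theBins <- c(binData$BinStart, binData$BinEnd[nrow(binData)])
  df$idMin5Bin <- cut(df$firstScan, theBins, labels = binData$idMin5Bin)
  df
}, binData = bins)

# Stack the per-link data frames back into a single data frame
allCars <- do.call(rbind, outList)

# Rows are links, columns are bins; a zero cell is a link/bin pair with no scans
binCounts <- table(Link = allCars$Link_IDLink, Bin = allCars$idMin5Bin)
print(binCounts)

# Number of link/bin pairs that contain no scans at all
emptyPairs <- sum(binCounts == 0)
```

Zero cells in `binCounts` are worth noticing before plotting, because a histogram cannot be drawn from an empty subset.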

You can plot this in a number of ways; let me know if you need help.
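
One possible way to plot, sketched here under some assumptions: Speed is taken to be the quantity of interest (the loop in the question passes column 5, which is firstScan), and the names `groups`, `speedBreaks`, and `hists` are made up for illustration. The warnings in the question (`no non-missing arguments to min`) are what R emits for an empty vector, so the likely cause of the error is a link/bin pair with no scans. Splitting with `drop = TRUE` keeps only the non-empty pairs.

```r
# Reproducible stand-in data (fixed start time and seed are assumptions)
set.seed(42)
theVec <- 1:6
someTimes <- seq(as.POSIXct("2015-08-13 10:15:00", tz = "UTC"),
                 by = "sec", length.out = 300)
bins <- data.frame(idMin5Bin = 1:20,
                   BinStart  = someTimes[1 + 15 * (0:19)],
                   BinEnd    = someTimes[15 * (1:20)])
cars <- data.frame(Link_IDLink = rep(theVec, each = 20),
                   firstScan   = sample(someTimes, 120, replace = TRUE),
                   Speed       = runif(120, 30, 100))

# Label every scan with its bin in one vectorised call
theBins <- c(bins$BinStart, bins$BinEnd[nrow(bins)])
cars$idMin5Bin <- cut(cars$firstScan, theBins, labels = bins$idMin5Bin)

# Split speeds by link and bin together; drop = TRUE discards empty pairs,
# and scans that fall outside every bin (NA label) are left out by split()
groups <- split(cars$Speed, list(cars$Link_IDLink, cars$idMin5Bin), drop = TRUE)

# Shared break points so the histograms are comparable across groups
speedBreaks <- seq(30, 100, by = 10)
hists <- lapply(groups, hist, breaks = speedBreaks, plot = FALSE)

# Draw one histogram per "link.bin" group
# (guarded so the script also runs in a session without a display)
if (interactive()) {
  for (nm in names(hists)) {
    plot(hists[[nm]], main = paste("Link.Bin", nm), xlab = "Speed")
  }
}
```

With real data, `speedBreaks` would need to cover the observed speed range, otherwise `hist()` stops with an error about values not spanned by the breaks.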

