How do I combine two vectors of different length in R

I have a set of measurements done regularly, but some are missing:

      measurement_date value
1  2011-01-17 13:00:00     5
2  2011-01-17 13:04:00     5
3  2011-01-17 13:08:00     7
4  2011-01-17 13:12:00     8
5  2011-01-17 13:16:00     4
6  2011-01-17 13:24:00     6
7  2011-01-17 13:28:00     5
8  2011-01-17 13:32:00     6
9  2011-01-17 13:36:00     9
10 2011-01-17 13:40:00     8
11 2011-01-17 13:44:00     6
12 2011-01-17 13:48:00     6
13 2011-01-17 13:52:00     4
14 2011-01-17 13:56:00     6

I have a function that's going to process the values and can handle missing values, but the row has to be there, so I'm generating an array that has a row for every minute like this:

times <- timeSequence(from=.., length=60, by="min")

Now I have a row for each minute of the hour, but I need to merge the data. I tried something like this but couldn't quite get it right:

lapply(times, function(time) {
    n <- as.numeric(time)
    v <- Position(function(candidate) {
        y <- as.numeric(candiated)
        n == y
    }

    .. insert the value into the row here ..
}

but I'm only getting errors and warnings. Am I going about the problem the right way? I really want a "complete" array with a value for every minute, as many different functions will be run on the readings, and they are easier to implement if they can assume every row is there.

DF <- data.frame(measurement_date = seq(as.POSIXct("2011-01-17 13:00:00"),
                                        as.POSIXct("2011-01-17 13:56:00"),
                                        by = "mins")[seq(1, 57, by = 4)][-6],
                 value = c(5,5,7,8,4,6,5,6,9,8,6,6,4,6))
full <- data.frame(measurement_date = seq(as.POSIXct("2011-01-17 13:00:00"),
                                          by = "mins", length = 60),
                   value = rep(NA, 60))

Two approaches can be used, the first via merge:

> v1 <- merge(full, DF, by.x = 1, by.y = 1, all = TRUE)[, c(1,3)]
> names(v1)[2] <- "value" ## I only reset this to pass all.equal later
> head(v1)
     measurement_date value
1 2011-01-17 13:00:00     5
2 2011-01-17 13:01:00    NA
3 2011-01-17 13:02:00    NA
4 2011-01-17 13:03:00    NA
5 2011-01-17 13:04:00     5
6 2011-01-17 13:05:00    NA

The second is via an indicator variable derived using %in%:

> want <- full$measurement_date %in% DF$measurement_date
> full[want, "value"] <- DF[, "value"]
> head(full)
     measurement_date value
1 2011-01-17 13:00:00     5
2 2011-01-17 13:01:00    NA
3 2011-01-17 13:02:00    NA
4 2011-01-17 13:03:00    NA
5 2011-01-17 13:04:00     5
6 2011-01-17 13:05:00    NA
> all.equal(v1, full)
[1] TRUE

The merge version is strongly preferred, but needs a little work. The %in% solution only works here because the data are in time order in both DF and full, hence my earlier "preferred". It is easy to ensure the two objects are in time order, however, so both approaches require a little finessing to work. We can modify the %in% approach to get both objects in order (starting afresh with full):

full2 <- data.frame(measurement_date = seq(as.POSIXct("2011-01-17 13:00:00"),
                                           by = "mins", length = 60),
                    value = rep(NA, 60))
full2 <- full2[order(full2[,1]), ] ## get full2 in order
DF2 <- DF[order(DF[,1]), ]         ## get DF in order
want <- full2$measurement_date %in% DF2$measurement_date
full2[want, "value"] <- DF2[, "value"]

> all.equal(full, full2)
[1] TRUE
> all.equal(full2, v1)
[1] TRUE
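The order dependence of the %in% approach can probably be avoided altogether with match(), which looks up each timestamp individually. The following is only a sketch under assumptions: the sample data are rebuilt with a fixed UTC time zone, and the names full3, DF_shuffled and idx are made up for illustration.

```r
# Assumed sample data mirroring the example above (tz fixed to UTC for reproducibility).
start <- as.POSIXct("2011-01-17 13:00:00", tz = "UTC")
obs_minutes <- seq(0, 56, by = 4)[-6]   # every 4 minutes, with 13:20 missing
DF <- data.frame(measurement_date = start + 60 * obs_minutes,
                 value = c(5, 5, 7, 8, 4, 6, 5, 6, 9, 8, 6, 6, 4, 6))

# Shuffle the observations to show that row order does not matter here.
set.seed(1)
DF_shuffled <- DF[sample(nrow(DF)), ]

# One row per minute of the hour.
full3 <- data.frame(measurement_date = seq(start, by = "min", length.out = 60))

# match() gives, for each minute in full3, the row of DF_shuffled holding
# that timestamp, or NA if there is none; indexing by NA yields NA.
idx <- match(full3$measurement_date, DF_shuffled$measurement_date)
full3$value <- DF_shuffled$value[idx]
```

Because each value is fetched by its own timestamp, neither data frame needs to be sorted beforehand.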

In your function, as.numeric(candiated) should be as.numeric(candidate). There's also a bracket missing. I have no clue what exactly you're trying to achieve in your function, but it looks horrendously complex to me.
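For reference, with the typo and the missing bracket fixed, the loop from the question might look roughly like the sketch below. It is a guess at the intent, not the asker's actual code: times is built with base seq() as a stand-in for timeSequence(), and the names measured, values and complete are hypothetical.

```r
# Assumed sample data (tz fixed to UTC for reproducibility).
start <- as.POSIXct("2011-01-17 13:00:00", tz = "UTC")
DF <- data.frame(measurement_date = start + 60 * seq(0, 56, by = 4)[-6],
                 value = c(5, 5, 7, 8, 4, 6, 5, 6, 9, 8, 6, 6, 4, 6))
times <- seq(start, by = "min", length.out = 60)  # stand-in for timeSequence()

# Compare timestamps as plain numbers (seconds since the epoch).
measured <- as.numeric(DF$measurement_date)

values <- vapply(as.numeric(times), function(n) {
  # Position() returns the index of the first element for which the
  # predicate is TRUE, or NA when nothing matches.
  i <- Position(function(candidate) candidate == n, measured)
  if (is.na(i)) NA_real_ else DF$value[i]
}, numeric(1))

complete <- data.frame(measurement_date = times, value = values)
```

This scans the measurements once per minute, so it does more work than a single merge, but it keeps the structure of the original attempt.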

Try

merge(Data, times, by.x = 1, by.y = 1, all.y = TRUE)

This should give you something to work with.
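As a concrete sketch of that call, assuming Data is a data frame holding the measurements and times is a one-column data frame of minutes built with base seq() (rather than a timeSequence object), it might look like this. The column name measurement_date and the result name merged are assumptions for illustration.

```r
# Assumed sample data (tz fixed to UTC for reproducibility).
start <- as.POSIXct("2011-01-17 13:00:00", tz = "UTC")
Data <- data.frame(measurement_date = start + 60 * seq(0, 56, by = 4)[-6],
                   value = c(5, 5, 7, 8, 4, 6, 5, 6, 9, 8, 6, 6, 4, 6))

# One row per minute; this plays the role of `times` in the call above.
times <- data.frame(measurement_date = seq(start, by = "min", length.out = 60))

# all.y = TRUE keeps every minute from `times`; minutes without a
# measurement get NA in the value column.
merged <- merge(Data, times, by = "measurement_date", all.y = TRUE)
```

Since times contributes only the key column, the result has just measurement_date and value, so no columns need to be dropped or renamed afterwards.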
