
Fill data.frame with data from a data.frame of an older observation

I have two data.frames with observations of mostly the same plots, one from this year and one from 2012. I'm using RStudio on Windows 7.

What I want: create a new column in the new data.frame holding the diameter each tree had 5 years ago.

How I want it: R should compare the two data.frames and, if the location and the tree ID match in both, copy the diameter from the 2012 data.frame into the newly created column of the recent one.

My code so far is:

df17$dbh12[df17$LOC=="1"] <- ifelse((df12$ID[df12$LOC=="1"]) %in% (df17$ID[df17$LOC=="1"]), df12$DBH[df12$LOC=="1"], NA)

My problem: R runs the code, but the two data.frames are not identical. In 2012, some trees were not recorded because they looked sick, but they are still alive now and I measured them. Conversely, other trees have since died. I have 10 plots. As an example, my data and my code look like this:

df2012=data.frame(LOC=1, ID=c(1,2,4,5,6), DBH=c(7.0, 7.5, 10.25, 14.5, 6.75))
df2017=data.frame(LOC=1, ID=c(2,3,4,5,6), DBH=c(7.8, 28.7, 10.3, 13.7, 7.8))

df2017$dbh12[df2017$LOC=="1"] <- ifelse((df2012$ID[df2012$LOC=="1"]) %in% (df2017$ID[df2017$LOC=="1"]), df2012$DBH[df2012$LOC=="1"], NA)

So in the end I have:

> df2017
  LOC ID  DBH dbh12
    1  2  7.8    NA
    1  3 28.7  7.50
    1  4 10.3 10.25
    1  5 13.7 14.50
    1  6  7.8  6.75

My questions: Why does tree 2 have no dbh12? Why does tree 3 have one? Is R just copying the values regardless of the ID? Where is my mistake?

We can do a join on "LOC" and "ID":

library(data.table)
setDT(df2017)[df2012, dbh12 := i.DBH, on = .(LOC, ID)]
df2017
#   LOC ID  DBH dbh12
#1:   1  2  7.8  7.50
#2:   1  3 28.7    NA
#3:   1  4 10.3 10.25
#4:   1  5 13.7 14.50
#5:   1  6  7.8  6.75
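If data.table is not available, the same left join can be sketched in base R with merge(). This is only an illustration under assumptions: the names old and res are made up for the example, and merge() sorts the result by the key columns by default, so the row order may differ from the original df2017.

```r
df2012 <- data.frame(LOC = 1, ID = c(1, 2, 4, 5, 6), DBH = c(7.0, 7.5, 10.25, 14.5, 6.75))
df2017 <- data.frame(LOC = 1, ID = c(2, 3, 4, 5, 6), DBH = c(7.8, 28.7, 10.3, 13.7, 7.8))

# Rename the 2012 diameter so it does not clash with the 2017 DBH column
# ('old' is a hypothetical helper name)
old <- setNames(df2012, c("LOC", "ID", "dbh12"))

# all.x = TRUE keeps every 2017 tree; trees with no 2012 record get NA
res <- merge(df2017, old, by = c("LOC", "ID"), all.x = TRUE)
res
```

Trees that exist only in 2012 (here tree 1) are dropped, because all.y is left at its default of FALSE.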

The OP's code only subsets 'ID' by the 'LOC' value; it never matches the 'ID' values between the two datasets. %in% returns a logical vector with one element per 2012 tree, in the 2012 row order, and ifelse() keeps that order. The result is then assigned to the 2017 rows by position, not by 'ID'. That is why the NA belonging to tree 1 (present only in 2012) lands on the first 2017 row, which is tree 2, and why tree 3 receives the 2012 diameter of tree 2.
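To make the positional mix-up visible, here is a small sketch using the IDs and diameters from the question. The vector names (ids12, ids17, in17, by_position, by_id) are made up for the illustration.

```r
ids12 <- c(1, 2, 4, 5, 6)   # tree IDs measured in 2012
ids17 <- c(2, 3, 4, 5, 6)   # tree IDs measured in 2017
dbh12 <- c(7.0, 7.5, 10.25, 14.5, 6.75)

# One TRUE/FALSE per 2012 tree, in 2012 row order
in17 <- ids12 %in% ids17
in17
# [1] FALSE  TRUE  TRUE  TRUE  TRUE

# ifelse() keeps that 2012 order; nothing here refers to the 2017 rows
by_position <- ifelse(in17, dbh12, NA)

# match() instead returns, for each 2017 tree, its row position in 2012
by_id <- dbh12[match(ids17, ids12)]

# Side by side: by_position reproduces the wrong column from the question
data.frame(ID_2017 = ids17, by_position, by_id)
```

The by_position column is what the question's code produces; by_id lines each 2012 diameter up with the tree it belongs to.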

So, here we can use match:

i1 <- with(df2017, match(ID[LOC==1], with(df2012, ID[LOC==1])))
df2017$dbh12 <- df2012$DBH[i1]
df2017$dbh12
#[1]  7.50    NA 10.25 14.50  6.75
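The snippet above relies on the example containing a single plot, since i1 is a position within the LOC==1 subset. With several plots, one possible base R approach is to match on a key built from both columns. The following is a sketch under assumptions: the data for plot 2 is invented purely for illustration, and key12 / key17 are hypothetical helper names.

```r
# Hypothetical data: plot 2 is invented only to illustrate the multi-plot case
df2012 <- data.frame(LOC = c(1, 1, 2, 2), ID = c(1, 2, 1, 2),
                     DBH = c(7.0, 7.5, 20.0, 21.0))
df2017 <- data.frame(LOC = c(1, 2, 2), ID = c(2, 2, 3),
                     DBH = c(7.8, 21.6, 5.0))

# Build one key per tree from plot and ID, so tree 2 in plot 1
# is not confused with tree 2 in plot 2
key12 <- paste(df2012$LOC, df2012$ID, sep = "_")
key17 <- paste(df2017$LOC, df2017$ID, sep = "_")

# For each 2017 tree, find its row in the full 2012 data (NA if absent)
df2017$dbh12 <- df2012$DBH[match(key17, key12)]
df2017$dbh12
# [1]  7.5 21.0   NA
```

Because match() indexes the complete df2012 here, no per-plot subsetting is needed and the lookup works for all plots in one step.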
