How to find a value in a data frame using a column and row combination from another data frame?
I want to take the values of two columns in one data frame and use them as a row and column combination to look up values in another data frame. That sounds awkward, so I'd rather explain with the example below (a simplified version; the actual dataset is much larger).
Data1
# ID Date
# 1 A 2022-02-01
# 2 B 2022-02-02
# 3 C 2022-02-03
# 4 D 2022-02-04
# 5 E 2022-02-05
# 6 F 2022-02-06
# 7 G 2022-02-07
# 8 H 2022-02-08
Data2
# ID X2022.02.01 X2022.02.02 X2022.02.03 X2022.02.04 X2022.02.05 X2022.02.06 X2022.02.07 X2022.02.08
# 1 A 1 9 17 25 33 41 49 57
# 2 B 2 10 18 26 34 42 50 58
# 3 C 3 11 19 27 35 43 51 59
# 4 D 4 12 20 28 36 44 52 60
# 5 E 5 13 21 29 37 45 53 61
# 6 F 6 14 22 30 38 46 54 62
# 7 G 7 15 23 31 39 47 55 63
# 8 H 8 16 24 32 40 48 56 64
I would like to use each ID and Date combination in Data1 to find the matching value in Data2, so that I get the following outcome:
# ID Date Value
# 1 A 2022-02-01 1
# 2 B 2022-02-02 10
# 3 C 2022-02-03 19
# 4 D 2022-02-04 28
# 5 E 2022-02-05 37
# 6 F 2022-02-06 46
# 7 G 2022-02-07 55
# 8 H 2022-02-08 64
So far I have used the following code, but it takes too long because the original datasets (both Data1 and Data2) are huge.
for (i in 1:nrow(Data1)) {
a <- Data1[[1]][[i]]
b <- Data1[[2]][[i]]
c <- Data2[b, a]
Data1$Value[i] <- c
}
Could someone kindly help me with my code? :)
Data1 <- data.frame(ID=c("A", "B", "C", "D", "E", "F", "G", "H"),
Date=c("2022-02-01", "2022-02-02", "2022-02-03", "2022-02-04",
"2022-02-05", "2022-02-06","2022-02-07", "2022-02-08"))
Data2 <- data.frame(ID=c("A", "B", "C", "D", "E", "F", "G", "H"),
"2022-02-01"=c(1:8),
'2022-02-02'=c(9:16),
'2022-02-03'=c(17:24),
'2022-02-04'=c(25:32),
'2022-02-05'=c(33:40),
'2022-02-06'=c(41:48),
'2022-02-07'=c(49:56),
'2022-02-08'=c(57:64))
Consider using match to line up the IDs with each other, and the dates with the column names. To make the dates look like the (valid) column names, we can use make.names.
transform(Data1, Value=mapply(\(i, j) Data2[i, j], match(Data1$ID, Data2$ID),
match(make.names(Data1$Date), names(Data2))))
# ID Date Value
# 1 A 2022-02-01 1
# 2 B 2022-02-02 10
# 3 C 2022-02-03 19
# 4 D 2022-02-04 28
# 5 E 2022-02-05 37
# 6 F 2022-02-06 46
# 7 G 2022-02-07 55
# 8 H 2022-02-08 64
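Since speed on large data was the concern, here is a minimal sketch of a fully vectorised variant of the same idea. It assumes every column of Data2 other than ID is numeric, so that part can be turned into a matrix and indexed with a two-column (row, column) index matrix in place of one mapply call per row. The names m and idx are illustrative only, and the data are rebuilt in compact form so the block runs on its own.

```r
# Rebuild the example data from the question (same values, compact form).
Data1 <- data.frame(ID = LETTERS[1:8],
                    Date = sprintf("2022-02-%02d", 1:8))
Data2 <- data.frame(ID = LETTERS[1:8],
                    matrix(1:64, nrow = 8,
                           dimnames = list(NULL, make.names(Data1$Date))))

# Value columns of Data2 as a plain matrix (the ID column is dropped).
m <- as.matrix(Data2[-1])

# One (row, column) index pair per row of Data1.
idx <- cbind(match(Data1$ID, Data2$ID),
             match(make.names(Data1$Date), colnames(m)))

# Indexing a matrix with a two-column matrix picks one cell per index row.
Data1$Value <- m[idx]
Data1$Value
# 1 10 19 28 37 46 55 64
```

If an ID or date in Data1 has no counterpart in Data2, match returns NA for it and the corresponding Value is NA rather than an error.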
Column names should not start with a number; such names are not syntactically valid, so R changes them internally (using make.names). This is also why you need quotes to create those "wrong" names. So I wonder whether your for loop actually works. I only got it to work with your data like so:
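for (i in 1:nrow(Data1)) {
  a <- Data1[[1]][[i]]              # ID of the current row
  b <- make.names(Data1[[2]][[i]])  # date converted to the column name used in Data2
  c <- Data2[Data2$ID == a, b]      # row selected by ID, column by converted date
  Data1$Value[i] <- c
}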
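For comparison with the loop, the same lookup can also be phrased as a join: reshape Data2 to long format (one row per ID and date column) and merge it with Data1. This is only a sketch in base R. The names long, key and res are made up for illustration, and whether reshaping pays off on very large data is an assumption you would need to check by timing it.

```r
# Rebuild the example data from the question (same values, compact form).
Data1 <- data.frame(ID = LETTERS[1:8],
                    Date = sprintf("2022-02-%02d", 1:8))
Data2 <- data.frame(ID = LETTERS[1:8],
                    matrix(1:64, nrow = 8,
                           dimnames = list(NULL, make.names(Data1$Date))))

# Long format: one row per (ID, date column) pair, values in column-major order.
long <- data.frame(ID    = rep(Data2$ID, times = ncol(Data2) - 1),
                   Date  = rep(names(Data2)[-1], each = nrow(Data2)),
                   Value = unlist(Data2[-1], use.names = FALSE))

# Helper key: the date written the way Data2 names its columns.
Data1$key <- make.names(Data1$Date)

# Join on ID plus the converted date; merge() sorts the result by the keys.
res <- merge(Data1, long, by.x = c("ID", "key"), by.y = c("ID", "Date"))
res[c("ID", "Date", "Value")]
```

By default merge keeps only the rows that match in both inputs; all.x = TRUE would keep unmatched rows of Data1 with an NA Value.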
If the column names of your real data start with numbers, please convert them using
names(data) <- make.names(names(data))
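To make the renaming concrete, here is a small sketch of what make.names does to date-like strings and how data.frame applies it by default. The objects dates, converted, kept and back are illustrative names only.

```r
dates <- c("2022-02-01", "2022-02-02")

# make.names() prepends "X" to names starting with a digit and
# replaces invalid characters such as "-" with ".".
make.names(dates)
# "X2022.02.01" "X2022.02.02"

# data.frame() applies the same conversion unless check.names = FALSE.
converted <- data.frame(`2022-02-01` = 1:2)
kept      <- data.frame(`2022-02-01` = 1:2, check.names = FALSE)
names(converted)  # "X2022.02.01"
names(kept)       # "2022-02-01"

# A converted name can be parsed back into a Date with a matching format.
back <- as.Date(names(converted), format = "X%Y.%m.%d")
```

If the original names are kept with check.names = FALSE, the columns have to be referenced with backticks or as strings, for example kept[["2022-02-01"]].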
Data:
Data1 <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H"),
Date = c("2022-02-01", "2022-02-02", "2022-02-03", "2022-02-04",
"2022-02-05", "2022-02-06", "2022-02-07", "2022-02-08")), class = "data.frame", row.names = c(NA,
-8L))
Data2 <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H"),
X2022.02.01 = 1:8, X2022.02.02 = 9:16, X2022.02.03 = 17:24,
X2022.02.04 = 25:32, X2022.02.05 = 33:40, X2022.02.06 = 41:48,
X2022.02.07 = 49:56, X2022.02.08 = 57:64), class = "data.frame", row.names = c(NA,
-8L))