Assign a value to a df$column from another df?
Example: I have a df in which the first column is:
dat <- c("A","B","C","A")
and then I have another df whose first column is:
dat2[, 1]
[1] A B C
Levels: A B C
dat2[, 2]
[1] 21000 23400 26800
How can I add the values in the second df (dat2) to the first df (dat)? The first df contains repeated values, and every time there is an "A" I want the corresponding value (21000) from the second df to be added in a new column.
Generating reproducible data frames:
dat1 <- data.frame(x1 = c("A","B","C","A"), stringsAsFactors = FALSE)
dat2 <- data.frame(x1 = c("A","B","C"),
x2 = c(21000, 23400, 26800), stringsAsFactors = FALSE)
Then use the match function:
dat1$dat2_vals <- dat2$x2[match(dat1$x1, dat2$x1)]
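To make the match step concrete: match returns, for each element of dat1$x1, the position of its first match in dat2$x1 (or NA when there is none). Those positions then index dat2$x2, so repeated keys simply pick up the same value again. This is a minimal sketch on the same toy data. The extra "D" row and the name idx are hypothetical, added only to illustrate a missing key.

```r
# Toy data; the extra "D" row in dat1 is hypothetical, added to show a missing key
dat1 <- data.frame(x1 = c("A", "B", "C", "A", "D"), stringsAsFactors = FALSE)
dat2 <- data.frame(x1 = c("A", "B", "C"),
                   x2 = c(21000, 23400, 26800), stringsAsFactors = FALSE)

# For each key in dat1$x1, the position of its first match in dat2$x1
idx <- match(dat1$x1, dat2$x1)  # c(1, 2, 3, 1, NA): "D" is not in dat2

# Indexing dat2$x2 by those positions repeats 21000 for both "A" rows
dat1$dat2_vals <- dat2$x2[idx]
dat1
```

Keys that are absent from dat2 come back as NA rather than raising an error, so check for NA afterwards if every key is expected to match.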
It is safest to convert your key columns to character type rather than factor type. Factors carry a levels attribute that can cause errors or surprising results when columns are compared or joined. I mention this because of the levels attribute in your dat2.
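If dat2 already holds factors (as the "Levels: A B C" output in the question suggests), you can convert them back to character before matching. This is a minimal sketch. The copies dat2_a and dat2_b are hypothetical names used to show two alternatives side by side. Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so on current R versions character columns stay character unless you ask for factors.

```r
# dat2 as it might look in the question: x1 is a factor with levels A, B, C
dat2 <- data.frame(x1 = factor(c("A", "B", "C")),
                   x2 = c(21000, 23400, 26800))

# Alternative 1: convert a single column
dat2_a <- dat2
dat2_a$x1 <- as.character(dat2_a$x1)

# Alternative 2: convert every factor column, leaving other columns untouched;
# assigning into dat2_b[] keeps it a data.frame
dat2_b <- dat2
dat2_b[] <- lapply(dat2_b, function(col) if (is.factor(col)) as.character(col) else col)

str(dat2_b)
```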
A third option, which I prefer, is left_join from dplyr. It seems to be faster than merge on large data frames.
library(dplyr)
dat1 <- data.frame(x1 = c("A","B","C","A"), stringsAsFactors = FALSE)
dat2 <- data.frame(x1 = c("A","B","C"),
                   x2 = c(21000, 23400, 26800), stringsAsFactors = FALSE)
dat1 <- left_join(dat1, dat2, by = "x1")
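A short sketch of what left_join does with the question's data, assuming dplyr is installed. Every row of dat1 is kept in its original order, and a key with no partner in dat2 gets NA. The extra "D" row is hypothetical. Note that if dat2 contained duplicate keys, each matching row of dat1 would be repeated once per duplicate.

```r
library(dplyr)

# The extra "D" row is hypothetical, added to show an unmatched key
dat1 <- data.frame(x1 = c("A", "B", "C", "A", "D"), stringsAsFactors = FALSE)
dat2 <- data.frame(x1 = c("A", "B", "C"),
                   x2 = c(21000, 23400, 26800), stringsAsFactors = FALSE)

# All rows of dat1 survive, in order; "D" has no match, so its x2 is NA
res <- left_join(dat1, dat2, by = "x1")
res
```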
Let's race left_join and merge on large data frames with microbenchmark, just for fun!
# Create large data frames
dat1 <- data.frame(x1 = rep(c("A","B","C","A"), 1000), stringsAsFactors = FALSE)
dat2 <- data.frame(x1 = rep(c("A","B","C","D"), 1000),
                   x2 = runif(4000), stringsAsFactors = FALSE)  # one random value per row
# On your marks, get set, GO!
library(microbenchmark)
mbm <- microbenchmark(
left_join = left_join(dat1, dat2, by="x1"),
merge = merge(dat1, dat2, by = "x1"),
times = 20
)
Many, many seconds later... left_join is MUCH faster than merge on these large data frames.
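Part of why the race takes so long is that, in the benchmark data, dat2 repeats each key 1000 times, so both calls perform a many-to-many join and return far more rows than dat1 has. The sketch below counts the expected output rows from the key frequencies alone. keys1 and keys2 are hypothetical names restating the benchmark's x1 columns. The x2 values do not affect the count.

```r
# Key columns from the benchmark data above
keys1 <- rep(c("A", "B", "C", "A"), 1000)
keys2 <- rep(c("A", "B", "C", "D"), 1000)

# For an equi-join, each shared key contributes (count in dat1) * (count in dat2) rows
n1 <- table(keys1)
n2 <- table(keys2)
shared <- intersect(names(n1), names(n2))
expected_rows <- sum(n1[shared] * n2[shared])

# A: 2000 * 1000, B: 1000 * 1000, C: 1000 * 1000 -> 4 million rows in total
expected_rows
```

Because every key in dat1 has a partner in dat2, the left join and merge's default inner join should both return this many rows.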
Use the merge function:
# Input data
dat <- data.frame(ID = c("A", "B", "C", "A"))
dat2 <- data.frame(ID = c("A", "B", "C"),
value = c(1, 2, 3))
# Merge two data.frames by specified column
merge(dat, dat2, by = "ID")
ID value
1 A 1
2 A 1
3 B 2
4 C 3
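By default merge performs an inner join (all = FALSE), so rows of dat whose ID is missing from dat2 are dropped. It also sorts the result by the key (sort = TRUE), which is why the output above reads A, A, B, C rather than A, B, C, A. Here is a sketch of keeping all rows of dat in their original order. The extra "D" row and the helper column row_id are hypothetical, added for illustration.

```r
dat <- data.frame(ID = c("A", "B", "C", "A", "D"), stringsAsFactors = FALSE)
dat2 <- data.frame(ID = c("A", "B", "C"), value = c(1, 2, 3),
                   stringsAsFactors = FALSE)

# Remember the original row order before merging
dat$row_id <- seq_len(nrow(dat))

# all.x = TRUE keeps rows of dat without a match in dat2 ("D" gets NA)
out <- merge(dat, dat2, by = "ID", all.x = TRUE)

# Undo merge's sorting by key, then drop the helper column
out <- out[order(out$row_id), c("ID", "value")]
rownames(out) <- NULL
out
```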
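Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.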