
Assign a value to a df$column from another df?

Example: I have a df in which the first column is

dat <- c("A","B","C","A")

and then I have another df whose first and second columns are:

dat2[, 1]
[1] A B C
Levels: A B C

dat2[, 2]
[1] 21000 23400 26800

How can I add the values from the second df ( dat2 ) to the first df ( dat )? The first df contains repeated values, and I want every "A" to get the corresponding value (21000) from the second df in a new column.

First, generate reproducible data frames:

dat1 <- data.frame(x1 = c("A","B","C","A"), stringsAsFactors = FALSE)
dat2 <- data.frame(x1 = c("A","B","C"),
                   x2 = c(21000, 23400, 26800), stringsAsFactors = FALSE)

Then use the match function.

dat1$dat2_vals <- dat2$x2[match(dat1$x1, dat2$x1)]

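It is safest to store the key columns as character rather than factor. match() itself converts factors to character before comparing, but other approaches can behave differently when the two columns are factors with different level sets. I mention this because of the Levels attribute in your dat2.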
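To illustrate what the match() lookup does, here is a minimal sketch. It assumes a factor key in dat2 (as the Levels output in the question suggests) and adds a hypothetical key "Z" to dat1 that has no counterpart in dat2, to show how unmatched rows come out.

```r
# Lookup table with a factor key, as in the question.
dat2 <- data.frame(x1 = factor(c("A", "B", "C")),
                   x2 = c(21000, 23400, 26800))

# "Z" is a hypothetical extra key that does not exist in dat2.
dat1 <- data.frame(x1 = c("A", "B", "C", "A", "Z"), stringsAsFactors = FALSE)

# For each element of dat1$x1, match() returns the position of its first
# occurrence in dat2$x1, or NA when there is none.
idx <- match(dat1$x1, dat2$x1)

# Indexing with NA yields NA, so unmatched keys get a missing value.
dat1$dat2_vals <- dat2$x2[idx]
print(dat1)
```

Because match() only returns the first matching position, this approach assumes each key appears once in dat2.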

Another option, which I prefer, is left_join from dplyr. It seems to be faster than merge with large data frames.

library(dplyr)

dat1 <- data.frame(x1 = c("A","B","C","A"), stringsAsFactors = FALSE)
dat2 <- data.frame(x1 = c("A","B","C"),
                   x2 = c(21000, 23400, 26800), stringsAsFactors = FALSE)

dat1 <- left_join(dat1, dat2, by="x1")

Let's race large data frames with microbenchmark, just for fun!

Create large data frames:

dat1 <- data.frame(x1 = rep(c("A","B","C","A"), 1000), stringsAsFactors = FALSE)
# Note: each key is repeated 1000 times in dat2, so the joins below are many-to-many.
dat2 <- data.frame(x1 = rep(c("A","B","C", "D"), 1000),
                   x2 = runif(4000), stringsAsFactors = FALSE)

on your marks, get set, GO!

library(microbenchmark)
mbm <- microbenchmark(
  left_join = left_join(dat1, dat2, by="x1"),
  merge = merge(dat1, dat2, by = "x1"),
  times = 20
)

Many, many seconds later... left_join is much faster than merge on these large data frames.
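Part of the long runtime is probably explained by the test data rather than by the functions: every key is repeated in both data frames, so each join is many-to-many. The sketch below, in base R only, estimates how many rows such a join produces. The variable names n1, n2, shared and expected_rows are just for illustration.

```r
# Same key columns as in the benchmark above (x2 is not needed for counting).
dat1 <- data.frame(x1 = rep(c("A", "B", "C", "A"), 1000), stringsAsFactors = FALSE)
dat2 <- data.frame(x1 = rep(c("A", "B", "C", "D"), 1000), stringsAsFactors = FALSE)

# Count how often each key occurs on each side.
n1 <- table(dat1$x1)
n2 <- table(dat2$x1)

# A key shared by both sides contributes (count in dat1) * (count in dat2) rows.
shared <- intersect(names(n1), names(n2))
expected_rows <- sum(as.numeric(n1[shared]) * as.numeric(n2[shared]))
expected_rows  # 4,000,000 rows from a 4000-row and a 4000-row input
```

With a lookup table that has one row per key, as in the original question, the result would have only as many rows as dat1, and the timings may look quite different.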


Use the merge function.

# Input data
dat  <- data.frame(ID = c("A", "B", "C", "A"))
dat2 <- data.frame(ID = c("A", "B", "C"), 
                   value = c(1, 2, 3))
# Merge two data.frames by specified column
merge(dat, dat2, by = "ID")
  ID value
1  A     1
2  A     1
3  B     2
4  C     3
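Note that the output above is sorted by ID rather than kept in the original row order, and that merge drops rows without a match by default. If that matters, one possible workaround is sketched below. The helper column row and the unmatched key "Z" are assumptions added for illustration only.

```r
# "Z" is a hypothetical key with no match in dat2.
dat  <- data.frame(ID = c("A", "B", "C", "A", "Z"))
dat2 <- data.frame(ID = c("A", "B", "C"),
                   value = c(1, 2, 3))

# Remember the original row order in a helper column.
dat$row <- seq_len(nrow(dat))

# all.x = TRUE keeps every row of dat (a left join); unmatched rows get NA.
out <- merge(dat, dat2, by = "ID", all.x = TRUE)

# Restore the original order and reset the row names.
out <- out[order(out$row), ]
rownames(out) <- NULL
print(out)
```

The helper column can be dropped afterwards with out$row <- NULL.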
