R: Create a new column in a data.frame based on values in a second data.frame

Question

I have two (example) data.frames (df1, df2):

#df1
L <- LETTERS[1:4]
b <- sample(L, 20, replace = TRUE)
df1 <- data.frame(stuff1 = 1, stuff2 = 1:10, b = b, c= NA, stringsAsFactors=FALSE)

#df2
a <- c(10,20,30,40)
df2 <- data.frame(xx = L, yy = a, stringsAsFactors=FALSE )

I want to have a new column, let's say c, in df1 based on the values from df2. One example: A has the corresponding value 10 (see df2), so for every A in column b of df1, 10 should be written in the (new) column c. The same goes for every row of xx in df2, so in this case A, B, C and D. My code doesn't really work, and it only handles a single value, here A:

##copy column b now it is c
df1["c"] <- df1$b


# which value has A in df2?
zz <- df2[df2$xx == "A",]
xy <- zz$yy


# fill in the new value in c (this is not working)
df1[df1$c == "A", ] <- xy

I hope it is clear what I mean... Oh, and I have some big data; this is only an example to try things out.
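The question's own approach can be made to work with two changes. A minimal sketch, assuming the same example data (set.seed() is added only to make the sample reproducible): copying b into c first makes c a character column, and df1[df1$c == "A", ] <- xy assigns to whole rows. This version starts c as numeric NA and assigns only into column c, one key of df2 at a time.

```r
# Example data from the question (set.seed added for reproducibility)
set.seed(1)
L <- LETTERS[1:4]
df1 <- data.frame(stuff1 = 1, stuff2 = 1:10,
                  b = sample(L, 20, replace = TRUE),
                  stringsAsFactors = FALSE)
df2 <- data.frame(xx = L, yy = c(10, 20, 30, 40), stringsAsFactors = FALSE)

# Start c as an empty numeric column instead of a copy of the letters in b
df1$c <- NA_real_

# For every key in df2, fill only column c of the rows whose b matches it.
# (df1[df1$c == "A", ] <- xy overwrote every column of those rows.)
for (i in seq_len(nrow(df2))) {
  df1$c[df1$b == df2$xx[i]] <- df2$yy[i]
}

head(df1)
```

The loop runs once per row of df2, so it is workable when df2 is small, but a join or vectorised lookup is the more idiomatic choice.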

Answer 1

It sounds like you just want to do a merge/join. First, let's drop the empty c column from df1 and rename the columns of df2 so that the key column is called b in both data frames:

 df1 <- df1[, !names(df1) %in% "c"]
 colnames(df2) <- c("b", "c")

With just base R, we can use merge:

 # all.x=TRUE makes this a left join (merge() has no `type` argument)
 df3 <- merge(df1, df2, by="b", all.x=TRUE)
 head(df3)

  b stuff1 stuff2  c
1 A      1      1 10
2 A      1      2 10
3 A      1      3 10
4 A      1      3 10
5 A      1     10 10
6 A      1      7 10
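If you stay with merge(), note that it sorts the result by the key column (its sort argument defaults to TRUE) and moves b to the front, as the output above shows. A minimal sketch for restoring df1's original row and column order, assuming the renamed df2 from above; row_id is a hypothetical helper column, not part of the original data:

```r
set.seed(1)
L <- LETTERS[1:4]
df1 <- data.frame(stuff1 = 1, stuff2 = 1:10,
                  b = sample(L, 20, replace = TRUE),
                  stringsAsFactors = FALSE)
# df2 with the renamed columns used in this answer
df2 <- data.frame(b = L, c = c(10, 20, 30, 40), stringsAsFactors = FALSE)

# merge() sorts by the key, so record df1's row order first.
# row_id is a hypothetical helper column used only for reordering.
df1$row_id <- seq_len(nrow(df1))

# Left join: all.x = TRUE keeps every row of df1
df3 <- merge(df1, df2, by = "b", all.x = TRUE)

# Restore df1's row order and column order (this also drops row_id)
df3 <- df3[order(df3$row_id), c("stuff1", "stuff2", "b", "c")]
rownames(df3) <- NULL
df1$row_id <- NULL

head(df3)
```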

The plyr package has an alternative, join(), that might be faster and preserves the column order (and, for a left join, the row order of df1):

library(plyr)
df4 <- join(df1, df2, by="b", type="left")
head(df4)

  stuff1 stuff2 b  c
1      1      1 A 10
2      1      2 A 10
3      1      3 A 10
4      1      4 B 20
5      1      5 B 20
6      1      6 B 20

I don't know how well that will scale with the size of your data, but if it doesn't scale well, you could try data.table or sqldf. I haven't used these two, so I can't say much about them, but here's a comparison of their speed that might be a good starting point.
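Before reaching for data.table or sqldf, one base R option that may be worth benchmarking on the real data is a plain lookup with match(), which skips the join entirely and leaves df1's row and column order untouched. This is a sketch, not a measured result; the 100,000-row df1 is made-up data to mimic a larger input, and df2 keeps the question's original column names xx and yy:

```r
set.seed(1)
L <- LETTERS[1:4]
n <- 1e5  # made-up size to mimic a larger data set
df1 <- data.frame(stuff1 = 1, stuff2 = seq_len(n),
                  b = sample(L, n, replace = TRUE),
                  stringsAsFactors = FALSE)
df2 <- data.frame(xx = L, yy = c(10, 20, 30, 40), stringsAsFactors = FALSE)

# match() gives, for each value of df1$b, the row of df2 whose xx equals it
# (NA if there is none), so indexing yy with it is a vectorised left lookup.
df1$c <- df2$yy[match(df1$b, df2$xx)]

head(df1)
```

Keys in df1 that are missing from df2 simply get NA in c, which is the same behaviour as a left join.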

Answer 1
1 · accepted · 2014-04-29 15:47:01