R: creating a new column in a data.frame based on values out of a second data.frame
I have two (example) data.frames (df1, df2):
#df1
L <- LETTERS[1:4]
b <- sample(L, 20, replace = TRUE)
df1 <- data.frame(stuff1 = 1, stuff2 = 1:10, b = b, c= NA, stringsAsFactors=FALSE)
#df2
a <- c(10,20,30,40)
df2 <- data.frame(xx = L, yy = a, stringsAsFactors=FALSE )
I want to have a new column, let's say c, in df1 based on the values in df2. One example: A has the corresponding value 10 (see df2), so for every A in column b of df1, 10 should be written in the (new) column c. The same goes for every row of xx in df2, so in this case A, B, C and D. My code is not really working and only handles a single value, here A:
##copy column b now it is c
df1["c"] <- df1$b
# which value has A in df2?
zz <- df2[df2$xx == "A",]
xy <- zz$yy
# fill in the new value in c (this is not working)
df1[df1$c == "A", ] <- xy
I hope it is clear what I want to say... Oh, and I have some big data; this is only an example to try things out.
It sounds like you just want to do a merge/join. First, let's drop the empty c in df1 and change the column names a bit:
df1 <- df1[, !names(df1) %in% "c"]
colnames(df2) <- c("b", "c")
With just base R, we can use merge:
df3 <- merge(df1, df2, by = "b", all.x = TRUE)  # all.x = TRUE keeps every row of df1 (left join); merge() has no type argument
head(df3)
b stuff1 stuff2 c
1 A 1 1 10
2 A 1 2 10
3 A 1 3 10
4 A 1 3 10
5 A 1 10 10
6 A 1 7 10
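Note that merge sorts the result by the key and moves b to the front. If the original row order of df1 matters, a lookup with match() is another base R option. The following is only a minimal sketch: it recreates the example data with the original column names xx and yy, writes out the recycling of stuff2 with rep(), and adds set.seed(1) purely so the run is reproducible (that seed is an assumption, not part of the question).

```r
# Recreate the example data; set.seed() is added here only for reproducibility
set.seed(1)
L <- LETTERS[1:4]
df1 <- data.frame(stuff1 = 1, stuff2 = rep(1:10, 2),
                  b = sample(L, 20, replace = TRUE),
                  stringsAsFactors = FALSE)
df2 <- data.frame(xx = L, yy = c(10, 20, 30, 40), stringsAsFactors = FALSE)

# match() returns, for each value of df1$b, the row position in df2$xx
# that holds the same key; indexing yy by those positions does the lookup.
# Keys that are missing from df2 would come back as NA.
df1$c <- df2$yy[match(df1$b, df2$xx)]
head(df1)
```

This keeps df1 in its original row and column order and handles all keys (A, B, C and D) in one vectorised step instead of one value at a time.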
The package plyr has an alternative that might be faster and will preserve column order:
library(plyr)
df4 <- join(df1, df2, by="b", type="left")
head(df4)
stuff1 stuff2 b c
1 1 1 A 10
2 1 2 A 10
3 1 3 A 10
4 1 4 B 20
5 1 5 B 20
6 1 6 B 20
I don't know how well that will scale with the size of your data, but if it doesn't, you could try data.table or sqldf. I haven't used these two so I can't speak much to them, but here's a comparison of their speed that might be a good starting point.
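For reference, the same left join could look roughly like this with data.table. This is a sketch only, not something benchmarked here: it assumes the data.table package is installed, rebuilds the example as data.tables named dt1 and dt2 (hypothetical names), and uses set.seed(1) just for reproducibility.

```r
library(data.table)

# Rebuild the example as data.tables; set.seed() is only for reproducibility
set.seed(1)
L <- LETTERS[1:4]
dt1 <- data.table(stuff1 = 1, stuff2 = rep(1:10, 2),
                  b = sample(L, 20, replace = TRUE))
dt2 <- data.table(b = L, c = c(10, 20, 30, 40))

# Update join: match rows of dt1 to dt2 on b and add column c to dt1
# by reference; i.c refers to column c of dt2 (the table in the i slot)
dt1[dt2, on = "b", c := i.c]
head(dt1)
```

Because the column is added by reference, dt1 is not copied, which is usually the reason to reach for data.table on large inputs.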