r - Replace values in a data.frame column with a different value in the same column based on a unique ID
Replace values in a data.frame column based on values in a different column
I have this data.frame:
df <- data.frame(id = rep(c("one", "two", "three"), each = 10), week.born = NA)
df$week.born[c(5,15,28)] <- c(23,19,24)
df
id week.born
1 one NA
2 one NA
3 one NA
4 one NA
5 one 23
6 one NA
7 one NA
8 one NA
9 one NA
10 one NA
11 two NA
12 two NA
13 two NA
14 two NA
15 two 19
16 two NA
17 two NA
18 two NA
19 two NA
20 two NA
21 three NA
22 three NA
23 three NA
24 three NA
25 three NA
26 three NA
27 three NA
28 three 24
29 three NA
30 three NA
For one, all week.born values should be 23. For two, all week.born values should be 19. For three, all week.born values should be 24.
What is the best way to do this?
I would create another data.frame containing the mapping, and then do a simple join:
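library(dplyr)
# Lookup table: one row per id with the value it should map to
map <- data.frame(id=c("one","two","three"), new.week.born=c(23,19,24))
# A left join keeps every row of df and adds the matching new.week.born
left_join(df, map, by="id")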
# id week.born new.week.born
# 1 one NA 23
# 2 one NA 23
# ...
# 16 two NA 19
# 17 two NA 19
# 18 two NA 19
# 19 two NA 19
# 20 two NA 19
# 21 three NA 24
# 22 three NA 24
# 23 three NA 24
# ...
See the benchmarks below.
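library(microbenchmark)
library(dplyr)      # v 0.4.1
library(data.table) # v 1.9.5

df <- data.frame(id = rep(c("one", "two", "three"), each = 1e6))
df2 <- copy(df)  # separate copy, because setDT()/setkey() modify their argument by reference
map <- data.frame(id=c("one","two","three"), new.week.born=c(23,19,24))

dplyr_join <- function() {
  left_join(df, map, by="id")
}
r_merge <- function() {
  merge(df, map, by="id")
}
data.table_join <- function() {
  setkey(setDT(df2))[map]  # keyed (binary) join on id
}

# Run each join 10 times (neval = 10 in the output below)
microbenchmark(dplyr_join(), r_merge(), data.table_join(), times = 10)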
Unit: milliseconds
expr min lq mean median uq max neval
dplyr_join() 409.10635 476.6690 910.6446 489.4573 705.4021 2866.151 10
r_merge() 41589.32357 47376.0741 55719.1752 50133.0918 54636.3356 83562.931 10
data.table_join() 94.14621 132.3788 483.4220 225.3309 1051.7916 1416.946 10
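The joins above keep the original week.born column and add new.week.born beside it. If the goal is to overwrite week.born in place, a minimal base R sketch is to look up each row's id in the mapping with match(). The map table mirrors the one in this answer; the sketch assumes every id in df appears in map (an unmatched id would get NA).

```r
# Rebuild the question's data
df <- data.frame(id = rep(c("one", "two", "three"), each = 10), week.born = NA)
df$week.born[c(5, 15, 28)] <- c(23, 19, 24)

# Mapping table, as in the answer above
map <- data.frame(id = c("one", "two", "three"), new.week.born = c(23, 19, 24))

# match() gives, for each row of df, the position of its id in map$id;
# indexing new.week.born by those positions keeps df's row order intact
df$week.born <- map$new.week.born[match(df$id, map$id)]

head(df, 3)
```

Unlike merge(), this does not reorder rows or add extra columns.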
One solution is:
df$week.born[df$id == "one"] <- 23
df$week.born[df$id == "two"] <- 19
df$week.born[df$id == "three"] <- 24
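Writing one assignment per id gets repetitive as the number of groups grows. A sketch that keeps the same logical-index assignment but drives it from a named vector (week_by_id is a hypothetical name, not from the original answer):

```r
df <- data.frame(id = rep(c("one", "two", "three"), each = 10), week.born = NA)

# Hypothetical lookup: names are ids, values are the week each id was born
week_by_id <- c(one = 23, two = 19, three = 24)

# Same idea as the three explicit assignments, repeated for every id
for (g in names(week_by_id)) {
  df$week.born[df$id == g] <- week_by_id[[g]]
}
```

Adding a new id then only means adding one entry to week_by_id.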
You can do:
library(data.table)
setDT(df)[,week.born:=week.born[!is.na(week.born)][1], by=id]
Or use ave from base R:
df$week.born = with(df, ave(week.born, id, FUN=function(u) u[!is.na(u)][1]))
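Both the data.table and the ave versions apply the same per-group rule: take the first non-NA value in the group and recycle it across all rows of that group. Unlike the lookup-table answers, this reads the values from the data itself, so nothing is hard-coded. A small sketch of that rule as a named helper (first_non_na is a hypothetical name), including what happens when a group has no observed value:

```r
# Return the first non-missing value of u, or NA if every value is missing
first_non_na <- function(u) u[!is.na(u)][1]

df <- data.frame(id = rep(c("one", "two", "three"), each = 3),
                 week.born = c(NA, 23, NA, 19, NA, NA, NA, NA, NA))

# ave() calls FUN on each id group and recycles the result over that group,
# returning a vector aligned with the original rows
df$week.born <- with(df, ave(week.born, id, FUN = first_non_na))
```

Here the group three has no recorded value, so all of its rows stay NA rather than raising an error.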
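If you only have a few groups, @cho7tom's approach is fine; otherwise you may prefer to create a lookup table and join against it to get the week.born value for each id.
Base R: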
df <- data.frame(id = rep(c("one", "two", "three"), each = 10))
lkp <- data.frame(id=c("one","two","three"), week.born=c(23,19,24))
merge(df, lkp, by="id")
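One thing to watch with merge(): by default it sorts the result by the by column, so rows come back in alphabetical id order (one, three, two) rather than the original order. A sketch that restores the original row order by carrying a temporary row index (the column name .row is an assumption for illustration):

```r
df <- data.frame(id = rep(c("one", "two", "three"), each = 10))
lkp <- data.frame(id = c("one", "two", "three"), week.born = c(23, 19, 24))

# Remember each row's original position before merging
df$.row <- seq_len(nrow(df))
out <- merge(df, lkp, by = "id")

# merge() sorted by id, so reorder by the saved index and drop it
out <- out[order(out$.row), c("id", "week.born")]
rownames(out) <- NULL
```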
Or use a data.table binary join:
library(data.table)
setkey(setDT(df))[lkp]
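When mapping a handful of pairs like this, the mapvalues function from the plyr package is simple:
library(plyr)
# mapvalues() returns the same type as its input (character here, or a factor
# if id is a factor), so convert the result back to numbers explicitly
df$week.born <- as.numeric(as.character(mapvalues(df$id, c("one", "two", "three"), c(23, 19, 24))))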
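plyr is no longer actively developed, so for a fixed set of pairs a base R alternative is a named vector indexed by id. The name lookup is hypothetical; as.character() is applied so the sketch also works if id is a factor (the data.frame() default before R 4.0.0), since indexing by a factor would use its integer codes instead of its labels.

```r
df <- data.frame(id = rep(c("one", "two", "three"), each = 10))

# Named lookup vector: names are ids, values are the numbers to map to
lookup <- c(one = 23, two = 19, three = 24)

# Index by name; as.character() guards against factor ids, which would
# otherwise be used as integer positions rather than names
df$week.born <- unname(lookup[as.character(df$id)])
```

The result is numeric directly, with no type conversion afterwards.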