
Replace values in a data.frame column based on values in a different column

I have this data.frame:

df <- data.frame(id = rep(c("one", "two", "three"), each = 10), week.born = NA)
df$week.born[c(5,15,28)] <- c(23,19,24)

df 

  id week.born
1    one        NA
2    one        NA
3    one        NA
4    one        NA
5    one        23
6    one        NA
7    one        NA
8    one        NA
9    one        NA
10   one        NA
11   two        NA
12   two        NA
13   two        NA
14   two        NA
15   two        19
16   two        NA
17   two        NA
18   two        NA
19   two        NA
20   two        NA
21 three        NA
22 three        NA
23 three        NA
24 three        NA
25 three        NA
26 three        NA
27 three        NA
28 three        24
29 three        NA
30 three        NA

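For id one, all week.born values should be 23. For id two, all week.born values should be 19. For id three, all week.born values should be 24.

What is the best way to do this?

I would create another data.frame containing the mapping and then do a simple join: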

require(dplyr)
map <- data.frame(id=c("one","two","three"), new.week.born=c(23,19,24))
left_join(df, map, by="id")

# id week.born new.week.born
# 1    one        NA            23
# 2    one        NA            23
# ...
# 16   two        NA            19
# 17   two        NA            19
# 18   two        NA            19
# 19   two        NA            19
# 20   two        NA            19
# 21 three        NA            24
# 22 three        NA            24
# 23 three        NA            24
# ...
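
Note that the join above adds new.week.born next to the original column rather than replacing it. A minimal sketch of the remaining step, assuming the same df and map as above and that overwriting week.born is the goal (the name filled is only illustrative):

```r
library(dplyr)

df <- data.frame(id = rep(c("one", "two", "three"), each = 10), week.born = NA)
df$week.born[c(5, 15, 28)] <- c(23, 19, 24)
map <- data.frame(id = c("one", "two", "three"), new.week.born = c(23, 19, 24))

# Join, overwrite week.born with the mapped value, then drop the helper column
filled <- df %>%
  left_join(map, by = "id") %>%
  mutate(week.born = new.week.born) %>%
  select(-new.week.born)

# Cross-tabulate to confirm each id now carries a single week.born value
table(filled$id, filled$week.born)
```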

See the benchmark below.

library(microbenchmark)
library(dplyr) # v 0.4.1
library(data.table) # v 1.9.5

df <- data.frame(id = rep(c("one", "two", "three"), each = 1e6))
df2 <- copy(df)
map <- data.frame(id=c("one","two","three"), new.week.born=c(23,19,24))

dplyr_join <- function() { 
  left_join(df, map, by="id")
}

r_merge <- function() {
  merge(df, map, by="id")
}

data.table_join <- function() {
  setkey(setDT(df2))[map]
}

microbenchmark(dplyr_join(), r_merge(), data.table_join(), times = 10)

Unit: milliseconds
              expr         min         lq       mean     median         uq       max neval
      dplyr_join()   409.10635   476.6690   910.6446   489.4573   705.4021  2866.151    10
         r_merge() 41589.32357 47376.0741 55719.1752 50133.0918 54636.3356 83562.931    10
 data.table_join()    94.14621   132.3788   483.4220   225.3309  1051.7916  1416.946    10

One solution is:

df$week.born[df$id == "one"] <- 23
df$week.born[df$id == "two"] <- 19
df$week.born[df$id == "three"] <- 24

Regards
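
If the list of ids grows, the three assignments above could be collapsed into a single lookup. A minimal sketch, assuming the mapping is known up front and stored in a named vector (the name weeks is hypothetical):

```r
df <- data.frame(id = rep(c("one", "two", "three"), each = 10), week.born = NA)

# Named vector acting as a lookup table: names are ids, values are weeks
weeks <- c(one = 23, two = 19, three = 24)

# as.character() guards against id being a factor
# (indexing by a factor would use its integer codes, not its labels)
df$week.born <- unname(weeks[as.character(df$id)])
```

This keeps the mapping in one place, so adding a fourth id means adding one element to weeks rather than another assignment line.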

You can do:

library(data.table)
setDT(df)[,week.born:=week.born[!is.na(week.born)][1], by=id]

Or, using ave from base R:

df$week.born = with(df, ave(week.born, id, FUN=function(u) u[!is.na(u)][1]))
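
Both versions rely on the same idiom: within each id, take the first non-NA value and spread it over the group. A small sketch of how that behaves, using made-up data (small and first_known are illustrative names) that includes a group with no known value:

```r
# Small illustrative data: group "c" has no known value at all
small <- data.frame(id = c("a", "a", "b", "b", "c"),
                    week.born = c(NA, 23, 19, NA, NA))

# First non-NA value of a vector, or NA if there is none
first_known <- function(u) u[!is.na(u)][1]

# ave() applies first_known per id and recycles the result over the group
small$week.born <- ave(small$week.born, small$id, FUN = first_known)
small$week.born
# 23 23 19 19 NA
```

A group without any recorded value simply stays NA, so no extra guard is needed for that case.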

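If you only have a few groups, @cho7tom's answer is fine; otherwise you may prefer to build a lookup table and join against it to find the week.born value for each id.

Base R: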

df <- data.frame(id = rep(c("one", "two", "three"), each = 10))
lkp <- data.frame(id=c("one","two","three"), week.born=c(23,19,24))
merge(df, lkp, by="id")

Or using a data.table binary join:

library(data.table)
setkey(setDT(df))[lkp]
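
One thing to keep in mind with the lookup-table approach: merge sorts its result by the key column by default, and setkey also reorders the table, so the original row order is not preserved. If the order matters, a match-based lookup is one alternative. A minimal sketch, assuming the same df and lkp as above:

```r
df  <- data.frame(id = rep(c("one", "two", "three"), each = 10))
lkp <- data.frame(id = c("one", "two", "three"), week.born = c(23, 19, 24))

# match() gives, for each row of df, the position of its id in lkp;
# indexing lkp$week.born with those positions keeps df's row order intact
df$week.born <- lkp$week.born[match(df$id, lkp$id)]
```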

When mapping just a few combinations like this, the mapvalues function from the plyr package is simple:

library(plyr)
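# mapvalues keeps the type of its input (character or factor), so convert the result to numeric
df$week.born <- as.numeric(as.character(mapvalues(df$id, c("one", "two", "three"), c(23, 19, 24))))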

