How to update R data.frame column values efficiently?

Say I have this table A:

                mpg 
RX4            21.0  
Wag            21.0  
Datsun         22.8  
Drive          21.4   
Sportabout     18.7   
Valiant        18.1  
Duste          14.3   
Merc           24.4   

Now I have a table B:

              mpg
RX4           60.0  
Wag           60.0  
Datsun        70.8  

What I want to do is update the mpg values of Table A according to Table B. I can do that easily with a HashMap in Java; what is an efficient way of doing it in R?

Thanks very much indeed.

You could use match to look up the row names of df2 (the second dataset) among the row names of df1 (the first), and then use the result as an index to replace the values of mpg in df1 with those from df2:

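 # indx[i] is the position in df1 of the i-th row name of df2
 indx <- match(row.names(df2), row.names(df1))
 # df2$mpg is already in df2's row order, so it is used unindexed
 df1$mpg[indx] <- df2$mpg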
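
As a minimal sketch of how match aligns values by row name rather than by position, the example below rebuilds Tables A and B from the question. Listing B's rows in a different order from A's is an assumption made here for illustration only.

```r
# Table A from the question, keyed by row name
df1 <- data.frame(
  mpg = c(21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4),
  row.names = c("RX4", "Wag", "Datsun", "Drive",
                "Sportabout", "Valiant", "Duste", "Merc")
)
# Table B, with its rows deliberately shuffled (illustrative assumption)
df2 <- data.frame(
  mpg = c(70.8, 60.0, 60.0),
  row.names = c("Datsun", "RX4", "Wag")
)

# indx[i] is the row of df1 whose name equals the i-th row name of df2
indx <- match(row.names(df2), row.names(df1))

# Write df2's values into the matching rows of df1
df1$mpg[indx] <- df2$mpg
df1
```

This form assumes every row name of df2 exists in df1; otherwise indx contains NA and the assignment fails.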

Or you could use the compact solution offered by @digEmAll:

 df1[row.names(df2),'mpg'] <- df2$mpg 
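
One behaviour of the compact form is worth knowing: when a row name on the left-hand side does not exist in df1, base R appends a new row instead of raising an error. The sketch below shows this with a shortened Table A and a row named "Mazda", both chosen here purely for illustration. In a data frame with more columns, the other columns of such a new row would typically be filled with NA.

```r
# Shortened Table A (illustrative subset of the question's data)
df1 <- data.frame(
  mpg = c(21.0, 21.0, 22.8),
  row.names = c("RX4", "Wag", "Datsun")
)
# Update source: "RX4" exists in df1, "Mazda" does not
df2 <- data.frame(
  mpg = c(45.0, 60.0),
  row.names = c("Mazda", "RX4")
)

# Character row indexing: known names are overwritten, unknown names are appended
df1[row.names(df2), "mpg"] <- df2$mpg
df1
```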

Update

Given the new information that some rows of df2 are not in df1, and that those rows should be added to df1:

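 # Position in df1 of each row name of df2 (NA where df1 has no such row)
 indx <- match(row.names(df2), row.names(df1))
 # TRUE for the rows of df2 that also exist in df1
 found <- !is.na(indx)

 # Index both sides with the same mask so values stay paired by row name
 df1$mpg[indx[found]] <- df2$mpg[found]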
 df1N <- rbind(df1,df2[setdiff(rownames(df2), rownames(df1)),,drop=FALSE])
 df1N
 #           mpg
 #RX4        60.0
 #Wag        60.0
 #Datsun     70.8
 #Drive      21.4
 #Sportabout 18.7
 #Valiant    18.1
 #Duste      14.3
 #Merc       24.4
 #Mazda      45.0
 #Mercury    42.0

Or you could use intersect and setdiff:

  indxN <- intersect(row.names(df1), row.names(df2))
  df1[indxN, 'mpg']  <- df2[indxN, 'mpg']
  rbind(df1,df2[setdiff(rownames(df2), rownames(df1)),,drop=FALSE])
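
The intersect/setdiff steps above can be wrapped in a small function. The sketch below is one possible packaging, not part of the original answer: the name upsert_by_rowname and its col argument are hypothetical, and it assumes both data frames have the same columns so that rbind can stack them.

```r
# Hypothetical helper: update `col` in `target` from `source` by row name,
# then append the rows of `source` that `target` does not have yet.
upsert_by_rowname <- function(target, source, col = "mpg") {
  common <- intersect(row.names(target), row.names(source))
  if (length(common) > 0) {
    target[common, col] <- source[common, col]
  }
  new_rows <- setdiff(row.names(source), row.names(target))
  rbind(target, source[new_rows, , drop = FALSE])
}

# Data from the "new data" example
df1 <- data.frame(
  mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4),
  row.names = c("RX4", "Wag", "Datsun", "Drive",
                "Sportabout", "Valiant", "Duste", "Merc")
)
df2 <- data.frame(
  mpg = c(45, 60, 60, 42, 70.8),
  row.names = c("Mazda", "RX4", "Wag", "Mercury", "Datsun")
)

df1N <- upsert_by_rowname(df1, df2)
df1N
```

The function returns a new data frame and leaves df1 unchanged, which matches how the answer builds df1N.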

New data

  df1 <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 
   24.4)), .Names = "mpg", class = "data.frame", row.names = c("RX4", 
  "Wag", "Datsun", "Drive", "Sportabout", "Valiant", "Duste", "Merc"
  ))


  df2 <- structure(list(mpg = c(45, 60, 60, 42, 70.8)), .Names = "mpg",
   class =    "data.frame", row.names = c("Mazda", "RX4", "Wag",
  "Mercury", "Datsun"))

Old data

  df1 <- structure(list(mpg = c(60, 70, 80.8, 90.4, 18.7, 18.1, 14.3, 
  24.4, 22.8, 19.2, 17.8), cyl = c(6L, 6L, 4L, 6L, 8L, 6L, 8L, 
  4L, 4L, 6L, 6L), disp = c(160, 160, 108, 258, 360, 225, 360, 
 146.7, 140.8, 167.6, 167.6), hp = c(110L, 110L, 93L, 110L, 175L, 
 105L, 245L, 62L, 95L, 123L, 123L), drat = c(3.9, 3.9, 3.85, 3.08, 
 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92), wt = c(2.62, 2.875, 
 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44), qsec = c(16.46, 
 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3, 18.9
 ), vs = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L), am = c(1L, 
 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), gear = c(4L, 4L, 4L, 
 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), carb = c(4L, 4L, 1L, 1L, 2L, 
 1L, 4L, 2L, 2L, 4L, 4L)), .Names = c("mpg", "cyl", "disp", "hp", 
 "drat", "wt", "qsec", "vs", "am", "gear", "carb"), row.names = c("Mazda RX4", 
 "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout", 
 "Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280", 
 "Merc 280C"), class = "data.frame")

 df2 <- structure(list(mpg = c(60, 70, 80.8, 90.4), cyl = c(6L, 6L, 4L, 
 6L), disp = c(160, 160, 108, 258), hp = c(110L, 110L, 93L, 110L
 ), drat = c(3.9, 3.9, 3.85, 3.08), wt = c(2.62, 2.875, 2.32, 
 3.215), qsec = c(16.46, 17.02, 18.61, 19.44), vs = c(0L, 0L, 
 1L, 1L), am = c(1L, 1L, 1L, 0L), gear = c(4L, 4L, 4L, 3L), carb = c(4L, 
 4L, 1L, 1L)), .Names = c("mpg", "cyl", "disp", "hp", "drat", 
 "wt", "qsec", "vs", "am", "gear", "carb"), class = "data.frame", row.names = 
 c("Mazda RX4","Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive"))

@akrun's solution works but is rather heavy. With the data.table package you can do the update in a few neat lines of code:

library(data.table)
dt1 = data.table(df1, keep.rownames=TRUE)
dt2 = data.table(df2, keep.rownames=TRUE)
setkey(dt1, rn)
dt1[dt2, `:=`(mpg = i.mpg)]

Where df1 and df2 are:

df1 = structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4)), .Names = "mpg", class= "data.frame", row.names = c("RX4", "Wag", "Datsun", "Drive", "Sportabout", "Valiant", "Duste","Merc"))

df2 = structure(list(mpg = c(45, 60, 60, 42, 70.8)), .Names = "mpg",class ="data.frame", row.names = c("Mazda", "RX4", "Wag","Mercury", "Datsun"))
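
Note that an update join with := only changes rows that already exist in dt1, so Mazda and Mercury are not added by it. As a hedged sketch (assuming the data.table package is installed), the variant below uses the on argument instead of setkey, which leaves the row order of dt1 untouched, and then appends the unmatched rows of dt2 with an anti-join. The name result is introduced here for illustration.

```r
library(data.table)

df1 <- data.frame(
  mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4),
  row.names = c("RX4", "Wag", "Datsun", "Drive",
                "Sportabout", "Valiant", "Duste", "Merc")
)
df2 <- data.frame(
  mpg = c(45, 60, 60, 42, 70.8),
  row.names = c("Mazda", "RX4", "Wag", "Mercury", "Datsun")
)

# keep.rownames = TRUE stores the row names in a column called "rn"
dt1 <- as.data.table(df1, keep.rownames = TRUE)
dt2 <- as.data.table(df2, keep.rownames = TRUE)

# Update join: overwrite mpg in dt1 for every rn that also appears in dt2
dt1[dt2, mpg := i.mpg, on = "rn"]

# Anti-join: rows of dt2 whose rn is absent from dt1, appended at the end
result <- rbind(dt1, dt2[!dt1, on = "rn"])
result
```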
