
How to replace certain data frame values with their unknown column names?

I have a large data frame with unknown column names and numeric values 1, 2, 3, or 4. Now I want to replace every 4 with its column name and every 1, 2, or 3 with an empty value.

Of course I can write a loop of some kind, like this:

df <- data.frame(id=1:8,unknownvarname1=c(1:4,1:4),unknownvarname2=c(4:1,4:1))
for (i in 2:length(df)){
  df[,i] <- as.character(df[,i])
  df[,i] <- mgsub::mgsub(df[,i],c(1,2,3,4),c("","","",names(df)[i]))  
}

This would be the result:

  id unknownvarname1 unknownvarname2
1  1                 unknownvarname2
2  2                                
3  3                                
4  4 unknownvarname1                
5  5                 unknownvarname2
6  6                                
7  7                                
8  8 unknownvarname1                

For a data frame this size that's no problem at all. But when I try this loop on large data frames with up to 30k rows and up to 40 unknown variables, the loop takes ages to complete.

Does anyone know of a faster way to do this? I tried functions like mutate() from the dplyr package, but I could not get them to work.

Many thanks in advance!

One way using base R:

#Replace all the values with 1:3 with blank
df[-1][sapply(df[-1], `%in%`, 1:3)] <- ""
#Get the row/column indices where value is 4
mat <- which(df == 4, arr.ind = TRUE)
#Exclude values from first column
mat <- mat[mat[, 2] != 1, ]
#Replace remaining entries with their corresponding column names
df[mat] <- names(df)[mat[, 2]]
df

#  id unknownvarname1 unknownvarname2
#1  1                 unknownvarname2
#2  2                                
#3  3                                
#4  4 unknownvarname1                
#5  5                 unknownvarname2
#6  6                                
#7  7                                
#8  8 unknownvarname1                

Just to give another option, with switch (though, as this function is not vectorized, it needs an sapply nested within an lapply, which makes it neither "pretty" nor efficient):

Basically, switch works with a numeric first argument as switch(myNumberToTest, caseIfOne, caseIfTwo, ...).
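A minimal sketch of that behaviour, in case it helps. The vector x and the labels are made up for illustration only:

```r
# With a numeric first argument, switch() returns the alternative at that position
single <- switch(2, "one", "two", "three")   # "two"

# switch() takes a single value, so a vector has to be looped over, e.g. with sapply()
x <- c(1, 4, 2)                              # hypothetical input
res <- sapply(x, switch, "", "", "", "four") # "" "four" ""
```

Note that a number outside the range of the supplied alternatives makes switch() return NULL, so this only works cleanly when every value is known to be 1 to 4.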

So what you need is:

df[, 2:3] <- lapply(2:3, function(x) sapply(df[, x], switch, "", "", "", names(df)[x]))

df
#  id unknownvarname1 unknownvarname2
#1  1                 unknownvarname2
#2  2                                
#3  3                                
#4  4 unknownvarname1                
#5  5                 unknownvarname2
#6  6                                
#7  7                                
#8  8 unknownvarname1                

Yet another base R option, using ifelse within lapply (still looping over the columns, but vectorized within each column):

df <- data.frame(id=1:8,unknownvarname1=c(1:4,1:4),unknownvarname2=c(4:1,4:1))
df[,2:3] <- lapply(2:3, function(x) { ifelse(df[,x] < 4, "", colnames(df)[x]) })

gives

  id unknownvarname1 unknownvarname2
1  1                 unknownvarname2
2  2                                
3  3                                
4  4 unknownvarname1                
5  5                 unknownvarname2
6  6                                
7  7                                
8  8 unknownvarname1         
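Since the real column names and their number are unknown, the hard-coded 2:3 could be replaced by the names themselves. A sketch, assuming only that the first column is the id and every other column should be converted (value_cols is just an illustrative name):

```r
df <- data.frame(id = 1:8,
                 unknownvarname1 = c(1:4, 1:4),
                 unknownvarname2 = c(4:1, 4:1))

# Every column except the first (id), whatever the columns are called
value_cols <- names(df)[-1]

# One vectorized ifelse() per column: 4 becomes the column name, anything else ""
df[value_cols] <- lapply(value_cols, function(nm) {
  ifelse(df[[nm]] == 4, nm, "")
})
```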

Another base R possibility using sweep:

idx <- df[, -1] == 4
sw <- sweep(idx, 2, 1:2, FUN = '*') + 1
df[, -1] <- c("", colnames(df[, -1]))[sw]

which gives:

> df
  id unknownvarname1 unknownvarname2
1  1                 unknownvarname2
2  2                                
3  3                                
4  4 unknownvarname1                
5  5                 unknownvarname2
6  6                                
7  7                                
8  8 unknownvarname1                

This could be shortened to:

sw <- sweep(df[, -1] == 4, 2, 1:2, FUN = '*') + 1
df[, -1] <- c("", colnames(df[, -1]))[sw]
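The 1:2 above is tied to the two example columns. A sketch of the same idea for any number of columns, again assuming the first column is the id, might look like this:

```r
df <- data.frame(id = 1:8,
                 unknownvarname1 = c(1:4, 1:4),
                 unknownvarname2 = c(4:1, 4:1))

# Logical matrix marking the 4s; it keeps the column names of df[, -1]
idx <- df[, -1] == 4

# Multiply column j by j: a 4 in column j becomes j, everything else 0.
# Adding 1 turns that into a position in the lookup vector c("", names...)
sw <- sweep(idx, 2, seq_len(ncol(idx)), FUN = "*") + 1

df[, -1] <- c("", colnames(idx))[sw]
```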

A somewhat inefficient tidyverse option. It is inefficient because the columns have to be selected manually again later:

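library(dplyr)
library(purrr)

to_use <- names(df)[-1]
# Step 1: mark every 4 as NA and blank out everything else
df %>% 
  mutate_at(vars(contains("unknown")), list(~ifelse(. == 4, NA, ""))) -> new_df

# Step 2: fill the NAs in each column with that column's name
new_df[-1] <- map2(new_df[-1], to_use, function(x, y) replace(x, is.na(x), y))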

A less manual approach, which has the disadvantage of being non-specific (it is applied to every column, including id, so id has to be restored afterwards):

df %>% 
  map2(., names(.), function(x, y) ifelse(x == 4, y, "")) %>% 
  as.data.frame() %>% 
  mutate(id = row.names(.)) # might be a way around with `.id`

  id unknownvarname1 unknownvarname2
1  1                 unknownvarname2
2  2                                
3  3                                
4  4 unknownvarname1                
5  5                 unknownvarname2
6  6                                
7  7                                
8  8 unknownvarname1 

Result for approach 1:

new_df
     id unknownvarname1 unknownvarname2
    1  1                 unknownvarname2
    2  2                                
    3  3                                
    4  4 unknownvarname1                
    5  5                 unknownvarname2
    6  6                                
    7  7                                
    8  8 unknownvarname1 

Yet another option, using col to line up the names and values:

sel <- df[-1] == 4
df[-1] <- ""
df[-1][sel] <- names(df[-1])[col(df[-1])[sel]]

#  id unknownvarname1 unknownvarname2
#1  1                 unknownvarname2
#2  2                                
#3  3                                
#4  4 unknownvarname1                
#5  5                 unknownvarname2
#6  6                                
#7  7                                
#8  8 unknownvarname1
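To try this at the size mentioned in the question, one could build a made-up data frame of 30,000 rows and 40 value columns and time the same three steps. This is only a sketch: the data are random, big, out, and timing are illustrative names, and the columns get the default names X1, X2, ... rather than real variable names.

```r
set.seed(1)                        # hypothetical test data, not from the question
n_rows <- 30000
n_cols <- 40
big <- data.frame(id = seq_len(n_rows),
                  matrix(sample(1:4, n_rows * n_cols, replace = TRUE),
                         nrow = n_rows))

sel <- big[-1] == 4                # logical matrix marking the 4s

timing <- system.time({
  out <- big
  out[-1] <- ""                                   # blank out all value columns
  out[-1][sel] <- names(big)[-1][col(sel)[sel]]   # put column names where the 4s were
})
timing
```

The elapsed time depends on the machine, so no figure is given here.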
