R: How to loop over all character variables in a data.frame and change specific values

Question

I'm trying to loop over all the variables in a data.table and modify all the character variables; some of the values of these character variables are 'NULL', and I want to change them to ''.

For example, I want to change

    library(data.table)
    df <- data.table('id' = seq(1:10),
             'datadate' = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by="days"),
             'charvar' = c('a', 'b', 'c', rep('NULL', 7)),
             'charvar1' = c('a', 'b', 'c', 'd', rep('NULL', 6)))



    id   datadate charvar charvar1
 1:  1 2015-01-01       a        a
 2:  2 2015-01-02       b        b
 3:  3 2015-01-03       c        c
 4:  4 2015-01-04    NULL        d
 5:  5 2015-01-05    NULL     NULL
 6:  6 2015-01-06    NULL     NULL
 7:  7 2015-01-07    NULL     NULL
 8:  8 2015-01-08    NULL     NULL
 9:  9 2015-01-09    NULL     NULL
10: 10 2015-01-10    NULL     NULL

into

    id   datadate charvar charvar1
 1:  1 2015-01-01       a        a
 2:  2 2015-01-02       b        b
 3:  3 2015-01-03       c        c
 4:  4 2015-01-04                d
 5:  5 2015-01-05                 
 6:  6 2015-01-06                 
 7:  7 2015-01-07                 
 8:  8 2015-01-08                 
 9:  9 2015-01-09                 
10: 10 2015-01-10

I tried two ways:

First method:

    df %>%
      mutate_if(is.character(.)==TRUE, 
                funs(function(col){col = if_else(col=='NULL', '', col)}))

from which I got the error:

          Error: length(.p) == length(vars) is not TRUE

Second method:

    data.frame(
      lapply(df, function(col)
                  {if(is.character(col)==TRUE) col = ifelse(col=='NULL', '', col)})
    )

for which I got the error:

    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
      arguments imply differing number of rows: 0, 10

What am I doing wrong here? I would appreciate insights into how to correct both methods and why the code above is wrong.

Answer 1

Since df is a data.table, you can modify specific rows by supplying [.data.table with a logical vector in i and assigning the new value in j, e.g. df[charvar == 'NULL', charvar := '']. So you can lapply over all character columns to do that for each of them. This avoids using ifelse, and so avoids reassigning the entire column each time.

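    library(data.table)

    # lapply over all character column names; for each one, set it to '' in the rows
    # where it equals 'NULL' (:= modifies df by reference).
    # invisible() only suppresses printing of the list that lapply returns.
    invisible(lapply(names(df)[sapply(df, is.character)],
                     function(x) df[df[[x]] == 'NULL', (x) := '']))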
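As an alternative that is not part of the original answer, data.table's set() function performs the same by-reference update inside a plain for loop, without calling [.data.table once per column. A minimal sketch, assuming the same example data as in the question (char_cols and col are illustrative names):

```r
library(data.table)

# Same example data as in the question
df <- data.table(id = 1:10,
                 datadate = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by = "days"),
                 charvar = c('a', 'b', 'c', rep('NULL', 7)),
                 charvar1 = c('a', 'b', 'c', 'd', rep('NULL', 6)))

# Names of the character columns only (id and datadate are skipped)
char_cols <- names(df)[sapply(df, is.character)]

for (col in char_cols) {
  # set() modifies df by reference: only the rows equal to 'NULL' are overwritten
  set(df, i = which(df[[col]] == 'NULL'), j = col, value = '')
}
```

set() takes row numbers in i (hence which()) and a column name or number in j, so the loop touches only the matching cells of each character column.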

If you want to use dplyr, you can do:

    library(dplyr)

    df %>%
      mutate_if(is.character, 
                function(col) if_else(col == 'NULL', '', col))
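In dplyr 1.0.0 and later, scoped verbs such as mutate_if() are superseded by across() inside mutate(). A sketch of the same replacement written that way, assuming dplyr >= 1.0.0 and using a plain data.frame so the example does not depend on data.table:

```r
library(dplyr)

df <- data.frame(id = 1:10,
                 datadate = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by = "days"),
                 charvar = c('a', 'b', 'c', rep('NULL', 7)),
                 charvar1 = c('a', 'b', 'c', 'd', rep('NULL', 6)),
                 stringsAsFactors = FALSE)

# where(is.character) selects the character columns;
# the anonymous function is applied to each selected column
result <- df %>%
  mutate(across(where(is.character),
                function(col) if_else(col == 'NULL', '', col)))
```

Like mutate_if(is.character, ...), where() receives the predicate function itself rather than the result of calling it on the whole table.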

In the tidyverse (to the extent that it is consistent), . represents the left-hand side of the pipe %>%. So if you use is.character(.) as the first argument, dplyr will evaluate is.character(df), which is FALSE, a logical vector of length 1. But mutate_if expects either a function or a logical vector of length ncol(df), which is why the check length(.p) == length(vars) fails.
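To make the length mismatch concrete, the sketch below compares what is.character(.) evaluates to with the per-column check that mutate_if() needs. It assumes a plain data.frame copy of the question's data; whole_table and per_column are illustrative names. sapply() applies is.character to each column and returns one TRUE/FALSE per column, which mutate_if() accepts as its predicate.

```r
library(dplyr)

df <- data.frame(id = 1:10,
                 datadate = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by = "days"),
                 charvar = c('a', 'b', 'c', rep('NULL', 7)),
                 charvar1 = c('a', 'b', 'c', 'd', rep('NULL', 6)),
                 stringsAsFactors = FALSE)

# What is.character(.) becomes when . is the whole table: a single FALSE
whole_table <- is.character(df)

# One logical per column (id, datadate, charvar, charvar1)
per_column <- sapply(df, is.character)

# A logical vector of length ncol(df) is also a valid predicate for mutate_if
result <- df %>%
  mutate_if(per_column, function(col) if_else(col == 'NULL', '', col))
```

Passing is.character itself, as in the answer, lets mutate_if() build this per-column vector internally.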

Example data with more than one character column:

    df <- data.table('id' = seq(1:10),
                     'datadate' = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by="days"),
                     'charvar' = c('a', 'b', 'c', rep('NULL', 7)),
                     'charvar2' = sample(c('a', 'b', 'c', rep('NULL', 7))) )
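For the second method in the question, a likely cause of the error is that in R an if without an else returns NULL when its condition is FALSE. The anonymous function therefore returns NULL for the non-character columns id and datadate, and data.frame() ends up combining zero-length entries with 10-element character columns ("differing number of rows: 0, 10"). A minimal corrected sketch of that base-R approach, assuming a plain data.frame copy of the question's data (broken and fixed are illustrative names), returns each column at the end of the function whether or not it was changed:

```r
df <- data.frame(id = 1:10,
                 datadate = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by = "days"),
                 charvar = c('a', 'b', 'c', rep('NULL', 7)),
                 charvar1 = c('a', 'b', 'c', 'd', rep('NULL', 6)),
                 stringsAsFactors = FALSE)

# As in the question: the if has no else, so id and datadate come back as NULL
broken <- lapply(df, function(col) {
  if (is.character(col)) col = ifelse(col == 'NULL', '', col)
})

# Fix: always return col, modified only when it is a character column
fixed <- data.frame(
  lapply(df, function(col) {
    if (is.character(col)) col <- ifelse(col == 'NULL', '', col)
    col
  }),
  stringsAsFactors = FALSE
)
```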


Solution 1
1 · Accepted · 2018-07-15 20:58:22
