
R: How to loop over all character variables in a data.frame and change a specific value

I'm trying to loop over all the variables in a data.table and modify the character variables: some of their values are 'NULL', and I want to change them to ''.

For example, I want to change

    library(data.table)
    df <- data.table('id' = seq(1:10),
             'datadate' = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by="days"),
             'charvar' = c('a', 'b', 'c', rep('NULL', 7)),
             'charvar1' = c('a', 'b', 'c', 'd', rep('NULL', 6)))



    id   datadate charvar charvar1
 1:  1 2015-01-01       a        a
 2:  2 2015-01-02       b        b
 3:  3 2015-01-03       c        c
 4:  4 2015-01-04    NULL        d
 5:  5 2015-01-05    NULL     NULL
 6:  6 2015-01-06    NULL     NULL
 7:  7 2015-01-07    NULL     NULL
 8:  8 2015-01-08    NULL     NULL
 9:  9 2015-01-09    NULL     NULL
10: 10 2015-01-10    NULL     NULL

into

    id   datadate charvar charvar1
 1:  1 2015-01-01       a        a
 2:  2 2015-01-02       b        b
 3:  3 2015-01-03       c        c
 4:  4 2015-01-04                d
 5:  5 2015-01-05                 
 6:  6 2015-01-06                 
 7:  7 2015-01-07                 
 8:  8 2015-01-08                 
 9:  9 2015-01-09                 
10: 10 2015-01-10                 

I tried two approaches:

First method:

    df %>%
      mutate_if(is.character(.)==TRUE, 
                funs(function(col){col = if_else(col=='NULL', '', col)})) 

which gave this error:

    Error: length(.p) == length(vars) is not TRUE

Second method:

    data.frame(
      lapply(df, function(col)
                  {if(is.character(col)==TRUE) col = ifelse(col=='NULL', '', col)})
    )

which gave this error:

    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
      arguments imply differing number of rows: 0, 10

What am I doing wrong here? I'd appreciate insights into how to correct both methods and why the code above is wrong.

Since `df` is a data.table, you can modify specific rows by supplying `[.data.table` with a logical vector in `i` and assigning the new value in `j`, e.g. `df[charvar == 'NULL', charvar := '']`. So you can `lapply` over all character columns to do that for each of them. Because `:=` modifies `df` by reference, only the matching rows are updated; this avoids `ifelse`, and so avoids building and reassigning the entire column each time.

    library(data.table)

    lapply(names(df)[sapply(df, is.character)], # lapply over all character column names
           function(x) df[df[[x]] == 'NULL', (x) := '']) # set column to '' for rows where it equals 'NULL'
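The same by-reference update can also be written with data.table's `set()` in a plain `for` loop, which the data.table documentation describes as a low-overhead, loopable version of `:=`. This is a swapped-in alternative rather than the answer's own code; a minimal sketch, assuming data shaped like the question's `df` (`char_cols` is just an illustrative name):

```r
library(data.table)

# Sample data shaped like the question's df (two character columns)
df <- data.table(id = 1:10,
                 datadate = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by = "days"),
                 charvar = c('a', 'b', 'c', rep('NULL', 7)),
                 charvar1 = c('a', 'b', 'c', 'd', rep('NULL', 6)))

# Names of the character columns only
char_cols <- names(df)[vapply(df, is.character, logical(1))]

# set() updates df by reference: i holds the row numbers whose value
# is 'NULL', j is the column name, and '' is written into those cells
for (col in char_cols) {
  set(df, i = which(df[[col]] == 'NULL'), j = col, value = '')
}
```

As with the `lapply()` version, `df` is changed in place, so no `df <- ...` reassignment is needed; unlike it, the loop does not print a list of intermediate results.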

If you want to use dplyr, you can do:

    library(dplyr)

    df %>%
      mutate_if(is.character,
                function(col) if_else(col == 'NULL', '', col))
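In dplyr 1.0.0 and later, the scoped verbs such as `mutate_if()` are superseded by `across()` used inside `mutate()`. A sketch of the equivalent call, assuming a plain data.frame copy of the question's data (`res` is an illustrative name):

```r
library(dplyr)

# Plain data.frame version of the question's data;
# stringsAsFactors = FALSE keeps the strings as character on R < 4.0
df <- data.frame(id = 1:10,
                 datadate = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by = "days"),
                 charvar = c('a', 'b', 'c', rep('NULL', 7)),
                 charvar1 = c('a', 'b', 'c', 'd', rep('NULL', 6)),
                 stringsAsFactors = FALSE)

# where(is.character) selects the character columns; the lambda
# replaces 'NULL' with '' and leaves every other value unchanged
res <- df %>%
  mutate(across(where(is.character), ~ if_else(.x == 'NULL', '', .x)))
```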

In the tidyverse (to the extent that it is consistent), `.` represents the left-hand side of the pipe `%>%`. So if you use `is.character(.)` as the first argument, dplyr will evaluate `is.character(df)`, which is `FALSE`, a logical vector of length 1. But `mutate_if` expects either a logical vector of length `ncol(df)` or a function, which it applies to each column. The `length(.p) == length(vars)` check in the error message is comparing the length of that predicate with the number of columns.
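To see the difference concretely, here is a small sketch comparing the question's predicate with a per-column one. `mutate_if()` also accepts a logical vector with one element per column, so passing the per-column result works too (the data and the names `whole_table`, `per_column` and `res` are illustrative):

```r
library(dplyr)

df <- data.frame(id = 1:3,
                 charvar = c('a', 'NULL', 'c'),
                 stringsAsFactors = FALSE)

# What the question's predicate evaluates to: a single FALSE for the whole table
whole_table <- is.character(df)

# One logical per column instead: c(id = FALSE, charvar = TRUE)
per_column <- sapply(df, is.character)

# A logical vector of length ncol(df) passes mutate_if()'s length check
res <- df %>%
  mutate_if(per_column, function(col) if_else(col == 'NULL', '', col))
```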

Example data with more than one character column:

    df <- data.table('id' = seq(1:10),
                     'datadate' = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by="days"),
                     'charvar' = c('a', 'b', 'c', rep('NULL', 7)),
                     'charvar2' = sample(c('a', 'b', 'c', rep('NULL', 7))))
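The question's second method fails for a different reason: its `if` has no `else`, so for the non-character columns (`id`, `datadate`) the anonymous function returns `NULL`. `lapply()` therefore yields a list containing `NULL` elements, which `data.frame()` sees as 0-row columns next to the 10-row ones, hence "differing number of rows: 0, 10". A minimal base-R sketch of the fix, assuming a plain data.frame (the helper names `broken` and `fix_null` are hypothetical): return the column unchanged in an `else` branch, and assign back with `df[] <-` so the data.frame structure and the `Date` column are kept.

```r
# The question's function: with no else branch, a non-character column gives NULL
broken <- function(col) { if (is.character(col)) col <- ifelse(col == 'NULL', '', col) }

# Plain data.frame version of the question's data
df <- data.frame(id = 1:10,
                 datadate = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by = "days"),
                 charvar = c('a', 'b', 'c', rep('NULL', 7)),
                 charvar1 = c('a', 'b', 'c', 'd', rep('NULL', 6)),
                 stringsAsFactors = FALSE)

# Same shape as the question's second method, plus an else branch,
# so every column yields a length-10 vector instead of NULL
fix_null <- function(col) {
  if (is.character(col)) ifelse(col == 'NULL', '', col) else col
}

# df[] <- ... replaces the column contents while keeping df a data.frame
df[] <- lapply(df, fix_null)
```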

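Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.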

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM