R: How to loop over all character variables in a data.frame and change a specific value
I'm trying to loop over all the variables in a data.table and modify all the character variables. Some of the values of these character variables are 'NULL', and I want to change them to ''.
For example, I want to change
library(data.table)
df <- data.table('id' = seq(1:10),
'datadate' = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by="days"),
'charvar' = c('a', 'b', 'c', rep('NULL', 7)),
'charvar1' = c('a', 'b', 'c', 'd', rep('NULL', 6)))
id datadate charvar charvar1
1: 1 2015-01-01 a a
2: 2 2015-01-02 b b
3: 3 2015-01-03 c c
4: 4 2015-01-04 NULL d
5: 5 2015-01-05 NULL NULL
6: 6 2015-01-06 NULL NULL
7: 7 2015-01-07 NULL NULL
8: 8 2015-01-08 NULL NULL
9: 9 2015-01-09 NULL NULL
10: 10 2015-01-10 NULL NULL
into
id datadate charvar charvar1
1: 1 2015-01-01 a a
2: 2 2015-01-02 b b
3: 3 2015-01-03 c c
4: 4 2015-01-04 d
5: 5 2015-01-05
6: 6 2015-01-06
7: 7 2015-01-07
8: 8 2015-01-08
9: 9 2015-01-09
10: 10 2015-01-10
I tried two ways.
First method:
df %>%
mutate_if(is.character(.)==TRUE,
funs(function(col){col = if_else(col=='NULL', '', col)}))
from which I got the error:
Error: length(.p) == length(vars) is not TRUE
Second method:
data.frame(
lapply(df, function(col)
{if(is.character(col)==TRUE) col = ifelse(col=='NULL', '', col)})
)
for which I got the error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 0, 10
What am I doing wrong here? I would appreciate insights into how to correct both methods and why the code above is wrong.
Since df is a data.table, you can modify specific rows by supplying [.data.table with a logical vector in i and assigning the new value in j, e.g. df[charvar == 'NULL', charvar := '']. So you can lapply over all character columns to do that for each of them. Because := combined with a row subset in i updates only the matching cells by reference, this avoids ifelse, and so avoids rebuilding and reassigning the entire column each time.
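library(data.table)
char_cols <- names(df)[sapply(df, is.character)]  # names of all character columns
# For each character column, set it to '' in the rows where it equals 'NULL'.
# := modifies df by reference; invisible() stops lapply's list of results from printing.
invisible(lapply(char_cols, function(x) df[df[[x]] == 'NULL', (x) := '']))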
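A variant of the same idea, offered here as a sketch rather than the answerer's code, uses data.table's set() inside a plain for loop. set(x, i, j, value) assigns by reference just like :=, and because a for loop returns nothing, there is no list of results to print. The block rebuilds the question's df so that it runs on its own.

```r
library(data.table)

# Rebuild the question's example data so the block is self-contained
df <- data.table(id = 1:10,
                 datadate = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by = "days"),
                 charvar = c('a', 'b', 'c', rep('NULL', 7)),
                 charvar1 = c('a', 'b', 'c', 'd', rep('NULL', 6)))

# Loop over the names of the character columns only
for (col in names(df)[sapply(df, is.character)]) {
  # Row indices where this column holds the string 'NULL'
  rows <- which(df[[col]] == 'NULL')
  # Overwrite just those cells with '' by reference (df is not copied)
  set(df, i = rows, j = col, value = '')
}

df
```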
If you want to use dplyr, you can do:
library(dplyr)
df %>%
mutate_if(is.character,
function(col) if_else(col == 'NULL', '', col))
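In dplyr 1.0.0 and later, mutate_if() is superseded by across(). The sketch below assumes such a version and expresses the same transformation with across(where(is.character), ...). where() takes the predicate function itself, not is.character(.). The example data is built as a plain data.frame (stringsAsFactors = FALSE) so that only dplyr is needed.

```r
library(dplyr)

# The question's data as a plain data.frame, keeping strings as characters
df <- data.frame(id = 1:10,
                 datadate = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by = "days"),
                 charvar = c('a', 'b', 'c', rep('NULL', 7)),
                 charvar1 = c('a', 'b', 'c', 'd', rep('NULL', 6)),
                 stringsAsFactors = FALSE)

# across() applies the lambda to every column selected by where(is.character);
# .x is the current column, and if_else() keeps both branches character
df_clean <- df %>%
  mutate(across(where(is.character), ~ if_else(.x == 'NULL', '', .x)))

df_clean
```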
In the tidyverse (to the extent that it is consistent), . represents the left-hand side of the pipe %>%. So if you pass is.character(.) as the predicate (the first argument you write after the piped-in data), dplyr will evaluate is.character(df), which is FALSE, a logical vector of length 1. But mutate_if expects either a function or a logical vector of length ncol(df), with one value per column. That length check is exactly what fails in the error length(.p) == length(vars) is not TRUE.
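To see the length mismatch directly, compare is.character(df) with sapply(df, is.character). mutate_if() documents that its predicate may be a function or a logical vector, so passing the per-column vector is another way to repair the first method. This sketch assumes a dplyr release that still ships the superseded mutate_if(). It also drops funs(), which is deprecated, and passes the function directly.

```r
library(dplyr)

df <- data.frame(id = 1:10,
                 datadate = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by = "days"),
                 charvar = c('a', 'b', 'c', rep('NULL', 7)),
                 charvar1 = c('a', 'b', 'c', 'd', rep('NULL', 6)),
                 stringsAsFactors = FALSE)

# What is.character(.) evaluates to inside the pipe: one value for the whole table
is.character(df)   # FALSE

# What mutate_if() needs: one TRUE/FALSE per column
is_chr <- sapply(df, is.character)
is_chr             # id FALSE, datadate FALSE, charvar TRUE, charvar1 TRUE

# A logical vector of length ncol(df) passes the length check
df_fixed <- mutate_if(df, is_chr, function(col) if_else(col == 'NULL', '', col))
```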
Example data with more than one character column:
df <- data.table('id' = seq(1:10),
'datadate' = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by="days"),
'charvar' = c('a', 'b', 'c', rep('NULL', 7)),
'charvar2' = sample(c('a', 'b', 'c', rep('NULL', 7))) )
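The question's second method fails for a different reason. In its anonymous function, an if without an else returns NULL when the condition is FALSE. As a result, lapply() yields NULL for the id and datadate columns, and data.frame() then receives pieces with 0 and 10 rows, which matches the reported "differing number of rows: 0, 10". A minimal fix returns the column unchanged in an else branch. The == TRUE comparison is also unnecessary, because is.character() already returns TRUE or FALSE.

```r
library(data.table)

df <- data.table(id = 1:10,
                 datadate = seq(as.Date('2015-01-01'), as.Date('2015-01-10'), by = "days"),
                 charvar = c('a', 'b', 'c', rep('NULL', 7)),
                 charvar1 = c('a', 'b', 'c', 'd', rep('NULL', 6)))

# The original function body: no else branch, so non-character columns become NULL
res_bad <- lapply(df, function(col) if (is.character(col)) ifelse(col == 'NULL', '', col))
sapply(res_bad, is.null)   # id TRUE, datadate TRUE, charvar FALSE, charvar1 FALSE

# Fix: always return a column, modified or not
res_ok <- lapply(df, function(col) {
  if (is.character(col)) ifelse(col == 'NULL', '', col) else col
})
df2 <- as.data.table(res_ok)
df2
```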
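Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.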