Replace multiple values in data.table with R
I have tried multiple ways to replace two values in a data table with NA. The data are read from the CSV in the code below. There are two values, 9223372036854775807 and 2147483647, which I intend to replace with NA.
library(data.table)
data <- fread("https://raw.githubusercontent.com/Deborah-Jia/Complete_Analysis_da2/main/eg1.csv", integer64 = "numeric")
I tried:
data[data = 9223372036854775807|2147483647]
This gave an error:
Error in [.data.table(data, , data = 9223372036854775808 | 2147483647, : unused argument (data = 9223372036854775808 | 2147483647)
I checked the structure of [i, j, by...] but couldn't find the cause. So I used a for loop instead:
# only these cols have 9223372036854775807 and 2147483647
special_col <- data %>% select(matches("price|size|room")) %>% colnames()
for (icol in special_col) {
  data[icol == 9223372036854775807|2147483647, icol := NA]
}
It didn't work as expected; I can still find 2147483647 in the data table.
I know I can use
data[total_room_count_high == 9223372036854775807|2147483647, total_room_count_high := NA]
and repeat this for each column, but that is rather tiresome.
Before these methods, I also tried across, filter_at and mapply combined with a function to process each column. But as soon as I put col inside data[ ], data.table treats col as a column name rather than as a variable holding the column names.
For comparison you should use ==, not =. You can combine the two conditions with |, writing out each comparison in full:
data <- read.csv2("https://raw.githubusercontent.com/Deborah-Jia/Complete_Analysis_da2/main/eg1.csv")
data[data == 2147483647 | data == 9223372036854775807] <- NA
data
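The reason the shorthand x == a | b from the question does not work can be seen on a small vector. This is a minimal sketch on made-up values (the vector x is not from the original data): == binds more tightly than |, so the second number is evaluated on its own as a logical, and any non-zero number counts as TRUE.

```r
# Toy vector standing in for one column (values are illustrative only)
x <- c(1, 2147483647, 5)

# Parsed as (x == 9223372036854775807) | 2147483647.
# A non-zero number is treated as TRUE, so every element comes out TRUE.
wrong <- x == 9223372036854775807 | 2147483647

# Each comparison written out in full, or the equivalent %in% test
right <- x == 9223372036854775807 | x == 2147483647
also_right <- x %in% c(9223372036854775807, 2147483647)
```

Here wrong flags every element, while right and also_right flag only the second one.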
An approach using set:
values <- c(9223372036854775807, 2147483647)
for(col in names(data)) set(data, i = which(data[[col]] %in% values), j = col, value = NA_integer_)
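If you would rather stay with the data[i, j] syntax from the question, the column name held in a variable has to be resolved explicitly. The following is a sketch on a hypothetical toy table (dt, cols and the column names are made up for illustration, not taken from the CSV): get(col) fetches the column in i, and wrapping the variable in parentheses on the left of := tells data.table to use its value as the column name.

```r
library(data.table)

# Hypothetical toy table; column names are made up for illustration
dt <- data.table(
  price = c(100, 2147483647, 250),
  size  = c(9223372036854775807, 40, 55),
  city  = c("a", "b", "c")
)
values <- c(9223372036854775807, 2147483647)
cols <- c("price", "size")

for (col in cols) {
  # get(col) looks up the column whose name is stored in col;
  # (col) on the left of := uses the string in col as the target column
  dt[get(col) %in% values, (col) := NA_real_]
}
```

Written as icol == ... and icol := NA, as in the question, data.table compares the string itself to the number and assigns to a column literally named icol.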
Note that 9223372036854775807 is not "really" in your data but is the way that integer64 sometimes represents NAs, so it is enough to replace 2147483647 (.Machine$integer.max) in the integer columns.
data <- fread("https://raw.githubusercontent.com/Deborah-Jia/Complete_Analysis_da2/main/eg1.csv", integer64 = "numeric")
for (j in seq_along(data)) {
  vj <- .subset2(data, j)
  if (is.integer(vj)) {
    i <- which(vj == .Machine$integer.max)
    set(data, i = i, j = j, value = NA_integer_)
  }
}
Note that 2147483647 might be there to represent "positive infinity" using integers and so might be a better representation than NA. (For example, if you want to filter on all properties above a certain price, these properties will be erroneously filtered out if you replace these values with NA.)
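The filtering caveat can be illustrated on a tiny table. This is a sketch under the assumption that 2147483647 means "no upper bound"; the table dt and the column price_high are hypothetical.

```r
library(data.table)

# Hypothetical listings; 2147483647 is assumed to mean "no upper bound"
dt <- data.table(id = 1:3,
                 price_high = c(500L, .Machine$integer.max, 1500L))

# With the sentinel kept, the unbounded listing passes the filter
kept <- dt[price_high > 1000L, id]

# After replacing the sentinel with NA the comparison yields NA,
# and data.table drops rows whose condition is NA
dt_na <- copy(dt)[price_high == .Machine$integer.max, price_high := NA_integer_]
dropped <- dt_na[price_high > 1000L, id]
```

In this sketch kept contains ids 2 and 3, whereas dropped contains only id 3, so the unbounded listing silently disappears once its value is NA.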