
Selectively Remove Column Values in R Data Frame

Example

Suppose in the famous iris data set, I have determined that when Sepal.Length > 5.0, there was an error in my measurement device.

In this contrived example, I would like to keep the Sepal.Length column with its original value, but change the remaining columns to NA if the Sepal.Length > 5.0 for that row.

As an example, this:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Would become this:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1          NA           NA          NA      NA
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4          NA           NA          NA      NA

I could certainly do this manually via vectorization, with something along the lines of:

iris$Sepal.Width <- ifelse(iris$Sepal.Length > 5.0, NA, iris$Sepal.Width)

In this approach however, I would need to manually specify every column.

Question

I strongly suspect there is a clever way to tackle this with either purrr or dplyr. However, I've gone down a pmap / modify_at rabbit hole. Any suggestions towards elegance would be much appreciated.

Thanks!

library(data.table)

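# Convert to a data.table copy; the built-in iris data set is left unchanged
dt <- as.data.table(iris)

# For rows where Sepal.Length > 5.0, set every other column to NA by reference
dt[Sepal.Length > 5.0, (setdiff(names(dt), "Sepal.Length")) := NA]
dt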
#      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#   1:          5.1          NA           NA          NA      NA
#   2:          4.9         3.0          1.4         0.2  setosa
#   3:          4.7         3.2          1.3         0.2  setosa
#   4:          4.6         3.1          1.5         0.2  setosa
#   5:          5.0         3.6          1.4         0.2  setosa
#  ---                                                          
# 146:          6.7          NA           NA          NA      NA
# 147:          6.3          NA           NA          NA      NA
# 148:          6.5          NA           NA          NA      NA
# 149:          6.2          NA           NA          NA      NA
# 150:          5.9          NA           NA          NA      NA

An alternative is plain base R subset assignment. This is only convenient when the columns to blank out are contiguous, here every column from the second one onward:

iris[iris$Sepal.Length > 5.0, 2:ncol(iris)] <- NA

# And the output for first six rows

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1          NA           NA          NA    <NA>
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4          NA           NA          NA    <NA>
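
If the column to keep is not the first one, the same assignment can select columns by name instead of by position. This is only a sketch: df, keep, cols and bad_rows are names made up for the illustration, and it works on a copy so iris itself stays untouched.

```r
# Work on a copy so the built-in iris data set is not modified
df <- iris

# Name of the column whose values should survive (hypothetical variable name)
keep <- "Sepal.Length"

# Rows flagged as faulty measurements
bad_rows <- df[[keep]] > 5.0

# Every column except the one to keep, selected by name rather than position
cols <- setdiff(names(df), keep)

# Blank out the other columns in the flagged rows
df[bad_rows, cols] <- NA

head(df)
```

Because the assignment goes through the data frame's own [<- method, Species remains a factor and its missing entries print as <NA>.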

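It sounds like this would work for you:

library(dplyr)

# Return NA wherever the threshold column z exceeds 5, otherwise keep x.
# Note: ifelse() drops attributes, so the factor Species comes back as integer codes.
my_clip <- function(x, z) ifelse(z > 5, NA, x)
iris %>% mutate_at(vars(-Sepal.Length), my_clip, z = .$Sepal.Length)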

#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1          NA           NA          NA      NA
# 2          4.9         3.0          1.4         0.2       1
# 3          4.7         3.2          1.3         0.2       1
# 4          4.6         3.1          1.5         0.2       1
# 5          5.0         3.6          1.4         0.2       1
# 6          5.4          NA           NA          NA      NA

We use mutate_at to grab all the columns we want to transform. Since you can't easily reference other columns inside the function passed to mutate_at, we pass the threshold column in as a separate argument using the .$ syntax.
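
In the output above Species prints as 1 rather than setosa, because ifelse() strips the factor attributes. A possible variant, assuming dplyr 1.0.0 or later is installed, uses across() together with replace(), which keeps each column's type. Inside across() the lambda can refer to Sepal.Length directly, so no extra argument is needed. The name res is just a placeholder for this sketch.

```r
library(dplyr)

# Assumes dplyr >= 1.0.0 for across().
# For every column except Sepal.Length, replace the values in rows where
# Sepal.Length > 5 with NA. replace() keeps the column type, so Species
# stays a factor instead of turning into integer codes.
res <- iris %>%
  mutate(across(-Sepal.Length, ~ replace(.x, Sepal.Length > 5, NA)))

head(res)
```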

Since you asked for a purrr example, here goes, although I prefer the data.table answer already proposed:

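library(dplyr)   # for mutate() and %>%
library(purrr)
library(tidyr)

# Nest all other columns per Sepal.Length value, blank out the nested data
# where the threshold is exceeded, then unnest again (tidyr >= 1.0 syntax).
# Rows come back grouped by Sepal.Length, so the original order is not kept.
iris %>% 
  nest(data = -Sepal.Length) %>% 
  mutate(data = ifelse(Sepal.Length > 5.0, 
                       map(data, function(x) x * NA), data)) %>% 
  unnest(data)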
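
Since the question mentions modify_at, here is a rough sketch of how that route might look without any nesting. It assumes purrr is installed; cols, bad_rows and res are names invented for the example.

```r
library(purrr)

# Columns to blank out: everything except Sepal.Length
cols <- setdiff(names(iris), "Sepal.Length")

# Rows flagged as faulty measurements
bad_rows <- iris$Sepal.Length > 5.0

# modify_at() applies the function only to the named columns and returns
# a data frame; replace() keeps each column's type (Species stays a factor)
res <- modify_at(iris, cols, ~ replace(.x, bad_rows, NA))

head(res)
```

Unlike the nest/unnest version, this keeps the rows in their original order.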

With magrittr you could do this:

library(magrittr)
iris %>% head %>% inset(.$Sepal.Length > 5,-1,NA)

Or call base R's [<- directly instead of magrittr's inset, which is an alias for it (same output, just an uglier function name, and you still need magrittr or dplyr for the pipe):

iris %>% head %>% `[<-`(.$Sepal.Length > 5,-1,NA)

-1 is the index of the column you want to keep, negated.

Result:

#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1          NA           NA          NA    <NA>
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4          NA           NA          NA    <NA>
