
Replace certain values in data.frame columns

I have data as follows:

data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
                 Wt=c(91,92,85,205,285,43,95,75,76,NA),
                 Ht=c(185,182,173,171,600,650,NA,890,NA,NA))

Wt represents weight in kilograms and Ht represents height in centimeters. In this example, I want to treat values of Wt greater than 200 as outliers and change them to specific numbers. I also want to treat values of Ht greater than 250 as outliers and change them to NA. In my actual data, there are only a few outliers in Wt but many in Ht. I can find the outliers for Wt with the code below:

a1<-data$Wt 

a1<-data.frame(a1)
a1<-na.omit(a1)
b1<-a1[a1$a1>200, ]
b1  #205,285
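If the goal is to fix the Wt outliers one by one, it can help to know which rows they are in, not just their values. A minimal base-R sketch (not part of the original question) using which(), which skips NA comparisons on its own, so the data.frame/na.omit() step is not needed:

```r
data <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   Wt = c(91, 92, 85, 205, 285, 43, 95, 75, 76, NA),
                   Ht = c(185, 182, 173, 171, 600, 650, NA, 890, NA, NA))

# which() returns the positions where the condition is TRUE; NA comparisons are dropped
wt_rows <- which(data$Wt > 200)

# Show the id and weight of each suspected outlier
data[wt_rows, c("id", "Wt")]
#   id  Wt
# 4  4 205
# 5  5 285
```

Knowing the row positions (4 and 5 here) makes it easy to check each outlier against the source data before deciding on its replacement.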

I want to change 205 to 80 and 285 to 90. (Because my actual data has only a few Wt outliers, I can change them individually.) I also want to set the values of Ht greater than 250 to NA. So my expected output is:

data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
                 Wt=c(91,92,85,80,90,43,95,75,76,NA),
                 Ht=c(185,182,173,171,NA,NA,NA,NA,NA,NA))

You can do this by reference using data.table:

library(data.table)
setDT(data)

data[Ht > 250, Ht := NA]
data[Wt == 205, Wt := 80]
data[Wt == 285, Wt := 90]
data
    id Wt  Ht
 1:  1 91 185
 2:  2 92 182
 3:  3 85 173
 4:  4 80 171
 5:  5 90  NA
 6:  6 43  NA
 7:  7 95  NA
 8:  8 75  NA
 9:  9 76  NA
10: 10 NA  NA

For more information, see the Introduction to data.table vignette.
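With more than two individual corrections, writing one data[Wt == ..., Wt := ...] line per value gets repetitive. A sketch, assuming the corrections are kept in two parallel vectors (the names old_wt and new_wt are hypothetical), that applies them all in a single := call by looking up each row's replacement with match():

```r
library(data.table)

data <- data.table(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   Wt = c(91, 92, 85, 205, 285, 43, 95, 75, 76, NA),
                   Ht = c(185, 182, 173, 171, 600, 650, NA, 890, NA, NA))

# Hypothetical lookup: old_wt[i] is replaced by new_wt[i]
old_wt <- c(205, 285)
new_wt <- c(80, 90)

# Select only rows holding a known outlier (NA %in% old_wt is FALSE),
# then replace each selected Wt with its matching new value
data[Wt %in% old_wt, Wt := new_wt[match(Wt, old_wt)]]

# Rows with Ht above 250 become NA; rows where Ht is already NA are not selected
data[Ht > 250, Ht := NA]

data
```

Adding another correction then only means appending to old_wt and new_wt, rather than writing a new line of code.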

The answer above is useful. I also wanted to add an alternative in case it helps to learn other functions. With ifelse and the tidyverse, you can plug in whatever replacement values you want. As an example, I use mutate here to modify the columns and ifelse to transform the values you mentioned. Below is your data and the two functions combined into one command:

library(tidyverse)

data %>%
  mutate(Wt = ifelse(Wt > 200,
                     9999,
                     Wt),
         Ht = ifelse(Ht > 250,
                     NA,
                     Ht))

Annotated below is what the code does:

library(tidyverse) # load this library for %>% and mutate

data %>% # use this data
  mutate(Wt = ifelse(Wt > 200, # take Wt over 200
                     9999, # replace with this value (a number, so Wt stays numeric)
                     Wt), # otherwise use the original Wt value
         Ht = ifelse(Ht > 250, # take Ht over 250
                     NA, # replace with a real missing value, not the string "NA"
                     Ht)) # otherwise use the original Ht value

This gives the following output (swap 9999 for whatever value you need):

   id   Wt  Ht
1   1   91 185
2   2   92 182
3   3   85 173
4   4 9999 171
5   5 9999  NA
6   6   43  NA
7   7   95  NA
8   8   75  NA
9   9   76  NA
10 10   NA  NA
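To reproduce the exact output asked for in the question (205 to 80, 285 to 90, Ht above 250 to NA) while keeping both columns numeric, case_when() from dplyr can list the individual corrections. A minimal sketch; it loads only dplyr rather than the whole tidyverse, and the specific values are taken from the question:

```r
library(dplyr)

data <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   Wt = c(91, 92, 85, 205, 285, 43, 95, 75, 76, NA),
                   Ht = c(185, 182, 173, 171, 600, 650, NA, 890, NA, NA))

out <- data %>%
  mutate(
    # Conditions are checked in order; the final TRUE keeps every other value (including NA)
    Wt = case_when(Wt == 205 ~ 80,
                   Wt == 285 ~ 90,
                   TRUE ~ Wt),
    # if_else() is type-strict, so the replacement must be a double NA (NA_real_)
    Ht = if_else(Ht > 250, NA_real_, Ht)
  )

out
```

Because case_when() and if_else() check that all branches share one type, passing a quoted value such as "9999" or "NA" here raises an error instead of silently turning the column into character.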

Try it out and let me know what you think!
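For comparison, the same edits can be done without any packages. A base-R sketch (not from either answer above) that wraps each condition in which(), so rows where Wt or Ht is NA are never selected for replacement:

```r
data <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   Wt = c(91, 92, 85, 205, 285, 43, 95, 75, 76, NA),
                   Ht = c(185, 182, 173, 171, 600, 650, NA, 890, NA, NA))

# Individual corrections for the few Wt outliers
data$Wt[which(data$Wt == 205)] <- 80
data$Wt[which(data$Wt == 285)] <- 90

# Blanket rule for Ht: anything above 250 cm becomes NA
data$Ht[which(data$Ht > 250)] <- NA

data
```

This modifies data in place, like the data.table answer, but R copies the column behind the scenes, so for very large tables the by-reference := approach is usually preferred.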
