I have a data frame as follows:
data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
Wt=c(91,92,85,205,285,43,95,75,76,NA),
Ht=c(185,182,173,171,600,650,NA,890,NA,NA))
Wt represents the weight in kilograms and Ht represents the height in centimeters. In this example, I want to treat values of Wt greater than 200 as outliers and replace them with specific numbers. I also want to treat values of Ht greater than 250 as outliers and replace them with NA. In my actual data, there are a few outliers in Wt and many outliers in Ht. I can find the outliers for Wt with the code below:
a1<-data$Wt
a1<-data.frame(a1)
a1<-na.omit(a1)
b1<-a1[a1$a1>200, ]
b1 #205,285
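As a side note, the same lookup can be sketched without the intermediate data frame. This is a minimal base R sketch using the example data from above; the name idx is my own. which() keeps only the positions where the test is TRUE, so the NA in Wt is skipped without an na.omit step.

```r
data <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   Wt = c(91, 92, 85, 205, 285, 43, 95, 75, 76, NA),
                   Ht = c(185, 182, 173, 171, 600, 650, NA, 890, NA, NA))

# which() returns the positions where the comparison is TRUE and ignores NA
idx <- which(data$Wt > 200)
idx            # row positions of the Wt outliers
data$Wt[idx]   # the outlier values themselves
```

Keeping the row positions (rather than only the values) makes it easier to go back and edit those rows afterwards.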
I want to change 205 to 80 and 285 to 90. (In my actual data there are only a few outliers in Wt, so I can change them individually.) I also want to set the values of Ht greater than 250 to NA. My expected output is as follows:
data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
Wt=c(91,92,85,80,90,43,95,75,76,NA),
Ht=c(185,182,173,171,NA,NA,NA,NA,NA,NA))
Do it by reference using data.table:
library(data.table)
setDT(data)
data[Ht > 250, Ht := NA]
data[Wt == 205, Wt := 80]
data[Wt == 285, Wt := 90]
data
    id Wt  Ht
 1:  1 91 185
 2:  2 92 182
 3:  3 85 173
 4:  4 80 171
 5:  5 90  NA
 6:  6 43  NA
 7:  7 95  NA
 8:  8 75  NA
 9:  9 76  NA
10: 10 NA  NA
For more info, see: Introduction to data.table.
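If there are more than a couple of values to fix, writing one line per value gets tedious. Here is a minimal sketch of one alternative, an update join against a small lookup table. The table name fixes and its columns old and new are hypothetical names I made up for illustration; the data is the example from the question.

```r
library(data.table)

dt <- data.table(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 Wt = c(91, 92, 85, 205, 285, 43, 95, 75, 76, NA),
                 Ht = c(185, 182, 173, 171, 600, 650, NA, 890, NA, NA))

# Hypothetical lookup table: one row per outlier value and its replacement
fixes <- data.table(old = c(205, 285), new = c(80, 90))

# Update join: rows of dt whose Wt equals fixes$old get fixes$new, by reference
dt[fixes, on = .(Wt = old), Wt := i.new]

# Ht above 250 becomes missing; NA_real_ keeps the column numeric
dt[Ht > 250, Ht := NA_real_]
dt
```

Rows with no match in the lookup table are left untouched, so adding another correction only means adding a row to fixes.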
The above answer is useful. I wanted to add an alternative in case you find it helpful to learn other functions. You can plug in any values you want with ifelse and the tidyverse. As an example, I use mutate here to overwrite the variables and ifelse to transform the values. Below is essentially just your data and two functions combined into one command:
library(tidyverse)
data %>%
  mutate(Wt = ifelse(Wt > 200,
                     9999,
                     Wt),
         Ht = ifelse(Ht > 250,
                     NA,
                     Ht))
Annotated below is what I am doing with the code:
library(tidyverse)               # load this library for %>% and mutate
data %>%                         # use this data
  mutate(Wt = ifelse(Wt > 200,   # where Wt is over 200
                     9999,       # replace with this number (unquoted, so Wt stays numeric)
                     Wt),        # otherwise use the original Wt value
         Ht = ifelse(Ht > 250,   # where Ht is over 250
                     NA,         # replace with a real missing value (unquoted NA, not the string "NA")
                     Ht))        # otherwise use the original Ht value
Which should give you the output below; swap in whatever replacement values you want:
   id   Wt  Ht
1   1   91 185
2   2   92 182
3   3   85 173
4   4 9999 171
5   5 9999  NA
6   6   43  NA
7   7   95  NA
8   8   75  NA
9   9   76  NA
10 10   NA  NA
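If you want the exact replacements from the question (205 to 80, 285 to 90) rather than one placeholder value, here is a minimal sketch with dplyr's case_when and if_else. It assumes the example data from the question; the name result is my own.

```r
library(dplyr)

data <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   Wt = c(91, 92, 85, 205, 285, 43, 95, 75, 76, NA),
                   Ht = c(185, 182, 173, 171, 600, 650, NA, 890, NA, NA))

result <- data %>%
  mutate(
    Wt = case_when(
      Wt == 205 ~ 80,   # first known outlier
      Wt == 285 ~ 90,   # second known outlier
      TRUE ~ Wt         # everything else (including NA) is kept as is
    ),
    # NA_real_ keeps Ht numeric; rows where Ht is already NA stay NA
    Ht = if_else(Ht > 250, NA_real_, Ht)
  )
result
```

case_when checks the conditions in order and uses the first one that is TRUE, so the final TRUE line acts as the fallback.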
Try it out and lemme know what you think!