I used the following code to try to replace the values of each variable that fall below the 2.5% quantile or above the 97.5% quantile with those quantile values. You can run the code yourself; it reads an openly available data file.
credit <- read.csv("http://freakonometrics.free.fr/german_credit.csv", header = TRUE)
# Cap values below the 2.5% quantile and above the 97.5% quantile
fun <- function(x) {
  quantiles <- quantile(x, c(.025, .975))
  x[x < quantiles[1]] <- quantiles[1]
  x[x > quantiles[2]] <- quantiles[2]
  x
}
fun(credit)
But I get the following error message:
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
undefined columns selected
What's the problem? I'd be grateful for any help!
Additional comment:
I found that the above function does not work on a data frame; it only works on a vector.
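To see the vector/data-frame distinction concretely, here is a minimal sketch that applies the same fun to a plain numeric vector. The vector x is made-up test data, not a column of the credit file. quantile() expects a vector; when it is handed the whole data frame, the traceback above shows it trying to sort the data frame itself, which is where the `[.data.frame` error comes from.

```r
# The question's fun: cap values at the 2.5% and 97.5% sample quantiles
fun <- function(x) {
  quantiles <- quantile(x, c(.025, .975))
  x[x < quantiles[1]] <- quantiles[1]  # raise values below the lower quantile
  x[x > quantiles[2]] <- quantiles[2]  # lower values above the upper quantile
  x
}

# Made-up vector with one extreme value at each end
x <- c(-50, 1:98, 500)
capped <- fun(x)
range(capped)  # 2.475 96.525 (the two quantiles of x)
```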
I can cap the outliers of a single variable with the following code:
credit$Duration.of.Credit..month. <- pmax(quantile(credit$Duration.of.Credit..month., .025),
                                          pmin(credit$Duration.of.Credit..month., quantile(credit$Duration.of.Credit..month., .975)))
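The pmax/pmin line above performs the same capping as fun, just written as a single expression. As a sketch, a hypothetical helper cap_pmax_pmin (not part of base R or the original post) wraps that line for any numeric vector and checks, on made-up data, that it agrees with fun:

```r
# The question's fun, repeated so this block runs on its own
fun <- function(x) {
  quantiles <- quantile(x, c(.025, .975))
  x[x < quantiles[1]] <- quantiles[1]
  x[x > quantiles[2]] <- quantiles[2]
  x
}

# Hypothetical helper: the pmax/pmin line from the question, for any numeric vector
cap_pmax_pmin <- function(x, lower = .025, upper = .975) {
  # pmin() pulls large values down to the upper quantile,
  # then pmax() lifts small values up to the lower quantile
  pmax(quantile(x, lower), pmin(x, quantile(x, upper)))
}

x <- c(-50, 1:98, 500)  # made-up data
all.equal(unname(cap_pmax_pmin(x)), unname(fun(x)))  # TRUE
```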
However, my data file has so many variables that it is inconvenient to write this line for every one of them.
So how can I replace the outliers of all variables with their quantile values at once, without writing a separate pmax/pmin line for each column?
There's actually nothing wrong with your function as long as you apply it to a single column (a vector) rather than to the whole data frame. I'd use mutate_at, or mutate_all if you really want to apply it to all columns, from the dplyr package. Something like this (funs() has been deprecated since dplyr 0.8.0, so the function is passed directly):
library(dplyr)
credit_trunc <- credit %>%
  mutate_at(vars(Credit.Amount, Creditability), fun)
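If you would rather not depend on dplyr, base R can do the same for a chosen set of columns by assigning the result of lapply() back into those columns. This is a sketch on a small synthetic stand-in for the credit data (random values under the real column names Creditability, Credit.Amount and Purpose, stored as credit_demo so it does not overwrite credit), so it runs without downloading the file:

```r
fun <- function(x) {
  quantiles <- quantile(x, c(.025, .975))
  x[x < quantiles[1]] <- quantiles[1]
  x[x > quantiles[2]] <- quantiles[2]
  x
}

# Synthetic stand-in for the credit data (random values, not the real file)
set.seed(1)
credit_demo <- data.frame(
  Creditability = rbinom(200, 1, 0.7),
  Credit.Amount = round(rlnorm(200, meanlog = 8)),
  Purpose       = sample(0:10, 200, replace = TRUE)
)

# Cap only the chosen columns; Purpose is left untouched
cols <- c("Credit.Amount", "Creditability")
credit_demo_trunc <- credit_demo
credit_demo_trunc[cols] <- lapply(credit_demo_trunc[cols], fun)
```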
or
credit_trunc <- credit %>%
  mutate_all(fun)
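The base R counterpart of mutate_all is to lapply() over every column and assign into the data frame with empty brackets, `df[] <- ...`, which keeps the result a data frame instead of turning it into a plain list. A minimal sketch, assuming an all-numeric data frame (here a made-up df with columns a and b):

```r
fun <- function(x) {
  quantiles <- quantile(x, c(.025, .975))
  x[x < quantiles[1]] <- quantiles[1]
  x[x > quantiles[2]] <- quantiles[2]
  x
}

set.seed(2)
df <- data.frame(a = rnorm(100), b = rexp(100))  # made-up numeric data

df_trunc <- df
# `[]<-` replaces the columns in place, keeping the data.frame class and names
df_trunc[] <- lapply(df_trunc, fun)
```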
Or, if your data frame also has columns of other types (e.g. factor or character), you can use:
credit_trunc <- credit %>%
  mutate_if(is.numeric, fun)
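For mixed column types, a base R sketch of the same idea first finds the numeric columns with vapply() and then caps only those. The data frame below, with a numeric amount column and a factor purpose column, is hypothetical:

```r
fun <- function(x) {
  quantiles <- quantile(x, c(.025, .975))
  x[x < quantiles[1]] <- quantiles[1]
  x[x > quantiles[2]] <- quantiles[2]
  x
}

# Hypothetical mixed-type data: one numeric column, one factor column
df <- data.frame(
  amount  = c(1:99, 1000),
  purpose = factor(rep(c("car", "tv"), 50))
)

num_cols <- vapply(df, is.numeric, logical(1))  # TRUE for amount, FALSE for purpose
df_trunc <- df
df_trunc[num_cols] <- lapply(df_trunc[num_cols], fun)  # factor column is skipped
```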
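Each of these gives you back the data frame with the chosen columns, all columns, or all numeric columns capped as you wanted. In dplyr 1.0.0 and later these scoped variants are superseded by across(), so the last example can also be written as credit %>% mutate(across(where(is.numeric), fun)).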