I have a data frame df. It contains mostly factors, apart from a few numeric columns.
I want to create a data quality report, but everything is being read in as integers. So I captured the indexes of the numeric columns below and wanted to convert all the other columns to type factor:
n_cols = c(1,3,4,9:17,28:35)
for (x in length(df)) {
if (x %in% n_cols == FALSE) {
df[,x] = as.factor(df[,x])
}
}
The code runs, but the columns are not converted properly when I call str(df).
I come from a Python background, so some of this syntax is newer to me.
To convert selected columns of a data frame to factors inside a for loop, I have created a reproducible example below using the mtcars dataset.
Note: This depends on specifying a vector of the column numbers that you do want coerced to factors. To invert this logic, put a ! in front of the condition inside the if() statement to negate it.
# example data
data(mtcars)
# columns to go to factors
to_fact <- c(1, 3, 5, 7)
for (x in seq_along(mtcars)) {
  if (x %in% to_fact) {
    mtcars[, x] <- as.factor(mtcars[, x])
  }
}
str(mtcars)
#> 'data.frame': 32 obs. of 11 variables:
#> $ mpg : Factor w/ 25 levels "10.4","13.3",..: 16 16 19 17 13 12 3 20 19 14 ...
#> $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
#> $ disp: Factor w/ 27 levels "71.1","75.7",..: 13 13 6 16 23 15 23 12 10 14 ...
#> $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
#> $ drat: Factor w/ 22 levels "2.76","2.93",..: 16 16 15 5 6 1 7 11 17 17 ...
#> $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
#> $ qsec: Factor w/ 30 levels "14.5","14.6",..: 6 10 22 24 10 29 5 27 30 19 ...
#> $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
#> $ am : num 1 1 1 0 0 0 0 0 0 0 ...
#> $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
#> $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
Created on 2018-08-31 by the reprex package (v0.2.0).
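As a minimal sketch of why the loop in the question appears to do nothing: length(df) returns a single number, so for (x in length(df)) runs exactly once, with x equal to the index of the last column, whereas seq_along(df) yields one index per column. The toy data frame and the value of n_cols below are hypothetical and only stand in for the asker's data.

```r
# Hypothetical toy data frame standing in for the asker's df
df <- data.frame(a = c(1L, 2L, 1L), b = c(3L, 3L, 4L), c = c(5L, 6L, 5L))
n_cols <- c(2)  # assumed: indexes of the columns that should stay numeric

length(df)      # a single number (the column count), so the original loop runs once
seq_along(df)   # one index per column

# Corrected loop: visit every column and convert those NOT listed in n_cols
for (x in seq_along(df)) {
  if (!(x %in% n_cols)) {
    df[[x]] <- as.factor(df[[x]])
  }
}

str(df)
```

Here the condition is negated with ! because n_cols lists the columns to leave alone, which matches the logic of the original question.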
To do this more succinctly, you can also use the purrr package for functional programming:
mtcars[to_fact] <- purrr::map_df(mtcars[to_fact], as.factor)
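1) You can do it in a one-liner with lapply, with factorCols defined as in the example further down. Use lapply rather than sapply here, since sapply simplifies its result to a matrix and the factor class is lost: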
mtcars[,factorCols] <- lapply(mtcars[,factorCols], as.factor)
2) Longer alternative: there is no need for the nested for-if. You already know the indices of the columns you want to convert, so iterate over them directly:
data(mtcars)
factorCols <- c(1,3,5,7)
for (factorCol in factorCols) {
  mtcars[, factorCol] <- as.factor(mtcars[, factorCol])
}
which is essentially a one-liner.
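For the situation in the original question, where the known indexes are the columns to keep numeric rather than the ones to convert, the same lapply approach can be applied to the complement of that index vector. This is only a sketch on mtcars; the names keep_numeric and to_convert are made up for illustration.

```r
data(mtcars)

# Hypothetical: indexes of the columns that should remain numeric
keep_numeric <- c(1, 3, 5, 7)

# Every other column index is a candidate for conversion
to_convert <- setdiff(seq_along(mtcars), keep_numeric)

# Convert only those columns; lapply returns a list, so the factor class is kept
mtcars[to_convert] <- lapply(mtcars[to_convert], as.factor)

str(mtcars)
```

Using setdiff() on seq_along() keeps the selection explicit; negative indexing such as mtcars[-keep_numeric] would select the same columns.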