
Convert selected dataframe columns to factor using loop?

I have a data frame df. It consists mostly of factors, apart from a few numeric columns.

I want to create a data quality report, but everything is being read in as integers. So I captured the indexes of the numeric columns and wanted to convert every other column to type factor:

n_cols = c(1,3,4,9:17,28:35)

for (x in length(df)) {
  if (x %in% n_cols == FALSE) {
    df[,x] = as.factor(df[,x])
  }
}

The code runs, but the columns are not converted when I call str(df).

I come from a Python background, so some of this syntax is newer to me.

Below is a reproducible example, using the mtcars dataset, that converts selected columns of a data frame to factors inside a for loop.

Note: This depends on specifying a vector of column numbers that you do want coerced to factors. If you want to invert this logic you can insert a ! inside the if() statement to negate the logic.

# example data
data(mtcars)

# columns to go to factors
to_fact <- c(1, 3, 5, 7)

for(x in seq_along(mtcars)) {
  if(x %in% to_fact){
    mtcars[,x] <- as.factor(mtcars[,x]) 
  }
}

str(mtcars)
#> 'data.frame':    32 obs. of  11 variables:
#>  $ mpg : Factor w/ 25 levels "10.4","13.3",..: 16 16 19 17 13 12 3 20 19 14 ...
#>  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
#>  $ disp: Factor w/ 27 levels "71.1","75.7",..: 13 13 6 16 23 15 23 12 10 14 ...
#>  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
#>  $ drat: Factor w/ 22 levels "2.76","2.93",..: 16 16 15 5 6 1 7 11 17 17 ...
#>  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
#>  $ qsec: Factor w/ 30 levels "14.5","14.6",..: 6 10 22 24 10 29 5 27 30 19 ...
#>  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
#>  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
#>  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
#>  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Created on 2018-08-31 by the reprex package (v0.2.0).
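The reason the loop in the question appeared to do nothing is probably its iterator: length(df) is a single number, so for (x in length(df)) runs the body once, for the last column only, whereas seq_along(df) visits every column index. The sketch below illustrates this and applies the inverted logic mentioned in the note above. It uses mtcars as a stand-in for the asker's df, and the choice of n_cols is an assumption made purely for illustration.

```r
# Stand-in for the asker's data frame (assumption: mtcars, 11 columns).
df <- mtcars

# Hypothetical indexes of the numeric columns that should stay numeric.
n_cols <- c(1, 3, 5, 7)

# length(df) is one number, so this loop body runs exactly once.
visited <- c()
for (x in length(df)) {
  visited <- c(visited, x)
}
print(visited)  # only the index of the last column

# seq_along(df) yields 1, 2, ..., ncol(df), so every column is checked.
for (x in seq_along(df)) {
  if (!(x %in% n_cols)) {
    # Convert every column that is NOT listed in n_cols.
    df[, x] <- as.factor(df[, x])
  }
}

str(df)
```

Writing !(x %in% n_cols) is equivalent to x %in% n_cols == FALSE, just easier to read.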

To do this more succinctly, you can also use the purrr package for functional programming:

mtcars[to_fact] <- purrr::map_df(mtcars[to_fact], as.factor)

1) You can do it in a one-liner with lapply (factorCols is the vector of column indices defined in 2) below):

mtcars[,factorCols] <- lapply(mtcars[,factorCols], as.factor)

2) Longer alternative: there is no need for the nested for/if. You already know the indices of the columns you want to convert, so iterate over them directly:

data(mtcars)
factorCols <- c(1,3,5,7)

for (factorCol in factorCols) {
  mtcars[, factorCol] <- as.factor(mtcars[, factorCol])
}

which is essentially a one-liner.
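For the situation in the question, where the known indexes are the columns to leave alone, the same lapply idea can be combined with negative indexing. This is a minimal sketch, again assuming mtcars as a stand-in for df and an illustrative n_cols:

```r
# Stand-in for the asker's data frame (assumption).
df <- mtcars

# Hypothetical indexes of the numeric columns to keep as they are.
n_cols <- c(1, 3, 5, 7)

# df[-n_cols] selects every column except those in n_cols;
# lapply returns a list of factors that is assigned back column by column.
df[-n_cols] <- lapply(df[-n_cols], as.factor)

# Named logical vector showing which columns are now factors.
sapply(df, is.factor)
```

Using single-bracket list-style indexing (df[-n_cols] rather than df[, -n_cols]) keeps the result a data frame even when only one column is selected.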
