Selecting only numeric columns from a data frame

Question

Suppose you have a data frame like this:

x <- data.frame(v1=1:20,v2=1:20,v3=1:20,v4=letters[1:20])

How would you select only those columns in x that are numeric?

Answer 1

EDIT: updated to avoid the ill-advised sapply.

Since a data frame is a list, we can use the list-apply functions:

nums <- unlist(lapply(x, is.numeric), use.names = FALSE)

Then use standard subsetting:

x[ , nums]

## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)
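One likely reason for the warning against sapply (this is my reading, the answer does not spell it out) is that its return type depends on the input: on a data frame with no columns it gives back an empty list rather than a logical vector. A minimal base-R sketch of a type-stable alternative with vapply, with drop = FALSE added so that a single numeric column would still come back as a data frame:

```r
# A data frame with no columns shows the difference in return types
empty <- data.frame()
sapply(empty, is.numeric)              # an empty list, not a logical vector
vapply(empty, is.numeric, logical(1))  # a zero-length logical vector

# Same selection as above on the question's data, using vapply
x <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20, v4 = letters[1:20])
nums <- vapply(x, is.numeric, logical(1))
num_x <- x[, nums, drop = FALSE]       # drop = FALSE keeps a data frame
names(num_x)                           # "v1" "v2" "v3"
```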

For more idiomatic modern R, I'd now recommend:

x[ , purrr::map_lgl(x, is.numeric)]

Shorter, less tied to R's particular quirks, more straightforward, and robust on database-backed tibbles:

dplyr::select_if(x, is.numeric)

Newer versions of dplyr (1.0.0 and later) also support the following syntax:

x %>% dplyr::select(where(is.numeric))

Answer 2

The dplyr package's select_if() function is an elegant solution:

library("dplyr")
select_if(x, is.numeric)

Answer 3

Filter() from base R is well suited to this use case. You simply write:

Filter(is.numeric, x)
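As a quick check of what this returns, here is a small sketch on the question's data frame. Filter() applied to a data frame gives back a data frame, and wrapping the predicate in Negate() (also base R) yields the complementary columns:

```r
x <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20, v4 = letters[1:20])

# Keep the columns for which is.numeric() is TRUE
num_x <- Filter(is.numeric, x)
class(num_x)   # "data.frame"
names(num_x)   # "v1" "v2" "v3"

# Negate() inverts the predicate, keeping the non-numeric columns instead
other_x <- Filter(Negate(is.numeric), x)
names(other_x) # "v4"
```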

It is also much faster than select_if():

library(microbenchmark)
microbenchmark(
    dplyr::select_if(mtcars, is.numeric),
    Filter(is.numeric, mtcars)
)

This returns (on my computer) a median of 60 microseconds for Filter and 21,000 microseconds for select_if, so Filter is about 350x faster.

Answer 4

If you are interested only in the column names, use this (train is the data frame here):

names(dplyr::select_if(train,is.numeric))

Answer 5

iris %>% dplyr::select(where(is.numeric)) # as per the most recent dplyr updates

Another option with purrr is discard() with a negated predicate:

iris %>% purrr::discard(~!is.numeric(.))

If you want the names of the numeric columns, you can add names or colnames:

iris %>% purrr::discard(~!is.numeric(.)) %>% names

Answer 6

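This is an alternative to the code in the other answers:

x[, sapply(x, class) == "numeric"]

Note that class() reports integer columns as "integer", so this comparison keeps only double columns; for the example x, whose v1 to v3 are integer, it selects no columns.

With a data.table:

x[, lapply(x, is.numeric) == TRUE, with = FALSE]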
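To see how comparing class() against "numeric" differs from is.numeric() on integer columns, here is a small sketch. The data frame df_mixed and its column names are made up for illustration:

```r
# Hypothetical data frame: one integer, one double, one character column
df_mixed <- data.frame(i = 1:3, d = c(1.5, 2.5, 3.5), s = c("a", "b", "c"))

# class() gives "integer" for column i, so only the double column matches
by_class <- names(df_mixed)[sapply(df_mixed, class) == "numeric"]
by_class   # "d"

# is.numeric() is TRUE for both integer and double columns
by_test <- names(df_mixed)[sapply(df_mixed, is.numeric)]
by_test    # "i" "d"
```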

Answer 7

library(purrr)
x <- x %>% keep(is.numeric)

Answer 8

The PCAmixdata package has a function, splitmix, that splits a given data frame (here "YourDataframe") into its quantitative (numerical) and qualitative (categorical) columns, as shown below:

install.packages("PCAmixdata")
library(PCAmixdata)
split <- splitmix(YourDataframe)
X1 <- split$X.quanti  # numerical columns in the dataset
X2 <- split$X.quali   # categorical columns in the dataset
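If installing a package just for this feels heavy, a rough base-R analogue of the same two-way split might look like the sketch below. It is not how splitmix is implemented, and the names is_num, quanti and quali are my own:

```r
x <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20, v4 = letters[1:20])

# One logical flag per column: TRUE where the column is numeric
is_num <- vapply(x, is.numeric, logical(1))

# drop = FALSE keeps each part a data frame even if it has a single column
quanti <- x[, is_num, drop = FALSE]    # numerical columns
quali  <- x[, !is_num, drop = FALSE]   # everything else
```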

Answer 9

If you have many factor variables, you can use the select_if function from the dplyr package (install it first). It keeps the columns that satisfy a condition, and you can choose the condition, for example is.factor.

Use it like this:

categorical <- select_if(df, is.factor)
str(categorical)

Answer 10

Another way is as follows:

# extracting numeric columns from the iris dataset
(iris[sapply(iris, is.numeric)])

Answer 11

Numerical_variables <- which(sapply(df, is.numeric))
# then extract column names 
Names <- names(Numerical_variables)

Answer 12

This doesn't directly answer the question but can be very useful, especially if you want something like all the numeric columns except for your id column and dependent variable.

library(magrittr)  # provides %>% and the assignment pipe %<>%

numeric_cols <- sapply(dataframe, is.numeric) %>% which %>% 
                   names %>% setdiff(., c("id_variable", "dep_var"))

dataframe %<>% dplyr::mutate_at(numeric_cols, function(x) your_function(x))
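For a version that runs as-is without any packages, here is a base-R sketch of the same idea. The data frame df, its columns, and the scaling function are all made up for illustration:

```r
# Hypothetical data: an id column, a dependent variable y, one predictor, one label
df <- data.frame(
  id = 1:5,
  y  = c(2.1, 3.4, 1.9, 5.0, 4.2),
  a  = c(10, 20, 30, 40, 50),
  g  = letters[1:5]
)

# Names of the numeric columns, minus the ones we want to leave alone
is_num <- vapply(df, is.numeric, logical(1))
numeric_cols <- setdiff(names(df)[is_num], c("id", "y"))

# Apply a transformation (here: divide by the column maximum) to those columns only
df[numeric_cols] <- lapply(df[numeric_cols], function(col) col / max(col))
```

Here only column a is transformed; id, y and g are left untouched.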

12 answers

solution1
363 ACCEPTED 2011-05-02 22:28:36

solution2
93 2016-11-25 16:08:16

solution3
55 2016-11-09 10:31:48

solution4
9 2018-04-05 09:44:54

solution5
9 2020-10-20 07:30:40

solution6
8 2016-11-13 16:11:02

solution7
5 2020-03-23 15:50:51

solution8
3 2017-11-13 15:42:19

solution9
1 2017-01-06 00:19:05

solution10
0 2018-10-09 06:00:16

solution11
0 2020-07-30 21:08:27

solution12
-1 2018-03-29 16:32:46
