How to drop multiple columns that have categorical values in R?
I know how to drop columns by name, but I am not sure how to drop the columns that hold categorical values. It can be done by manually checking which columns are categorical, but it is not obvious to me how to do it in R code. How can I detect the columns that have categorical values? Is there any way to make this happen?
Minimal data:
mydf=structure(list(taken_time = c(15L, 5L, 39L, -21L, 46L, 121L),
ap6xl = c(203.2893857, 4.858269406, 2, 14220, 218.2215352,
115.5227706), pct5 = c(732.074484, 25.67901235, 1.01, 120.0477168,
3621.328567, 79.30561111), crp4 = c(196115424.7, 1073624.455,
1.23, 1457496.474, 10343851.7, 81288042.73), age = c(52L,
74L, 52L, 67L, 82L, 67L), gender = structure(c(2L, 2L, 2L,
1L, 2L, 1L), .Label = c("F", "M"), class = "factor"), inpatient_readmission_time_rtd = c(79.78819444,
57.59068053, 57.59068053, 57.59068053, 57.59068053, 9.893055556
), infection_flag = c(0L, 0L, 1L, 1L, 0L, 1L), temperature_value = c(98.9,
98.9, 98, 101.3, 99.5, 98.1), heartrate_value = c(106, 61,
78, 91, 120, 68), pH_result_time_rta = c(11, 85.50402145,
85.50402145, 85.50402145, 85.50402145, 85.50402145), gcst_value = c(15,
15, 15, 14.63769293, 15, 14.63769293)), row.names = c(NA,
6L), class = "data.frame")
Instead of manually typing the names of the categorical columns, is there any way to detect them and drop them? I am thinking of the case where the data frame has more than 10 categorical columns, which makes doing this by hand painful, so I am curious whether it is possible in R. Any thoughts?
For example, for the data frame above I can do this after manually checking which columns are categorical:
mydf <- mydf[!names(mydf) %in% c("gender", "infection_flag")]
Is there any way to detect which columns are categorical and drop them, so that only numeric columns remain for calculation? Any ideas?
You can use dplyr and select all the numeric columns:
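library(dplyr)
# Keep only the columns for which is.numeric() returns TRUE.
# In dplyr >= 1.0.0, select_if() is superseded by select(where(is.numeric)).
mydf %>% select_if(is.numeric)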
An option with base R:
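# Logical vector with one entry per column: TRUE where the column is numeric
i1 <- sapply(mydf, is.numeric)
# Keep only the numeric columns
mydf[i1]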
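One caveat with both answers: `is.numeric` is TRUE for integer columns, so a 0/1 indicator such as `infection_flag` is kept and only the factor `gender` is dropped. The following is a minimal base R sketch of one way to also catch such flags. The helper `is_categorical` and the small `demo` data frame are hypothetical names introduced here for illustration, and treating "numeric with only 0/1 values" as categorical is an assumption that may not suit every dataset.

```r
# Hypothetical helper (not part of the original answers): TRUE for columns
# that look categorical. Assumption: a numeric column whose non-missing
# values are all 0 or 1 is treated as a categorical flag.
is_categorical <- function(x) {
  is.factor(x) || is.character(x) || is.logical(x) ||
    (is.numeric(x) && all(x[!is.na(x)] %in% c(0, 1)))
}

# Small stand-in for mydf with the same kinds of columns
demo <- data.frame(
  age = c(52L, 74L, 52L, 67L, 82L, 67L),
  gender = factor(c("M", "M", "M", "F", "M", "F")),
  infection_flag = c(0L, 0L, 1L, 1L, 0L, 1L),
  temperature_value = c(98.9, 98.9, 98, 101.3, 99.5, 98.1)
)

# One logical per column; vapply() guarantees a logical vector
cat_cols <- vapply(demo, is_categorical, logical(1))

# Drop the detected categorical columns
numeric_only <- demo[!cat_cols]
names(numeric_only)
```

Applied to the question's `mydf` in the same way (`mydf[!vapply(mydf, is_categorical, logical(1))]`), this should drop `gender` and `infection_flag`, matching the manual example in the question. A 0/1 check is deliberately narrow: counting distinct values instead would also flag columns like `gcst_value`, which happens to have only two distinct values in this six-row sample.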