
R - Convert various dummy/logical variables into a single categorical variable/factor from their name

My question is very similar to this one and this other one, but my dataset is slightly different and I can't get those solutions to work. Please excuse me if I have misunderstood something and this question is redundant.

I have a dataset such as this one:

df <- data.frame(
  id = c(1:5),
  conditionA = c(1, NA, NA, NA, 1),
  conditionB = c(NA, 1, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA)
  )
# id conditionA conditionB conditionC conditionD
# 1  1          1         NA         NA         NA
# 2  2         NA          1         NA         NA
# 3  3         NA         NA          1         NA
# 4  4         NA         NA         NA          1
# 5  5          1         NA         NA         NA

(Note that apart from these columns, I have a lot of other columns that shouldn't be affected by this manipulation.)

I observe that conditionA, conditionB, conditionC and conditionD are mutually exclusive and would be better represented as a single categorical variable, i.e. a factor, that looks like this:

#   id       type
# 1  1 conditionA
# 2  2 conditionB
# 3  3 conditionC
# 4  4 conditionD
# 5  5 conditionA

I have looked into gather and unite from tidyr, but they don't seem to fit this case (with unite, we lose the information from the variable name).

I tried using kimisc::coalesce.na, as suggested in the first linked answer, but (1) I first need to set a factor value based on the name of each column, and (2) it doesn't work as expected: only the first column is included:

library(kimisc)
library(magrittr)  # provides the %>% pipe used below
# first, factor each condition with a specific label
df$conditionA <- df$conditionA %>%
  factor(levels = 1, labels = "conditionA")
df$conditionB <- df$conditionB %>%
  factor(levels = 1, labels = "conditionB")
df$conditionC <- df$conditionC %>%
  factor(levels = 1, labels = "conditionC")
df$conditionD <- df$conditionD %>%
  factor(levels = 1, labels = "conditionD")

# now coalesce.na to merge into a single variable
df$type <- coalesce.na(df$conditionA, df$conditionB, df$conditionC, df$conditionD)

df
#   id conditionA conditionB conditionC conditionD       type
# 1  1 conditionA       <NA>       <NA>       <NA> conditionA 
# 2  2       <NA> conditionB       <NA>       <NA>       <NA> 
# 3  3       <NA>       <NA> conditionC       <NA>       <NA> 
# 4  4       <NA>       <NA>       <NA> conditionD       <NA> 
# 5  5 conditionA       <NA>       <NA>       <NA> conditionA

I tried the other suggestions from the second question, but haven't found one that gives the expected result.

Try:

library(dplyr)
library(tidyr)

df %>% gather(type, value, -id) %>% na.omit() %>% select(-value) %>% arrange(id)

Which gives:

#  id       type
#1  1 conditionA
#2  2 conditionB
#3  3 conditionC
#4  4 conditionD
#5  5 conditionA
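
For reference, the same reshaping can be sketched with pivot_longer(), which tidyr now recommends in place of gather(). This is only an illustration, assuming the example data frame from the question; the explicit conversion of type to a factor is an addition that is not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  id = 1:5,
  conditionA = c(1, NA, NA, NA, 1),
  conditionB = c(NA, 1, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA)
)

result <- df %>%
  pivot_longer(
    cols = starts_with("condition"),  # only the dummy columns
    names_to = "type",                # column names become the values of `type`
    values_to = "value",
    values_drop_na = TRUE             # drop the NA cells, like na.omit()
  ) %>%
  select(-value) %>%                  # the 1s carry no further information
  mutate(type = factor(type)) %>%     # added step: store the result as a factor
  arrange(id)

result
```

Because only the columns matched by starts_with("condition") are pivoted, any other columns are kept alongside id.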

Update

To handle the case you described in the comments, you could perform the operation on the relevant portion of the data frame and then left_join() the other columns:

df %>% 
  select(starts_with("condition"), id) %>% 
  gather(type, value, -id) %>% 
  na.omit() %>% 
  select(-value) %>% 
  left_join(., df %>% select(-starts_with("condition"))) %>%
  arrange(id)
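
A self-contained sketch of this variant follows. The extra column `other` is hypothetical and only stands in for the additional columns mentioned in the question; the join key is also spelled out with by = "id" so that dplyr does not have to guess it.

```r
library(dplyr)
library(tidyr)

# Example data from the question plus one hypothetical extra column
df <- data.frame(
  id = 1:5,
  conditionA = c(1, NA, NA, NA, 1),
  conditionB = c(NA, 1, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA),
  other = letters[1:5]  # hypothetical column that must be left untouched
)

result <- df %>%
  select(id, starts_with("condition")) %>%  # reshape only the dummy columns
  gather(type, value, -id) %>%
  na.omit() %>%
  select(-value) %>%
  left_join(select(df, -starts_with("condition")), by = "id") %>%  # re-attach the rest
  arrange(id)

result
```

Note that na.omit() drops every row of the long table that has no value, so an id with no condition set at all would not appear in the result.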

You can also try:

colnames(df)[2:5][max.col(!is.na(df[,2:5]))]
#[1] "conditionA" "conditionB" "conditionC" "conditionD" "conditionA"

The above works if exactly one column has a non-NA value in each row. If all values in a row can be NA, you can try:

mat<-!is.na(df[,2:5])
colnames(df)[2:5][max.col(mat)*(NA^!rowSums(mat))]
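
The max.col() idea can be sketched a little more explicitly as follows. This version is an illustration rather than the answer's exact code: it selects the columns by name instead of by position, uses ties.method = "first" so the result is deterministic, and stores the result as a factor. The sixth row, where every condition is NA, is added here only to show the all-NA case.

```r
# Example data from the question, plus a sixth row with no condition set
df <- data.frame(
  id = 1:6,
  conditionA = c(1, NA, NA, NA, 1, NA),
  conditionB = c(NA, 1, NA, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA, NA)
)

# Pick the dummy columns by name rather than by position
cond_cols <- grep("^condition", names(df), value = TRUE)

# TRUE where a condition is set, FALSE where it is NA
mat <- !is.na(df[cond_cols])

# Index of the first TRUE in each row
idx <- max.col(mat, ties.method = "first")

# Rows without any TRUE have no meaningful maximum, so mark them as NA
idx[rowSums(mat) == 0] <- NA

df$type <- factor(cond_cols[idx], levels = cond_cols)
df[c("id", "type")]
```

Since this only adds a column, every other column of the data frame is left as it is.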

Another option, using gather with complete.cases:

library(tidyr)
library(dplyr)

df <- df %>%
  gather(type, count, -id)
df <- df[complete.cases(df),][,-3]
df[order(df$id),]
#    id       type
# 1   1 conditionA
# 7   2 conditionB
# 13  3 conditionC
# 19  4 conditionD
# 5   5 conditionA

