
How do I code 2 separate categorical variables into a single one in R?

I have two continuous variables, each of which I dummy coded into a categorical variable with 2 levels. Each variable is coded 0 or 1 for the low and high levels of that variable. Both variables were z-scored so I could tell whether each value fell below or above the mean.

MeanAboveAvo <- ifelse(Dataframeforstudy2$avo < 0, 0, 1)

MeanAboveAnx <- ifelse(Dataframeforstudy2$anx < 0, 0 , 1)

My question is: how do I dummy code these two variables together? I want to use these two variables (MeanAboveAvo & MeanAboveAnx) to create a single variable with 4 different levels. It should be coded 1, 2, 3 or 4, where 1 is (0,0), 2 is (0,1), 3 is (1,0) and 4 is (1,1).

My code is this:

stats <- while(MeanAboveAnx = 0 || MeanAboveAvx = 1) {
  if(MeanAboveAnx = 0 & MeanAboveAvo = 0){
    1
  } else if (MeanAboveAnx = 0 & MeanAboveAvo = 1){
    2
  } else if(MeanAboveAnx = 1 & MeanAboveAvo = 0){
    3
  } else {
    4
  }
}

It does not code anything at all, and I am getting an error message. What can I do differently to get the results I want?

Thank you for your help in advance!

Base R has the function interaction precisely for this type of problem. The code below could be a one-liner; I have left it as two lines to make it clearer.

f <- with(df, interaction(anx, avo, lex.order = TRUE))
as.integer(f)
# [1] 1 2 1 1 2 3 3 3 4 2
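To see why lex.order = TRUE matters, here is a minimal sketch on the four possible combinations. The vectors anx and avo below are made-up illustration data, with anx listed first so that the codes follow an (anx, avo) order.

```r
# All four combinations of two 0/1 variables (made-up illustration data)
anx <- c(0, 0, 1, 1)
avo <- c(0, 1, 0, 1)

# lex.order = TRUE: the first factor (anx) varies slowest in the levels
f <- interaction(anx, avo, lex.order = TRUE)
levels(f)
# [1] "0.0" "0.1" "1.0" "1.1"
as.integer(f)
# [1] 1 2 3 4

# Default (lex.order = FALSE): the first factor varies fastest instead
g <- interaction(anx, avo)
levels(g)
# [1] "0.0" "1.0" "0.1" "1.1"
as.integer(g)
# [1] 1 3 2 4
```

With the default ordering, (0,1) and (1,0) would swap codes, so lex.order = TRUE is what produces the 1/2/3/4 mapping asked for.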

Edit.

The code above uses the data from TomasIsCoding's answer. Here is a solution closer to the question's actual problem, with anx and avo as z-scores. Thanks to @KonradRudolph for his comment.

f <- with(df, interaction(as.integer(anx >= 0),  # 1 = at or above the mean,
                          as.integer(avo >= 0),  # as in the question's ifelse()
                          lex.order = TRUE))
f
# [1] 0.0 1.0 1.0 0.1 1.1 1.0 0.0 0.0 0.0 0.1
#Levels: 0.0 0.1 1.0 1.1

as.integer(f)
# [1] 1 3 3 2 4 3 1 1 1 2

Data.

set.seed(1234)
df <- data.frame(anx = rnorm(10), avo = rnorm(10))
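The question also mentions z-scoring the raw variables first. A sketch of the whole pipeline, assuming hypothetical raw scores (the means and standard deviations are arbitrary), z-scores them with scale(), applies the question's "at or above the mean" rule, and pins the factor levels to 0 and 1 so that all four codes exist even if one group happens to be empty:

```r
set.seed(1234)
# Hypothetical raw (unstandardised) scores; column names follow the question,
# the means and SDs are arbitrary
raw <- data.frame(anx = rnorm(10, mean = 50, sd = 10),
                  avo = rnorm(10, mean = 30, sd = 5))

# z-score each column; scale() returns a one-column matrix, so flatten it
z <- data.frame(anx = as.vector(scale(raw$anx)),
                avo = as.vector(scale(raw$avo)))

# 1 = at or above the mean, 0 = below, as in the question's ifelse()
above_anx <- as.integer(z$anx >= 0)
above_avo <- as.integer(z$avo >= 0)

# Fixing the levels to 0:1 keeps all four combinations (codes 1 to 4)
# even if, say, no value of anx falls below the mean
f <- interaction(factor(above_anx, levels = 0:1),
                 factor(above_avo, levels = 0:1),
                 lex.order = TRUE)
code <- as.integer(f)
```

Without the explicit levels, a variable that only ever takes one value would contribute a single level and the numbering would shift.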

Categorical variables in R don't need to be numeric (and making them so has several drawbacks!), so there's no need for your ifelse:

MeanAboveAvo <- Dataframeforstudy2$avo >= 0
MeanAboveAnx <- Dataframeforstudy2$anx >= 0

Next, the code using these encodings contains multiple mistakes:

  1. It's not clear what the while is supposed to mean here.
  2. All = signs need to be changed to ==, because you're performing comparisons.
  3. if, unlike ifelse, isn't vectorised, so you cannot use it to compute a result for every element of a vector of length > 1.
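Fixing the three points above while keeping the question's own if/else logic gives something like the following sketch: no while loop, == for comparisons, and nested ifelse() calls so that every element gets a code. The input vectors are made-up stand-ins for the question's 0/1 variables, and the pair order follows the question's code (MeanAboveAnx first).

```r
# Made-up 0/1 indicators standing in for the question's variables
MeanAboveAnx <- c(0, 0, 1, 1, 0)
MeanAboveAvo <- c(0, 1, 0, 1, 1)

# ifelse() is vectorised, so each element is tested separately
stats <- ifelse(MeanAboveAnx == 0 & MeanAboveAvo == 0, 1,
         ifelse(MeanAboveAnx == 0 & MeanAboveAvo == 1, 2,
         ifelse(MeanAboveAnx == 1 & MeanAboveAvo == 0, 3, 4)))
stats
# [1] 1 2 3 4 2
```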

If I understand you correctly, the following is one (canonical) way of encoding the stats:

stats <- paste(MeanAboveAvo, MeanAboveAnx)

This converts the logical vectors into character vectors and concatenates them element-wise. Once again, it is unnecessary (and unconventional!) in R to convert these categories into a numeric variable, though it may make sense to convert the result to a factor via as.factor.
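If you do convert to a factor, spelling out the levels fixes their order and keeps all four categories even when one combination never occurs. A sketch under stated assumptions: the logical vectors are made up, the pair order is (anx, avo) as in the question's code, and the label text is purely illustrative.

```r
# Made-up logical indicators (TRUE = at or above the mean)
MeanAboveAnx <- c(FALSE, FALSE, TRUE, TRUE, FALSE)
MeanAboveAvo <- c(FALSE, TRUE, FALSE, TRUE, TRUE)

# Explicit levels fix the order; labels are illustrative only
stats <- factor(paste(MeanAboveAnx, MeanAboveAvo),
                levels = c("FALSE FALSE", "FALSE TRUE", "TRUE FALSE", "TRUE TRUE"),
                labels = c("low anx, low avo", "low anx, high avo",
                           "high anx, low avo", "high anx, high avo"))

table(stats)

# If numeric codes 1 to 4 are still needed, they follow the level order
as.integer(stats)
# [1] 1 2 3 4 2
```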

Given your mapping rule for coding anx and avo, you don't actually need a while loop: your mapping is simply a binary-to-decimal conversion shifted by 1. In this case, you can do it like this:

df <- within(df, code <- 2 * anx + avo + 1)

which gives

> df
   anx avo code
1    0   0    1
2    0   1    2
3    0   0    1
4    0   0    1
5    0   1    2
6    1   0    3
7    1   0    3
8    1   0    3
9    1   1    4
10   0   1    2

Dummy data

df <- structure(list(anx = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L
), avo = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-10L))
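The same arithmetic also works directly on z-scores, because R coerces TRUE/FALSE to 1/0 in arithmetic, so the 0/1 columns do not have to be created first. A minimal sketch with made-up z-scores standing in for Dataframeforstudy2$anx and $avo, using the question's "at or above the mean" rule:

```r
# Made-up z-scores standing in for the question's anx and avo columns
z <- data.frame(anx = c(-1.2, 0.3, 1.1, -0.4),
                avo = c(-0.5, 0.8, -0.2, 1.5))

# Comparisons give TRUE/FALSE, which count as 1/0 here:
# (anx, avo) read as a two-digit binary number, shifted by 1
z <- within(z, code <- 2 * (anx >= 0) + (avo >= 0) + 1)
z$code
# [1] 1 4 3 2
```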

Try this:

as.integer(factor(paste0(MeanAboveAvo, MeanAboveAnx)))

For example:

set.seed(123)
x <- sample(0:1, 10, TRUE) # [1] 0 0 0 1 0 1 1 1 0 0
y <- sample(0:1, 10, TRUE) # [1] 1 1 1 0 1 0 1 0 0 0
as.integer(factor(paste0(x, y)))
# [1] 2 2 2 3 2 3 4 3 1 1
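One caveat with this approach: factor() only creates levels for the combinations that actually occur, so if a combination is missing, the codes shift. A sketch with made-up x and y in which "00" never occurs, next to a version that lists all four levels explicitly:

```r
# Made-up 0/1 data in which the combination "00" never occurs
x <- c(0, 1, 1, 0, 1)
y <- c(1, 0, 1, 1, 1)

# Levels come only from the observed strings "01", "10", "11",
# so "01" becomes 1 instead of 2
codes_implicit <- as.integer(factor(paste0(x, y)))
codes_implicit
# [1] 1 2 3 1 3

# Spelling out all four levels keeps 00, 01, 10, 11 mapped to 1, 2, 3, 4
codes_fixed <- as.integer(factor(paste0(x, y), levels = c("00", "01", "10", "11")))
codes_fixed
# [1] 2 3 4 2 4
```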

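Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.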
