How do I code 2 separate categorical variables into a single one in R?
I have two continuous variables that I dummy coded into categorical variables with 2 levels each. Each of these variables is coded either 0 or 1 for low and high levels of the variable. Both variables were z-scored, so the sign of a value tells whether it falls below or above the mean.
MeanAboveAvo <- ifelse(Dataframeforstudy2$avo < 0, 0, 1)
MeanAboveAnx <- ifelse(Dataframeforstudy2$anx < 0, 0 , 1)
My question is: how do I dummy code these two variables together? I want to create a single variable with 4 different levels using these two variables (MeanAboveAvo & MeanAboveAnx). The variable should be coded 1, 2, 3 or 4, where 1 is (0,0), 2 is (0,1), 3 is (1,0) and 4 is (1,1).
My code is this:
stats <- while(MeanAboveAnx = 0 || MeanAboveAvx = 1) {
if(MeanAboveAnx = 0 & MeanAboveAvo = 0 ){
1
}
else if (MeanAboveAnx = 0 & MeanAboveAvo = 1){
2
}
else if(MeanAboveAnx = 1 & MeanAboveAvo = 0){
3
}
else {
4
}}
It is not coding it at all and I am getting an error message. What can I do differently to get the results I want?
Thank you for your help in advance!
Base R has the function interaction precisely for this type of problem. The code below could be a one-liner; I leave it like this to make it clearer.
f <- with(df, interaction(anx, avo, lex.order = TRUE))
as.integer(f)
# [1] 1 2 1 1 2 3 3 3 4 2
I was using the data in TomasIsCoding's answer. Here is a solution closer to the question's problem, with anx and avo as z-scores. Thanks to @KonradRudolph for his comment.
f <- with(df, interaction(as.integer(anx < 0),
as.integer(avo < 0),
lex.order = TRUE))
f
# [1] 1.1 0.1 0.1 1.0 0.0 0.1 1.1 1.1 1.1 1.0
#Levels: 0.0 0.1 1.0 1.1
as.integer(f)
# [1] 4 2 2 3 1 2 4 4 4 3
Data.
set.seed(1234)
df <- data.frame(anx = rnorm(10), avo = rnorm(10))
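As a quick check of how interaction numbers the levels, the sketch below applies it to every 0/1 combination. The data frame combos is made up for illustration and is not data from the question.

```r
# Hypothetical data: every combination of two 0/1 indicators
combos <- expand.grid(avo = 0:1, anx = 0:1)

# lex.order = TRUE makes the first factor (anx) vary slowest,
# so the levels come out as 0.0, 0.1, 1.0, 1.1
f <- with(combos, interaction(anx, avo, lex.order = TRUE))

# The integer codes then follow the 1-4 scheme asked for in the question
combos$code <- as.integer(f)
combos
```

Because the levels are built from the levels of each input rather than from the observed pairs, a pair that happens not to occur in the data still keeps its slot, as long as both 0 and 1 occur in each variable.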
Categorical variables in R don't need to be numeric (and making them so has several drawbacks!): there's consequently no need for your ifelse:
# TRUE when the z-score is at or above the mean, matching ifelse(x < 0, 0, 1)
MeanAboveAvo <- Dataframeforstudy2$avo >= 0
MeanAboveAnx <- Dataframeforstudy2$anx >= 0
Next, the code using these encodings contains multiple mistakes:
It's unclear what while here is supposed to mean.
All = signs need to be converted to == because you're performing comparisons.
if, unlike ifelse, isn't vectorised, so you cannot use it to assign its result to a vector of length > 1.
If I understand you correctly, then the following is one (canonical) way of encoding the stats:
stats <- paste(MeanAboveAvo, MeanAboveAnx)
This converts the logical vectors into character vectors and concatenates them element-wise. Once again, it is unnecessary (and unconventional!) in R to convert these categories into a numeric variable, though it may make sense to convert the result to a factor via as.factor.
Given the mapping rule for coding anx and avo, you actually don't need a while loop, since yours is a shifted mapping from binary to decimal. In this case, you can do it like below:
df <- within(df,code <- 2*anx + avo + 1)
such that
> df
   anx avo code
1    0   0    1
2    0   1    2
3    0   0    1
4    0   0    1
5    0   1    2
6    1   0    3
7    1   0    3
8    1   0    3
9    1   1    4
10   0   1    2
Dummy Data
df <- structure(list(anx = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L
), avo = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-10L))
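To see why the formula works, read (anx, avo) as a two-digit binary number: 2*anx + avo runs from 0 to 3, and adding 1 shifts that to 1 to 4. A small check over all four combinations, using a made-up data frame named grid:

```r
# Hypothetical table of all four 0/1 combinations
grid <- data.frame(anx = c(0, 0, 1, 1), avo = c(0, 1, 0, 1))

# anx is the "twos" digit and avo the "ones" digit; + 1 shifts 0..3 to 1..4
grid$code <- 2 * grid$anx + grid$avo + 1
grid
```

Each combination gets its own code, with (0,0) mapped to 1 and (1,1) mapped to 4.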
Try this:
as.integer(factor(paste0(MeanAboveAvo, MeanAboveAnx)))
For example:
set.seed(123)
x <- sample(0:1, 10, T) # [1] 0 0 0 1 0 1 1 1 0 0
y <- sample(0:1, 10, T) # [1] 1 1 1 0 1 0 1 0 0 0
as.integer(factor(paste0(x, y)))
# [1] 2 2 2 3 2 3 4 3 1 1
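One caveat: factor() only numbers the combinations that actually occur, so the integer codes shift if one of the four is missing from the data. A hedged sketch that pins the levels explicitly; x2, y2 and all_levels are made-up names for illustration:

```r
# Hypothetical indicators in which the combination "01" never occurs
x2 <- c(0, 1, 1, 1)
y2 <- c(0, 0, 1, 0)

# Fixing the levels keeps "10" coded as 3 and "11" as 4,
# whichever combinations are present in the data
all_levels <- c("00", "01", "10", "11")
codes <- as.integer(factor(paste0(x2, y2), levels = all_levels))
codes
```

Without the levels argument, the same input would be coded with only three consecutive integers, so "10" would become 2 instead of 3.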