How to construct a dummy matrix from a list of data
The sample data looks like this:

data1:
x1 | x2 | x3 | x4 |
---|---|---|---|
1 | 2 | 3 | 4 |
2 | 3 | -1 | -1 |
NA | NA | NA | NA |
0 | 0 | 0 | 0 |
1 | -1 | -1 | -1 |
NA | NA | NA | NA |
4 | 3 | -1 | -1 |
0 | 0 | 0 | 0 |
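To make the example reproducible, the table above can be typed into R as a data frame. This is a minimal sketch that assumes all four columns are numeric and that the `NA` rows are entirely missing, as shown in the table.

```r
# Sample data from the table above:
# positive = group number, -1 = blank, 0 = not in the group, NA = missing
data1 <- data.frame(
  x1 = c(1,  2, NA, 0,  1, NA,  4, 0),
  x2 = c(2,  3, NA, 0, -1, NA,  3, 0),
  x3 = c(3, -1, NA, 0, -1, NA, -1, 0),
  x4 = c(4, -1, NA, 0, -1, NA, -1, 0)
)
```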
`data1[,1]` (and likewise the other columns) refers to the groups `x1`, `x2`, `x3`, `x4`: each positive value is the number of a group that the row belongs to.
`-1` means a blank.

`0` means the datum does not belong to the corresponding group (i.e. if `0` is in `x1`, the datum does not belong to group 1).

`NA` means missing data, and `NA` can appear at random positions in the dataset.
Edit: For example, in the 1st row, `[1,2,3,4]` means the first, second, third, and fourth columns. Therefore, the 1st row of data2 will be `[1,1,1,1]`.
In the 2nd row, `[2,3,-1,-1]` means the second and third columns, and `-1` means a blank. Therefore, the 2nd row of data2 will be `[0,1,1,0]`.
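The rule in the two examples above can be written for a single row as follows. `to_dummy_row` is a hypothetical helper used only for illustration. It assumes that an all-`NA` row stays `NA` and that an all-`0` row has no positive group numbers and therefore becomes all `0`.

```r
# Hypothetical helper: convert one row of group numbers into a 0/1 dummy row.
# Positive entries are column (group) numbers; -1 (blank) and 0 are ignored.
to_dummy_row <- function(x, n_groups = length(x)) {
  if (all(is.na(x))) return(rep(NA_integer_, n_groups))  # missing row stays NA
  groups <- x[!is.na(x) & x > 0]                # keep only real group numbers
  as.integer(seq_len(n_groups) %in% groups)     # 1 where the column is listed
}

to_dummy_row(c(1, 2, 3, 4))     # 1 1 1 1
to_dummy_row(c(2, 3, -1, -1))   # 0 1 1 0
```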
My expected outcome is:

data2:
x1 | x2 | x3 | x4 |
---|---|---|---|
1 | 1 | 1 | 1 |
0 | 1 | 1 | 0 |
NA | NA | NA | NA |
0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 |
NA | NA | NA | NA |
0 | 0 | 1 | 1 |
0 | 0 | 0 | 0 |
My code is below:
# %nin% is not base R (Hmisc exports one); define it so the code runs on its own
`%nin%` <- Negate(`%in%`)

# Rows starting with 0 become all 0; rows starting with NA become all NA
for (i in seq_len(nrow(data1))) {
  if (data1$x1[i] %in% c(0)) {
    data1[i, ] <- as.list(rep(0, 4))
  } else if (is.na(data1$x1[i])) {
    data1[i, ] <- as.list(rep(NA, 4))
  }
}

# This for loop creates the dummy matrix: read the group numbers in the row,
# reset the row to 0, then put a 1 in each listed column
for (i in which(data1$x1 %nin% c(NA, 0))) {
  m <- unlist(data1[i, ])
  m <- m[m > 0]          # drop the -1 blanks
  data1[i, ] <- as.list(rep(0, 4))
  data1[i, m] <- 1
}
I wonder whether there is a function that can replace the for loops. Please give me some suggestions, thank you!
I am still not entirely sure of the logic, but this might be helpful. Using `apply`, you can evaluate each row independently.
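A small illustration of what `apply` does with `MARGIN = 1` may help explain the `t()` around it in the code below. The toy matrix here is an assumption for demonstration only. When the function returns a vector for each row, `apply` stores each result as a column, so the result has to be transposed back to get one row per input row.

```r
# apply() with MARGIN = 1 passes each row to the function;
# a vector-valued result becomes one *column* per input row
m <- matrix(1:6, nrow = 2)                       # 2 x 3 input
res <- apply(m, MARGIN = 1, function(x) x * 10)  # 3 x 2: rows became columns
dim(res)   # 3 2
t(res)     # transposed back to 2 x 3, same row order as m
```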
First, create a vector of `NA`. Then, wherever a value is greater than 0, set the element of the vector at that position (column number) to 1.

Second, if the vector has at least one 1, change the remaining missing values to 0.

Third, if all elements are 0 and none are missing, make all values in that row 0.

In this example, the end result is a matrix.
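t(apply(
  data1,
  MARGIN = 1,
  \(x) {                                   # \(x) lambda syntax needs R >= 4.1
    vec <- rep(NA, length(x))              # start with all NA
    vec[x[x > 0]] <- 1                     # group numbers are column positions
    if (any(vec == 1, na.rm = TRUE)) vec[is.na(vec)] <- 0   # other columns -> 0
    if (any(!is.na(x)) & all(x == 0)) vec <- rep(0, length(x))  # all-0 row stays 0
    vec
  }
))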
Output
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 0 1 1 0
[3,] NA NA NA NA
[4,] 0 0 0 0
[5,] 1 0 0 0
[6,] NA NA NA NA
[7,] 0 0 1 1
[8,] 0 0 0 0
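A loop-free alternative, different from the `apply` approach above, is to use matrix indexing. This is a sketch under stated assumptions: `to_dummy` is a hypothetical name, every positive value is assumed to be a valid column number, and a row is treated as missing only when all of its values are `NA`. `which(..., arr.ind = TRUE)` finds the (row, column) position of every group number and skips `NA` comparisons. Each group number is then used as the target column for a 1.

```r
# Sample data from the question
data1 <- data.frame(
  x1 = c(1,  2, NA, 0,  1, NA,  4, 0),
  x2 = c(2,  3, NA, 0, -1, NA,  3, 0),
  x3 = c(3, -1, NA, 0, -1, NA, -1, 0),
  x4 = c(4, -1, NA, 0, -1, NA, -1, 0)
)

# Hypothetical helper: build the dummy matrix without explicit loops
to_dummy <- function(data1) {
  m <- as.matrix(data1)
  out <- matrix(0, nrow(m), ncol(m), dimnames = list(NULL, colnames(m)))
  # (row, col) positions of cells holding a group number; which() drops NA
  idx <- which(m > 0, arr.ind = TRUE)
  # put a 1 at (row, group number) for each such cell
  out[cbind(idx[, "row"], m[idx])] <- 1
  # rows that are entirely missing stay NA
  out[rowSums(!is.na(m)) == 0, ] <- NA
  out
}

data2 <- to_dummy(data1)
data2
```

For the sample data, this should reproduce the expected data2 table from the question.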