
How to construct a dummy matrix from a list of data

The sample data looks like this:

data1:

x1  x2  x3  x4
 1   2   3   4
 2   3  -1  -1
NA  NA  NA  NA
 0   0   0   0
 1  -1  -1  -1
NA  NA  NA  NA
 4   3  -1  -1
 0   0   0   0

The values in data1 indicate which of the groups x1, x2, x3, x4 each row belongs to.
-1 means that the cell is blank. 0 means that the datum does not belong to the corresponding group (i.e. if 0 is in x1, the datum does not belong to group 1).
NA means missing data; NA values appear at random in the dataset.
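
For reproducibility, the sample above could be built as follows. This is only a sketch: it assumes data1 is a data frame with numeric columns named x1 to x4, which the post does not state explicitly.

```r
# Assumed reconstruction of the sample data as a data frame.
# -1 marks a blank cell, 0 means "not in this group", NA is missing data.
data1 <- data.frame(
  x1 = c(1, 2, NA, 0, 1, NA, 4, 0),
  x2 = c(2, 3, NA, 0, -1, NA, 3, 0),
  x3 = c(3, -1, NA, 0, -1, NA, -1, 0),
  x4 = c(4, -1, NA, 0, -1, NA, -1, 0)
)
data1
```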

Edit: For example, in the 1st row, [1,2,3,4] refers to the first, second, third, and fourth columns. Therefore, the 1st row of data2 will be [1,1,1,1].

In the 2nd row, [2,3,-1,-1] refers to the second and third columns; -1 means that there is a blank. Therefore, the 2nd row of data2 will be [0,1,1,0].
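
To make the mapping concrete, here is a small sketch for a single row. The names row_values and dummy are made up for illustration; the idea is simply that the positive entries are used as column positions.

```r
# One row of data1: groups 2 and 3, followed by two blanks (-1)
row_values <- c(2, 3, -1, -1)

# Start with all zeros, one slot per group
dummy <- integer(length(row_values))

# Use the positive values as positions and flag them with 1
dummy[row_values[row_values > 0]] <- 1L

dummy
# [1] 0 1 1 0
```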

My expected outcome is:

data2:

x1  x2  x3  x4
 1   1   1   1
 0   1   1   0
NA  NA  NA  NA
 0   0   0   0
 1   0   0   0
NA  NA  NA  NA
 0   0   1   1
 0   0   0   0

My code is below:

library(Hmisc)  # provides %nin%

# Rows whose first value is 0 become all 0; rows whose first value is NA become all NA
for (i in 1:8) {
  if (data1$x1[i] %in% c(0)) {
    data1[i, ] = as.list(rep(0, 4))
  } else if (is.na(data1$x1[i])) {
    data1[i, ] = as.list(rep(NA, 4))
  }
}

# Replace -1 with 0
data1[data1 == -1] = 0

# This for loop creates the dummy matrix:
# the positive values in each row are the column numbers to set to 1
for (i in which(data1$x1 %nin% c(NA, 0))) {
  m = data1[i, ]
  m = m[m > 0]
  data1[i, m] = 1
}

# Replace the remaining values greater than 1 with 0
data1[data1 > 1] = 0

I wonder if there is any function that can be used to replace the for loops. Please give me some suggestions, thank you!

I am still not entirely sure of the logic, but this might be helpful. Using apply you can evaluate each row independently.

First, create a vector of NA. Then, where a value is greater than 0, set the element of the vector at that position (the column number) to 1.

Second, if the vector has at least one value equal to 1, change the remaining missing values to 0.

Third, if all elements of the row are zero and no values are missing, make all values in that row 0.

The end result in this example is a matrix.

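t(apply(
  data1,
  MARGIN = 1,
  \(x) {
    # Start with all NA; positive values in x are the column numbers to flag
    vec <- rep(NA, length(x))
    vec[x[x > 0]] <- 1
    # If at least one column was flagged, the remaining columns become 0
    if (any(vec == 1, na.rm = TRUE)) vec[is.na(vec)] <- 0
    # A complete row of zeros (no missing values) stays all zero
    if (!anyNA(x) && all(x == 0)) vec <- rep(0, length(x))
    vec
  }
))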

Output

     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    0    1    1    0
[3,]   NA   NA   NA   NA
[4,]    0    0    0    0
[5,]    1    0    0    0
[6,]   NA   NA   NA   NA
[7,]    0    0    1    1
[8,]    0    0    0    0
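
If you want to avoid row-wise iteration altogether, matrix indexing might also work. The following is only a sketch under two assumptions of mine: data1 is a numeric data frame like the reconstruction below, and only rows that are entirely NA should stay NA. The names m, res, and pos are chosen for illustration.

```r
# Assumed reconstruction of the sample data
data1 <- data.frame(
  x1 = c(1, 2, NA, 0, 1, NA, 4, 0),
  x2 = c(2, 3, NA, 0, -1, NA, 3, 0),
  x3 = c(3, -1, NA, 0, -1, NA, -1, 0),
  x4 = c(4, -1, NA, 0, -1, NA, -1, 0)
)

m <- as.matrix(data1)

# Result starts as all zeros, with the same shape and column names
res <- matrix(0, nrow(m), ncol(m), dimnames = list(NULL, colnames(m)))

# Positions of the positive entries; which() skips NA
pos <- which(m > 0, arr.ind = TRUE)

# For each positive entry, flag (its row, the column number it names)
res[cbind(pos[, "row"], m[pos])] <- 1

# Rows that are entirely NA in the input stay NA (assumption)
res[rowSums(is.na(m)) == ncol(m), ] <- NA

res
```

Here cbind() builds a two-column index matrix, so each (row, column) pair is assigned in one step rather than one row at a time.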
