Separate positive values into multiple rows across multiple columns

Question

Suppose I have a data set like this:

dat <- tibble(id = 1:4, 
              col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))

> dat
# A tibble: 4 × 4
     id  col1  col2  col3
  <int> <dbl> <dbl> <dbl>
1     1     0     1     1
2     2     1     0     1
3     3     1     1     0
4     4     0     0     1

I'd like to separate, for every unique id, the multiple 1s into multiple rows, i.e. the expected output is:

# A tibble: 7 × 4
     id  col1  col2  col3
  <dbl> <dbl> <dbl> <dbl>
1     1     0     1     0
2     1     0     0     1
3     2     1     0     0
4     2     0     0     1
5     3     1     0     0
6     3     0     1     0
7     4     0     0     1

For the first id (id = 1), col2 and col3 are both 1, so I would like a separate row for each of them. It is kind of like one-hot encoding for rows.

Answer 1

With help from Ritchie Sacramento and RobertoT

library(tidyverse)

dat <- tibble(id = 1:4, 
              col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))

dat %>%  
  pivot_longer(-id) %>% 
  filter(value != 0) %>% 
  mutate(rows = 1:nrow(.)) %>% 
  pivot_wider(values_fill = 0, 
              names_sort = TRUE) %>% 
  select(-rows)

# A tibble: 7 × 4
     id  col1  col2  col3
  <int> <dbl> <dbl> <dbl>
1     1     0     1     0
2     1     0     0     1
3     2     1     0     0
4     2     0     0     1
5     3     1     0     0
6     3     0     1     0
7     4     0     0     1

Answer 2

Here is an alternative approach using model.matrix():

From the documentation: model.matrix creates a design (or model) matrix, e.g., by expanding factors to a set of dummy variables (depending on the contrasts) and expanding interactions similarly.

library(dplyr)
library(tidyr)

dat %>% 
  pivot_longer(-id) %>% 
  filter(value == 1) %>% 
  cbind((model.matrix(~ name + 0, .) == 1)*1)

  id name value namecol1 namecol2 namecol3
1  1 col2     1        0        1        0
2  1 col3     1        0        0        1
3  2 col1     1        1        0        0
4  2 col3     1        0        0        1
5  3 col1     1        1        0        0
6  3 col2     1        0        1        0
7  4 col3     1        0        0        1
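
To see what model.matrix() contributes on its own, here is a minimal base R sketch. The toy data frame is a made-up stand-in for the name column that pivot_longer() produces above; it is not part of the original answer.

```r
# Toy stand-in (an assumption) for the `name` column after pivot_longer()
toy <- data.frame(name = c("col2", "col3", "col1"))

# `+ 0` drops the intercept, so every level of `name` gets its own dummy column
mm <- model.matrix(~ name + 0, data = toy)

# Column names are the variable name pasted onto each level
print(colnames(mm))
print(mm)
```

Each row of mm holds a single 1, in the column matching that row's name. That is why binding it to the filtered long data yields one indicator row per positive value.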

Answer 3

You could do

arrange(bind_rows(lapply(2:4, function(x) {
  d <- dat[dat[[x]] == 1,]
  d[-c(1, x)] <- 0
  d})), id)
#> # A tibble: 7 x 4
#>      id  col1  col2  col3
#>   <int> <dbl> <dbl> <dbl>
#> 1     1     0     1     0
#> 2     1     0     0     1
#> 3     2     1     0     0
#> 4     2     0     0     1
#> 5     3     1     0     0
#> 6     3     0     1     0
#> 7     4     0     0     1

Created on 2022-07-14 by the reprex package (v2.0.1)

Answer 4

Using explicit loops:

nullrow <- rep(0, ncol(dat)-1)
data <- dat[,-1]
rowsums <- apply(data, 1, sum)
res <- data[0,]
ids <- c()
for(i in 1:nrow(data)) {
  if(rowsums[i]>0) {
    for(j in 1:rowsums[i]) {
      thisrow <- nullrow
      thiscolumn <- which(data[i,]==1)[j]
      thisrow[thiscolumn] <- 1
      res <- rbind(res, thisrow)
    }
    ids <- c(ids, rep(dat$id[i], rowsums[i]))
  }  
}
names(res) <- colnames(data)
res$id <- ids
> res
  col1 col2 col3 id
1    0    1    0  1
2    0    0    1  1
3    1    0    0  2
4    0    0    1  2
5    1    0    0  3
6    0    1    0  3
7    0    0    1  4

Answer 5

A possible solution, based on purrr::pmap_dfr and on the following ideas:

Loop over all dataframe rows.
Use each row to create a diagonal matrix whose diagonal holds the values of that row.
Filter out the rows that only have zeros.

library(tidyverse)

pmap_dfr(dat, ~ data.frame(id = ..1, diag(c(...)[-1]))) %>% 
  filter(if_any(X1:X3, ~ .x != 0))

#>   id X1 X2 X3
#> 1  1  0  1  0
#> 2  1  0  0  1
#> 3  2  1  0  0
#> 4  2  0  0  1
#> 5  3  1  0  0
#> 6  3  0  1  0
#> 7  4  0  0  1
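
The diagonal-matrix idea can be checked on a single row with base R alone. This is only a sketch of one iteration, using the id = 1 values from the question; the variable names are illustrative.

```r
# Value columns of the id = 1 row from the question: col1 = 0, col2 = 1, col3 = 1
row_values <- c(0, 1, 1)

# diag() puts the vector on the diagonal of a 3 x 3 matrix of zeros,
# so each nonzero value ends up alone on its own row
d <- diag(row_values)

# Drop the rows that contain only zeros
kept <- d[rowSums(d != 0) > 0, , drop = FALSE]
print(kept)
```

Note that diag() treats a length-one vector differently (it builds an identity matrix of that size), so this trick assumes at least two value columns.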

Another possible solution, based on Matrix::sparseMatrix:

First, it gets the indexes of the 1s (with which).
Second, it adjusts the row indexes, to force one 1 per row.
Third, it creates a sparse matrix, putting a 1 where the adjusted indexes specify.

library(tidyverse)
library(Matrix)

which(dat[-1] == 1, arr.ind = TRUE) %>% 
  as.data.frame %>% 
  arrange(row) %>% 
  mutate(id = dat[row,"id"], row = 1:n()) %>% 
  {data.frame(id = .$id, as.matrix( sparseMatrix(i = .$row, j= .$col, x= 1)))}

#>   id X1 X2 X3
#> 1  1  0  1  0
#> 2  1  0  0  1
#> 3  2  1  0  0
#> 4  2  0  0  1
#> 5  3  1  0  0
#> 6  3  0  1  0
#> 7  4  0  0  1
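
The three steps can also be traced in base R. The sketch below swaps in an ordinary dense matrix and matrix indexing in place of sparseMatrix(), so it illustrates the index manipulation rather than the answer's exact code; vals is assumed to hold the value columns of dat.

```r
# Value columns of dat from the question, filled column by column
vals <- matrix(c(0, 1, 1, 0,   # col1
                 1, 0, 1, 0,   # col2
                 1, 1, 0, 1),  # col3
               ncol = 3)

# Step 1: (row, col) positions of every 1
idx <- which(vals == 1, arr.ind = TRUE)

# Step 2: order by original row, then give each 1 a fresh row index
idx <- idx[order(idx[, "row"]), , drop = FALSE]
new_row <- seq_len(nrow(idx))

# Step 3: place a single 1 at each (new_row, col) position
out <- matrix(0, nrow = nrow(idx), ncol = ncol(vals))
out[cbind(new_row, idx[, "col"])] <- 1
print(out)
```

The original row numbers in idx[, "row"] are what the answer uses to look up the matching id for each output row.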

Yet another possible solution:

library(tidyverse)

f <- function(df)
{
  got <- 0
  
  for (i in 1:nrow(df))
  {
    got <- which.max(df[i, (got+1):ncol(df)]) + got
    df[i, -got] <- 0
  }
  
  df  
}

dat %>% 
  slice(map(1:nrow(dat), ~ rep(.x, rowSums(dat[-1])[.x])) %>% unlist) %>% 
  group_by(id) %>% 
  group_modify(~ f(.)) %>% 
  ungroup

#> # A tibble: 7 × 4
#>      id  col1  col2  col3
#>   <int> <dbl> <dbl> <dbl>
#> 1     1     0     1     0
#> 2     1     0     0     1
#> 3     2     1     0     0
#> 4     2     0     0     1
#> 5     3     1     0     0
#> 6     3     0     1     0
#> 7     4     0     0     1

Answer 1
5 Accepted 2022-07-14 10:50:49

Answer 2
4 2022-07-14 11:06:44

Answer 3
3 2022-07-14 10:55:03

Answer 4
1 2022-07-14 11:37:05

Answer 5
1 2022-07-14 12:36:48
