
Separate positive values into multiple rows on multiple columns

Suppose I have a data set like this:

library(tibble)

dat <- tibble(id = 1:4, 
              col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))

> dat
# A tibble: 4 × 4
     id  col1  col2  col3
  <int> <dbl> <dbl> <dbl>
1     1     0     1     1
2     2     1     0     1
3     3     1     1     0
4     4     0     0     1

For every unique id, I'd like to separate the multiple 1s into multiple rows, i.e. the expected output is:

# A tibble: 7 × 4
     id  col1  col2  col3
  <dbl> <dbl> <dbl> <dbl>
1     1     0     1     0
2     1     0     0     1
3     2     1     0     0
4     2     0     0     1
5     3     1     0     0
6     3     0     1     0
7     4     0     0     1

For the first id (id = 1), col2 and col3 are both 1, so I would like a separate row for each of them. It is kind of like one-hot encoding for rows.
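One way to sanity-check any solution is to note that the expected output has exactly one row per 1 in the input. Here is a minimal base R sketch of that count, assuming the indicator columns are rebuilt as a plain matrix (the name `vals` is only illustrative):

```r
# Indicator columns of the example data, rebuilt as a plain matrix
vals <- cbind(col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))

# Number of 1s per id: each of them should become its own output row
ones_per_id <- rowSums(vals == 1)

# Total number of rows the expected output should have
expected_rows <- sum(ones_per_id)
print(expected_rows)
```

For the example data this gives 7, which matches the 7-row expected output above.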

With help from Ritchie Sacramento and RobertoT:

library(tidyverse)

dat <- tibble(id = 1:4, 
              col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))

dat %>%  
  pivot_longer(-id) %>% 
  filter(value != 0) %>% 
  mutate(rows = 1:nrow(.)) %>% 
  pivot_wider(values_fill = 0, 
              names_sort = TRUE) %>% 
  select(-rows)

# A tibble: 7 × 4
     id  col1  col2  col3
  <int> <dbl> <dbl> <dbl>
1     1     0     1     0
2     1     0     0     1
3     2     1     0     0
4     2     0     0     1
5     3     1     0     0
6     3     0     1     0
7     4     0     0     1

Here is an alternative approach using model.matrix():

From the documentation: model.matrix creates a design (or model) matrix, e.g., by expanding factors to a set of dummy variables (depending on the contrasts) and expanding interactions similarly.

library(dplyr)
library(tidyr)

dat %>% 
  pivot_longer(-id) %>% 
  filter(value == 1) %>% 
  cbind((model.matrix(~ name + 0, .) == 1)*1)

  id name value namecol1 namecol2 namecol3
1  1 col2     1        0        1        0
2  1 col3     1        0        0        1
3  2 col1     1        1        0        0
4  2 col3     1        0        0        1
5  3 col1     1        1        0        0
6  3 col2     1        0        1        0
7  4 col3     1        0        0        1
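To see what model.matrix() contributes on its own, here is a small base R sketch on a toy data frame. The name `long` and its contents are assumptions for illustration, standing in for the `name` column left after pivot_longer() and filter():

```r
# Toy stand-in for the `name` column after pivot_longer() and filter()
long <- data.frame(name = c("col2", "col3", "col1", "col3"))

# `+ 0` drops the intercept, so every level gets its own indicator column
mm <- model.matrix(~ name + 0, data = long)
print(mm)
```

Each row of `mm` holds a single 1 in the column of its level, which is why binding it to the long data gives one indicator row per original 1. The columns are prefixed with the variable name (namecol1, namecol2, namecol3), as in the output above.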

You could do:

library(dplyr)

arrange(bind_rows(lapply(2:4, function(x) {
  d <- dat[dat[[x]] == 1,]
  d[-c(1, x)] <- 0
  d})), id)
#> # A tibble: 7 x 4
#>      id  col1  col2  col3
#>   <int> <dbl> <dbl> <dbl>
#> 1     1     0     1     0
#> 2     1     0     0     1
#> 3     2     1     0     0
#> 4     2     0     0     1
#> 5     3     1     0     0
#> 6     3     0     1     0
#> 7     4     0     0     1

Created on 2022-07-14 by the reprex package (v2.0.1)

Using explicit loops:

nullrow <- rep(0, ncol(dat)-1)
data <- dat[,-1]
rowsums <- apply(data, 1, sum)
res <- data[0,]
ids <- c()
for(i in 1:nrow(data)) {
  if(rowsums[i]>0) {
    for(j in 1:rowsums[i]) {
      thisrow <- nullrow
      thiscolumn <- which(data[i,]==1)[j]
      thisrow[thiscolumn] <- 1
      res <- rbind(res, thisrow)
    }
    ids <- c(ids, rep(dat$id[i], rowsums[i]))
  }  
}
names(res) <- colnames(data)
res$id <- ids
> res
  col1 col2 col3 id
1    0    1    0  1
2    0    0    1  1
3    1    0    0  2
4    0    0    1  2
5    1    0    0  3
6    0    1    0  3
7    0    0    1  4

A possible solution, based on purrr::pmap_dfr and on the following ideas:

  1. Loop over all dataframe rows.

  2. Use each row to create a diagonal matrix with the contents of the diagonal being the dataframe row.

  3. Filter out the rows that only have zeros.

library(tidyverse)

pmap_dfr(dat, ~ data.frame(id = ..1, diag(c(...)[-1]))) %>% 
  filter(if_any(X1:X3, ~ .x != 0))

#>   id X1 X2 X3
#> 1  1  0  1  0
#> 2  1  0  0  1
#> 3  2  1  0  0
#> 4  2  0  0  1
#> 5  3  1  0  0
#> 6  3  0  1  0
#> 7  4  0  0  1
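The diagonal-matrix idea can be seen in isolation on a single row. This is a base R sketch, not the answer's code; `row_values` and `expanded` are illustrative names, and the vector holds the values of id = 1:

```r
# One input row (id = 1 in the example data): col1 = 0, col2 = 1, col3 = 1
row_values <- c(0, 1, 1)

# diag() on a length-3 vector builds a 3 x 3 matrix with that vector on the diagonal
m <- diag(row_values)

# Keep only the rows that contain a non-zero entry
expanded <- m[rowSums(m != 0) > 0, , drop = FALSE]
colnames(expanded) <- c("col1", "col2", "col3")
print(expanded)
```

Note that diag() treats a single number as a matrix size rather than as a diagonal, so this sketch assumes there are at least two value columns.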

Another possible solution, based on Matrix::sparseMatrix:

  1. First, it gets the indices of the 1s (with which()).
  2. Second, it adjusts the row indices, to force a single 1 per row.
  3. Third, it creates a sparse matrix, putting a 1 wherever the adjusted indices specify.

library(tidyverse)
library(Matrix)

which(dat[-1] == 1, arr.ind = TRUE) %>% 
  as.data.frame %>% 
  arrange(row) %>% 
  mutate(id = dat[row,"id"], row = 1:n()) %>% 
  {data.frame(id = .$id, as.matrix( sparseMatrix(i = .$row, j= .$col, x= 1)))}

#>   id X1 X2 X3
#> 1  1  0  1  0
#> 2  1  0  0  1
#> 3  2  1  0  0
#> 4  2  0  0  1
#> 5  3  1  0  0
#> 6  3  0  1  0
#> 7  4  0  0  1
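The three steps can also be sketched with base R only, using a dense matrix instead of Matrix::sparseMatrix. This is an illustrative variant under that assumption, not the answer's code; `vals`, `ids`, `idx` and `out` are made-up names:

```r
# Indicator columns of the example data, without the id column
vals <- cbind(col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))
ids <- 1:4

# Step 1: (row, col) positions of every 1
idx <- which(vals == 1, arr.ind = TRUE)
# Sort by original row so that rows belonging to one id stay together
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

# Step 2: renumber the rows so that each 1 gets a row of its own
new_row <- seq_len(nrow(idx))

# Step 3: fill a dense zero matrix at the adjusted positions
out <- matrix(0, nrow = nrow(idx), ncol = ncol(vals),
              dimnames = list(NULL, colnames(vals)))
out[cbind(new_row, idx[, "col"])] <- 1

res <- data.frame(id = ids[idx[, "row"]], out)
print(res)
```

A dense matrix is fine for small inputs like this one; the sparse version is presumably preferable when there are many columns and few 1s.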

Yet another possible solution:

library(tidyverse)

f <- function(df)
{
  got <- 0
  
  for (i in 1:nrow(df))
  {
    got <- which.max(df[i, (got+1):ncol(df)]) + got
    df[i, -got] <- 0
  }
  
  df  
}

dat %>% 
  slice(map(1:nrow(dat), ~ rep(.x, rowSums(dat[-1])[.x])) %>% unlist) %>% 
  group_by(id) %>% 
  group_modify(~ f(.)) %>% 
  ungroup

#> # A tibble: 7 × 4
#>      id  col1  col2  col3
#>   <int> <dbl> <dbl> <dbl>
#> 1     1     0     1     0
#> 2     1     0     0     1
#> 3     2     1     0     0
#> 4     2     0     0     1
#> 5     3     1     0     0
#> 6     3     0     1     0
#> 7     4     0     0     1
