Separate positive values into multiple rows on multiple columns
Suppose I have a data set like this:
library(tibble)
dat <- tibble(id = 1:4,
              col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))
> dat
# A tibble: 4 × 4
id col1 col2 col3
<int> <dbl> <dbl> <dbl>
1 1 0 1 1
2 2 1 0 1
3 3 1 1 0
4 4 0 0 1
For every unique id, I'd like to split the multiple 1s into separate rows, i.e., the expected output is:
# A tibble: 7 × 4
id col1 col2 col3
<dbl> <dbl> <dbl> <dbl>
1 1 0 1 0
2 1 0 0 1
3 2 1 0 0
4 2 0 0 1
5 3 1 0 0
6 3 0 1 0
7 4 0 0 1
For the first id (id = 1), col2 and col3 are both 1, so I would like a separate row for each of them. It is a bit like one-hot encoding for rows.
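As a quick sanity check on the expected output (a sketch, not part of the original question): each id should yield one row per 1 it contains, so the row sums of the value columns give the number of output rows per id. The name ones_per_id is illustrative.

```r
library(tibble)

dat <- tibble(id = 1:4,
              col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))

# Number of 1s per id = number of output rows for that id
ones_per_id <- unname(rowSums(dat[-1]))
ones_per_id       # 2 2 2 1
sum(ones_per_id)  # 7, matching the 7-row expected output
```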
With help from Ritchie Sacramento and RobertoT:
library(tidyverse)
dat <- tibble(id = 1:4,
col1 = c(0, 1, 1, 0),
col2 = c(1, 0, 1, 0),
col3 = c(1, 1, 0, 1))
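dat %>%
  # Long format: one row per (id, column) pair
  pivot_longer(-id) %>%
  # Keep only the 1s
  filter(value != 0) %>%
  # Unique key per long row, so pivot_wider() does not merge the 1s of one id
  mutate(rows = row_number()) %>%
  # Back to wide; columns missing from a row are filled with 0
  pivot_wider(values_fill = 0,
              names_sort = TRUE) %>%
  # Drop the helper key
  select(-rows)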
# A tibble: 7 × 4
id col1 col2 col3
<int> <dbl> <dbl> <dbl>
1 1 0 1 0
2 1 0 0 1
3 2 1 0 0
4 2 0 0 1
5 3 1 0 0
6 3 0 1 0
7 4 0 0 1
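The helper column rows is what makes this work. A minimal sketch comparing the pipeline with and without it (long, collapsed and expanded are illustrative names): without a unique key per long row, pivot_wider() identifies output rows by id alone and rebuilds the original four rows.

```r
library(dplyr)
library(tidyr)
library(tibble)

dat <- tibble(id = 1:4,
              col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))

long <- dat %>%
  pivot_longer(-id) %>%
  filter(value != 0)

# No per-row key: rows are identified by id only, so the 1s of one id recombine
collapsed <- long %>%
  pivot_wider(values_fill = 0, names_sort = TRUE)

# Unique key per long row: every 1 gets its own output row
expanded <- long %>%
  mutate(rows = row_number()) %>%
  pivot_wider(values_fill = 0, names_sort = TRUE) %>%
  select(-rows)

nrow(collapsed)  # 4
nrow(expanded)   # 7
```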
Here is an alternative approach using model.matrix():
From the documentation: model.matrix creates a design (or model) matrix, e.g., by expanding factors to a set of dummy variables (depending on the contrasts) and expanding interactions similarly.
library(dplyr)
library(tidyr)
dat %>%
  pivot_longer(-id) %>%
  filter(value == 1) %>%
  # ~ name + 0 omits the intercept, giving one 0/1 dummy column per level of `name`
  cbind((model.matrix(~ name + 0, .) == 1)*1)
id name value namecol1 namecol2 namecol3
1 1 col2 1 0 1 0
2 1 col3 1 0 0 1
3 2 col1 1 1 0 0
4 2 col3 1 0 0 1
5 3 col1 1 1 0 0
6 3 col2 1 0 1 0
7 4 col3 1 0 0 1
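The result above still carries the name and value columns, and each dummy is prefixed with name. One way to reduce it to the requested shape is sketched below, assuming id is the first column; stripping the prefix with sub() is an addition of mine, not part of the answer, and long, dummies and res are illustrative names.

```r
library(dplyr)
library(tidyr)
library(tibble)

dat <- tibble(id = 1:4,
              col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))

long <- dat %>%
  pivot_longer(-id) %>%
  filter(value == 1)

# ~ name + 0: no intercept, so every level of `name` gets its own 0/1 column
dummies <- model.matrix(~ name + 0, data = long)
# "namecol1" -> "col1", etc.
colnames(dummies) <- sub("^name", "", colnames(dummies))

res <- data.frame(id = long$id, dummies)
res
```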
You could do:
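library(dplyr)
arrange(bind_rows(lapply(2:4, function(x) {
  # Rows where column x is 1 ...
  d <- dat[dat[[x]] == 1, ]
  # ... with every other value column (all but id and x) set to 0
  d[-c(1, x)] <- 0
  d
})), id)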
#> # A tibble: 7 x 4
#> id col1 col2 col3
#> <int> <dbl> <dbl> <dbl>
#> 1 1 0 1 0
#> 2 1 0 0 1
#> 3 2 1 0 0
#> 4 2 0 0 1
#> 5 3 1 0 0
#> 6 3 0 1 0
#> 7 4 0 0 1
Created on 2022-07-14 by the reprex package (v2.0.1)
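The same idea can be wrapped so it does not hard-code the column positions 2:4. A sketch, with split_ones() as a hypothetical helper name, assuming id is the first column and every other column holds 0/1 values:

```r
library(dplyr)
library(tibble)

dat <- tibble(id = 1:4,
              col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))

# Hypothetical helper: one output row per 1, for any number of value columns
split_ones <- function(df) {
  pieces <- lapply(seq_along(df)[-1], function(x) {
    d <- df[df[[x]] == 1, ]  # rows with a 1 in column x
    d[-c(1, x)] <- 0         # zero every other value column
    d
  })
  arrange(bind_rows(pieces), id)
}

split_ones(dat)
```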
Using explicit loops:
nullrow <- rep(0, ncol(dat)-1)   # template row of zeros
data <- dat[,-1]                 # value columns only
rowsums <- apply(data, 1, sum)   # number of 1s per id
res <- data[0,]
ids <- c()
for(i in 1:nrow(data)) {
  if(rowsums[i]>0) {
    # one output row per 1 in row i
    for(j in 1:rowsums[i]) {
      thisrow <- nullrow
      thiscolumn <- which(data[i,]==1)[j]   # column of the j-th 1
      thisrow[thiscolumn] <- 1
      res <- rbind(res, thisrow)
    }
    ids <- c(ids, rep(dat$id[i], rowsums[i]))
  }
}
# make sure the result carries the original column names
names(res) <- colnames(data)
res$id <- ids
> res
col1 col2 col3 id
1 0 1 0 1
2 0 0 1 1
3 1 0 0 2
4 0 0 1 2
5 1 0 0 3
6 0 1 0 3
7 0 0 1 4
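The loop grows res with rbind() on every iteration. A loop-free base R sketch of the same idea (my own variant, not the answer's code) preallocates a zero matrix and sets exactly one 1 per row via matrix indexing; m, idx, out and res are illustrative names.

```r
library(tibble)

dat <- tibble(id = 1:4,
              col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))

m <- as.matrix(dat[-1])
# (row, col) of every 1, ordered by original row, then by column
idx <- which(m == 1, arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

# Preallocate zeros, then set one 1 per output row
out <- matrix(0, nrow(idx), ncol(m), dimnames = list(NULL, colnames(m)))
out[cbind(seq_len(nrow(idx)), idx[, "col"])] <- 1

res <- data.frame(id = dat$id[idx[, "row"]], out)
res
```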
A possible solution, based on purrr::pmap_dfr and on the following ideas:
Loop over all data frame rows.
Use each row to create a diagonal matrix whose diagonal holds that row's values.
Filter out the rows that contain only zeros.
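library(tidyverse)
# For each row: its id next to a diagonal matrix built from the col1..col3 values
pmap_dfr(dat, ~ data.frame(id = ..1, diag(c(...)[-1]))) %>%
  # Keep only the diagonal rows that carry a nonzero value
  filter(if_any(X1:X3, ~ .x != 0))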
#> id X1 X2 X3
#> 1 1 0 1 0
#> 2 1 0 0 1
#> 3 2 1 0 0
#> 4 2 0 0 1
#> 5 3 1 0 0
#> 6 3 0 1 0
#> 7 4 0 0 1
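The diag() step is easiest to see on a single row. A small sketch using the values of the first row (id = 1); row1, m and kept are illustrative names. Note that diag() treats a length-one argument as a matrix size, so this approach relies on there being at least two value columns.

```r
# Value columns of the first row (id = 1)
row1 <- c(0, 1, 1)

# diag() places each value on its own row of a square matrix
m <- diag(row1)
m

# All-zero rows come from columns that were 0 in the original row
kept <- m[rowSums(m != 0) > 0, , drop = FALSE]
kept
```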
Another possible solution, based on Matrix::sparseMatrix (together with which):
library(tidyverse)
library(Matrix)
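# Positions (row, col) of every 1 in the value columns
which(dat[-1] == 1, arr.ind = TRUE) %>%
  as.data.frame %>%
  # which() returns positions in column-major order; sort them by original row
  arrange(row) %>%
  # Look up each 1's id, then renumber the output rows 1..n
  mutate(id = dat$id[row], row = 1:n()) %>%
  # One 1 per output row at (row, col); dims keeps trailing all-zero columns
  {data.frame(id = .$id, as.matrix(sparseMatrix(i = .$row, j = .$col, x = 1,
                                                dims = c(nrow(.), ncol(dat) - 1))))}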
#> id X1 X2 X3
#> 1 1 0 1 0
#> 2 1 0 0 1
#> 3 2 1 0 0
#> 4 2 0 0 1
#> 5 3 1 0 0
#> 6 3 0 1 0
#> 7 4 0 0 1
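The arrange(row) step is needed because which(..., arr.ind = TRUE) walks the matrix column by column. A quick check on dat (pos is an illustrative name):

```r
library(tibble)

dat <- tibble(id = 1:4,
              col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))

pos <- which(as.matrix(dat[-1]) == 1, arr.ind = TRUE)
pos[, "row"]  # 2 3 1 3 1 2 4 (column-major order)
pos[, "col"]  # 1 1 2 2 3 3 3

# Ordering by row (ties keep their column order) gives the output row sequence
pos[order(pos[, "row"]), "col"]  # 2 3 1 3 1 2 3
```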
Yet another possible solution:
library(tidyverse)
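# Within one id, keep only the i-th 1 in the i-th copied row
f <- function(df)
{
  got <- 0   # column of the 1 kept in the previous row
  for (i in 1:nrow(df))
  {
    # first 1 to the right of the previously kept column
    got <- which.max(df[i, (got+1):ncol(df)]) + got
    df[i, -got] <- 0
  }
  df
}
dat %>%
  # Repeat each row once per 1 it contains
  slice(rep(seq_len(nrow(dat)), rowSums(dat[-1]))) %>%
  group_by(id) %>%
  group_modify(~ f(.)) %>%
  ungroup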
#> # A tibble: 7 × 4
#> id col1 col2 col3
#> <int> <dbl> <dbl> <dbl>
#> 1 1 0 1 0
#> 2 1 0 0 1
#> 3 2 1 0 0
#> 4 2 0 0 1
#> 5 3 1 0 0
#> 6 3 0 1 0
#> 7 4 0 0 1
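To see how f() assigns one 1 per copied row, here is a trace on the two copies of the id = 1 row. The function below copies f() from the answer, with unlist() added as my own change so which.max() receives a plain numeric vector; g and r are illustrative names.

```r
f <- function(df) {
  got <- 0
  for (i in 1:nrow(df)) {
    # first 1 at or after the column following the previously kept one
    got <- which.max(unlist(df[i, (got + 1):ncol(df)])) + got
    df[i, -got] <- 0
  }
  df
}

# Two copies of the value columns of id = 1 (col2 and col3 are 1)
g <- data.frame(col1 = c(0, 0), col2 = c(1, 1), col3 = c(1, 1))
r <- f(g)
r
# Row 1 keeps col2 (searching from column 1); row 2 keeps col3 (searching from column 3)
```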