Convert dataframe from wide format to long format with key stored in row (R)
I'm using tidyverse, but a base solution is welcome, too.
Is there a way to gather a dataframe, without transposing, so that the key comes from values stored in a row instead of from the column names? For example, let's say I have a tibble called df.
library(tidyverse)
# Each column mixes numbers with a group label, so every column is character.
df <- tibble(a = c(5,3,5,6,2,"G1"),
             b = c(5,3,5,6,2,"G1"),
             c = c(8,2,6,4,1,"G2"),
             d = c(8,2,6,4,1,"G2"),
             e = c(9,3,7,8,4,"G3"),
             f = c(9,3,7,8,4,"G3"),
             g = c(6,5,2,1,8,"G4"),
             h = c(6,5,2,1,8,"G4"))
df
# A tibble: 6 x 8
a b c d e f g h
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 5 5 8 8 9 9 6 6
2 3 3 2 2 3 3 5 5
3 5 5 6 6 7 7 2 2
4 6 6 4 4 8 8 1 1
5 2 2 1 1 4 4 8 8
6 G1 G1 G2 G2 G3 G3 G4 G4
The groups to group by or gather on are in the bottom row. Is there a way to get df to have only three columns, such that columns c, e, and g are gathered into column a, columns d, f, and h are gathered into column b, and row 6 becomes column c? The result would look like:
tibble(a = c(5,3,5,6,2,8,2,6,4,1,9,3,7,8,4,6,5,2,1,8),
       b = c(5,3,5,6,2,8,2,6,4,1,9,3,7,8,4,6,5,2,1,8),
       c = c("G1","G1","G1","G1","G1","G2","G2","G2","G2","G2",
             "G3","G3","G3","G3","G3","G4","G4","G4","G4","G4"))
# A tibble: 20 x 3
a b c
<dbl> <dbl> <chr>
1 5 5 G1
2 3 3 G1
3 5 5 G1
4 6 6 G1
5 2 2 G1
6 8 8 G2
7 2 2 G2
8 6 6 G2
9 4 4 G2
10 1 1 G2
11 9 9 G3
12 3 3 G3
13 7 7 G3
14 8 8 G3
15 4 4 G3
16 6 6 G4
17 5 5 G4
18 2 2 G4
19 1 1 G4
20 8 8 G4
I would like to avoid transposing because I need the row and column orders preserved until everything is properly labeled.
Here is one idea.
library(tidyverse)
df2 <- df %>%
  t() %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  split(f = .$V6) %>%
  map_dfr(~.x %>%
            select(-V6) %>%
            t() %>%
            as.data.frame(stringsAsFactors = FALSE) %>%
            setNames(c("a", "b")),
          .id = "c") %>%
  select(a, b, c) %>%
  mutate_at(vars(-c), list(~as.numeric(.)))
df2
# a b c
# 1 5 5 G1
# 2 3 3 G1
# 3 5 5 G1
# 4 6 6 G1
# 5 2 2 G1
# 6 8 8 G2
# 7 2 2 G2
# 8 6 6 G2
# 9 4 4 G2
# 10 1 1 G2
# 11 9 9 G3
# 12 3 3 G3
# 13 7 7 G3
# 14 8 8 G3
# 15 4 4 G3
# 16 6 6 G4
# 17 5 5 G4
# 18 2 2 G4
# 19 1 1 G4
# 20 8 8 G4
Here is one implementation. We can split the tibble into a list of tibbles based on the last row, loop through the list with imap, rename the columns to the same column names ('a', 'b'), mutate to create the column 'c' from the list name, and bind the rows.
library(tidyverse)
df %>%
  slice(-n()) %>%
  split.default(df %>%
                  slice(n()) %>%
                  flatten_chr) %>%
  imap_dfr(~ .x %>%
             rename_all(~ c('a', 'b')) %>%
             mutate(c = .y))
# A tibble: 20 x 3
# a b c
# <chr> <chr> <chr>
# 1 5 5 G1
# 2 3 3 G1
# 3 5 5 G1
# 4 6 6 G1
# 5 2 2 G1
# 6 8 8 G2
# 7 2 2 G2
# 8 6 6 G2
# 9 4 4 G2
#10 1 1 G2
#11 9 9 G3
#12 3 3 G3
#13 7 7 G3
#14 8 8 G3
#15 4 4 G3
#16 6 6 G4
#17 5 5 G4
#18 2 2 G4
#19 1 1 G4
#20 8 8 G4
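For comparison, the same split-by-key-row idea can be sketched in base R without any packages. This is only an illustrative sketch, not part of the answer above: the names n, groups, values, pieces and long are made up here, the example data is rebuilt as a plain data frame, and the value columns are converted to numeric to match the output requested in the question.

```r
# Rebuild the example data as a plain data frame. Every column is character
# because each one mixes numbers with a group label.
df <- data.frame(
  a = c(5, 3, 5, 6, 2, "G1"), b = c(5, 3, 5, 6, 2, "G1"),
  c = c(8, 2, 6, 4, 1, "G2"), d = c(8, 2, 6, 4, 1, "G2"),
  e = c(9, 3, 7, 8, 4, "G3"), f = c(9, 3, 7, 8, 4, "G3"),
  g = c(6, 5, 2, 1, 8, "G4"), h = c(6, 5, 2, 1, 8, "G4"),
  stringsAsFactors = FALSE
)

n      <- nrow(df)
groups <- unlist(df[n, ], use.names = FALSE)  # key row: "G1" "G1" "G2" ...
values <- df[-n, , drop = FALSE]              # data rows only

# split.default() treats the data frame as a list, so it splits COLUMNS.
pieces <- split.default(values, groups)

# Give each piece the same column names, convert to numeric,
# attach its group label, then stack the pieces.
long <- do.call(rbind, lapply(names(pieces), function(g) {
  piece   <- setNames(pieces[[g]], c("a", "b"))
  piece[] <- lapply(piece, as.numeric)
  piece$c <- g
  piece
}))
rownames(long) <- NULL
long
```

Nothing is transposed here, so the row order within each group is the original row order.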
Transposing probably doesn't hurt if you do it step by step. In this base R solution, row and column information is kept until the last line.
d <- data.frame(t(as.matrix(df)))
l <- lapply(split(d[-6], d$X6), t)
res <- do.call(rbind, Map(cbind, l, c=names(l)))
res <- setNames(data.frame(res, row.names=NULL), letters[1:3])
res
# a b c
# 1 5 5 G1
# 2 3 3 G1
# 3 5 5 G1
# 4 6 6 G1
# 5 2 2 G1
# 6 8 8 G2
# 7 2 2 G2
# 8 6 6 G2
# 9 4 4 G2
# 10 1 1 G2
# 11 9 9 G3
# 12 3 3 G3
# 13 7 7 G3
# 14 8 8 G3
# 15 4 4 G3
# 16 6 6 G4
# 17 5 5 G4
# 18 2 2 G4
# 19 1 1 G4
# 20 8 8 G4
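Because the pieces pass through a character matrix, the value columns of a result built this way are not numeric, while the output requested in the question shows <dbl> columns. A minimal sketch of one way to convert them afterwards, using a small hypothetical stand-in for res (the four rows and the name value_cols are illustrative, not taken from the answer):

```r
# Hypothetical stand-in for a result assembled with rbind/cbind:
# a character matrix with the group label in the third column.
res <- cbind(a = c("5", "3", "8", "2"),
             b = c("5", "3", "8", "2"),
             c = c("G1", "G1", "G2", "G2"))

res <- as.data.frame(res, stringsAsFactors = FALSE)

# Convert only the value columns; the group column stays character.
value_cols <- c("a", "b")
res[value_cols] <- lapply(res[value_cols], as.numeric)

str(res)
```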
One option with data.table:
First, since we're not using the original names, replace them. Also remove the last row and convert everything to integer.
library(data.table)
setDT(df)
df <- df[-.N]
df[, names(df) := lapply(.SD, as.integer)]
setnames(df, rep_len(c('a', 'b'), ncol(df)))
# a b a b a b a b
# 1: 5 5 8 8 9 9 6 6
# 2: 3 3 2 2 3 3 5 5
# 3: 5 5 6 6 7 7 2 2
# 4: 6 6 4 4 8 8 1 1
# 5: 2 2 1 1 4 4 8 8
Now melt on the row number, add the G[1-4] column, and dcast the melted df to wide form.
df[, rid := 1:.N]
df2 <- melt(df, 'rid')
df2[, c := paste0('G', rowid(rid, variable))]
dcast(df2, rid + c ~ variable)[order(c), -'rid']
# c a b
# 1: G1 5 5
# 2: G1 3 3
# 3: G1 5 5
# 4: G1 6 6
# 5: G1 2 2
# 6: G2 8 8
# 7: G2 2 2
# 8: G2 6 6
# 9: G2 4 4
# 10: G2 1 1
# 11: G3 9 9
# 12: G3 3 3
# 13: G3 7 7
# 14: G3 8 8
# 15: G3 4 4
# 16: G4 6 6
# 17: G4 5 5
# 18: G4 2 2
# 19: G4 1 1
# 20: G4 8 8
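The group label in this approach is not read from the key row; it is rebuilt from the position of each repeated column name. A rough base R illustration of that counting idea for a single row of column names, using ave() as a stand-in for data.table's rowid() (the names nm, idx and labels are made up for this sketch):

```r
# Column names after renaming: a b a b a b a b
nm <- rep_len(c("a", "b"), 8)

# Running count within each name: the 1st "a" gets 1, the 2nd "a" gets 2, ...
idx <- ave(seq_along(nm), nm, FUN = seq_along)

# Turn the counts into group labels.
labels <- paste0("G", idx)
labels
```

This relies on the groups being numbered in column order. If the labels in the key row were arbitrary, they would need to be saved before that row is dropped.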