
Convert dataframe from wide format to long format with key stored in row R

I'm using tidyverse, but a base R solution is welcome too.

Is there a way, without transposing, to gather a data frame so that the key comes from a row rather than from the column names? For example, let's say I have a tibble called df.

library(tidyverse)

df <- tibble(a = c(5,3,5,6,2,"G1"),
             b = c(5,3,5,6,2,"G1"),
             c = c(8,2,6,4,1,"G2"),
             d = c(8,2,6,4,1,"G2"),
             e = c(9,3,7,8,4,"G3"),
             f = c(9,3,7,8,4,"G3"),
             g = c(6,5,2,1,8,"G4"),
             h = c(6,5,2,1,8,"G4"))
df
# A tibble: 6 x 8
  a     b     c     d     e     f     g     h    
  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 5     5     8     8     9     9     6     6    
2 3     3     2     2     3     3     5     5    
3 5     5     6     6     7     7     2     2    
4 6     6     4     4     8     8     1     1    
5 2     2     1     1     4     4     8     8    
6 G1    G1    G2    G2    G3    G3    G4    G4 

The groups to gather on are in the bottom row. Is there a way to reshape df into three columns only, such that columns c, e, and g are gathered into column a, columns d, f, and h are gathered into column b, and row 6 becomes column c? The result would look like:

tibble(a = c(5,3,5,6,2,8,2,6,4,1,9,3,7,8,4,6,5,2,1,8),
       b = c(5,3,5,6,2,8,2,6,4,1,9,3,7,8,4,6,5,2,1,8),
       c = c("G1","G1","G1","G1","G1","G2","G2","G2","G2","G2",
             "G3","G3","G3","G3","G3","G4","G4","G4","G4","G4"))
# A tibble: 20 x 3
       a     b c    
   <dbl> <dbl> <chr>
 1     5     5 G1   
 2     3     3 G1   
 3     5     5 G1   
 4     6     6 G1   
 5     2     2 G1   
 6     8     8 G2   
 7     2     2 G2   
 8     6     6 G2   
 9     4     4 G2   
10     1     1 G2   
11     9     9 G3   
12     3     3 G3   
13     7     7 G3   
14     8     8 G3   
15     4     4 G3   
16     6     6 G4   
17     5     5 G4   
18     2     2 G4   
19     1     1 G4   
20     8     8 G4 

I would like to avoid transposing because I need the row and column order preserved until everything is properly labeled.

Here is one idea: transpose, split on the key row, and transpose each piece back.

library(tidyverse)

df2 <- df %>%
  t() %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  split(f = .$V6) %>%
  map_dfr(~.x %>% 
            select(-V6) %>%
            t() %>%
            as.data.frame(stringsAsFactors = FALSE) %>%
            setNames(c("a", "b")),
          .id = "c") %>%
  select(a, b, c) %>%
  mutate_at(vars(-c), list(~as.numeric(.)))

df2
#    a b  c
# 1  5 5 G1
# 2  3 3 G1
# 3  5 5 G1
# 4  6 6 G1
# 5  2 2 G1
# 6  8 8 G2
# 7  2 2 G2
# 8  6 6 G2
# 9  4 4 G2
# 10 1 1 G2
# 11 9 9 G3
# 12 3 3 G3
# 13 7 7 G3
# 14 8 8 G3
# 15 4 4 G3
# 16 6 6 G4
# 17 5 5 G4
# 18 2 2 G4
# 19 1 1 G4
# 20 8 8 G4

Here is one implementation. We can split the tibble into a list of tibbles based on the last row, loop through the list with imap, rename the columns to the same column names ('a', 'b'), mutate to create the column 'c' from the list name, and bind the rows. Note that a and b stay character here, because the original columns were character.

library(tidyverse)
df %>%
  slice(-n()) %>%
  split.default(df %>%
                  slice(n()) %>%
                  flatten_chr) %>%
  imap_dfr(~ .x %>%
             rename_all(~ c('a', 'b')) %>%
             mutate(c = .y))
# A tibble: 20 x 3
#   a     b     c    
#   <chr> <chr> <chr>
# 1 5     5     G1   
# 2 3     3     G1   
# 3 5     5     G1   
# 4 6     6     G1   
# 5 2     2     G1   
# 6 8     8     G2   
# 7 2     2     G2   
# 8 6     6     G2   
# 9 4     4     G2   
#10 1     1     G2   
#11 9     9     G3   
#12 3     3     G3   
#13 7     7     G3   
#14 8     8     G3   
#15 4     4     G3   
#16 6     6     G4   
#17 5     5     G4   
#18 2     2     G4   
#19 1     1     G4   
#20 8     8     G4  
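
If the a and b columns are needed as numbers, as in the expected output of the question, the same split-and-bind idea can be followed by a conversion step. The following is only a sketch under a few assumptions: dplyr 1.0 or later is installed (for across), the key row is always the last row, and every group has exactly two columns. It also uses unlist() and bind_rows() in place of flatten_chr and imap_dfr, which is a substitution on my part and not the code from the answer above.

```r
library(tidyverse)

# Same example data as in the question: the last row holds the group key.
df <- tibble(a = c(5,3,5,6,2,"G1"),
             b = c(5,3,5,6,2,"G1"),
             c = c(8,2,6,4,1,"G2"),
             d = c(8,2,6,4,1,"G2"),
             e = c(9,3,7,8,4,"G3"),
             f = c(9,3,7,8,4,"G3"),
             g = c(6,5,2,1,8,"G4"),
             h = c(6,5,2,1,8,"G4"))

# Key for each column, read from the last row.
key <- unlist(df[nrow(df), ], use.names = FALSE)

res <- df %>%
  slice(-n()) %>%                       # drop the key row
  split.default(key) %>%                # list of 2-column tibbles, one per group
  imap(~ .x %>%
         setNames(c("a", "b")) %>%      # assumes two columns per group
         mutate(c = .y)) %>%
  bind_rows() %>%
  mutate(across(all_of(c("a", "b")), as.numeric))

res
```

No transpose is involved, so the row order within each group is the original row order.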

Transposing probably doesn't hurt if you do it step by step. In this base R solution, row and column information is kept until the last line.

d <- data.frame(t(as.matrix(df)))
l <- lapply(split(d[-6], d$X6), t)
res <- do.call(rbind, Map(cbind, l, c=names(l)))
res <- setNames(data.frame(res, row.names=NULL), letters[1:3])
res
#    a b  c
# 1  5 5 G1
# 2  3 3 G1
# 3  5 5 G1
# 4  6 6 G1
# 5  2 2 G1
# 6  8 8 G2
# 7  2 2 G2
# 8  6 6 G2
# 9  4 4 G2
# 10 1 1 G2
# 11 9 9 G3
# 12 3 3 G3
# 13 7 7 G3
# 14 8 8 G3
# 15 4 4 G3
# 16 6 6 G4
# 17 5 5 G4
# 18 2 2 G4
# 19 1 1 G4
# 20 8 8 G4
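
For comparison, here is a rough base R sketch of the same grouping that never transposes. It is not the code from the answer above. It assumes df is a plain data.frame with character columns (built here with data.frame() instead of tibble()), that the key is in the last row, and that each group consists of exactly two columns in a/b order.

```r
# Example data from the question, as a base data.frame.
df <- data.frame(a = c(5,3,5,6,2,"G1"),
                 b = c(5,3,5,6,2,"G1"),
                 c = c(8,2,6,4,1,"G2"),
                 d = c(8,2,6,4,1,"G2"),
                 e = c(9,3,7,8,4,"G3"),
                 f = c(9,3,7,8,4,"G3"),
                 g = c(6,5,2,1,8,"G4"),
                 h = c(6,5,2,1,8,"G4"),
                 stringsAsFactors = FALSE)

key  <- unlist(df[nrow(df), ], use.names = FALSE)  # group label per column
body <- df[-nrow(df), , drop = FALSE]              # data without the key row
cols <- split(seq_along(key), key)                 # column indices per group

# Build one 3-column piece per group, then stack the pieces.
pieces <- Map(function(idx, g) {
  data.frame(a = as.numeric(body[[idx[1]]]),
             b = as.numeric(body[[idx[2]]]),
             c = g,
             stringsAsFactors = FALSE)
}, cols, names(cols))

res <- do.call(rbind, pieces)
rownames(res) <- NULL
res
```

Because the columns are picked by index, the original column order decides which column of a group becomes a and which becomes b.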

One option with data.table:

First, since we're not using the original names, replace them. Also remove the last row and convert everything to integer.

library(data.table)
setDT(df)

df <- df[-.N]
df[, names(df) := lapply(.SD, as.integer)]
setnames(df, rep_len(c('a', 'b'), ncol(df)))

#    a b a b a b a b
# 1: 5 5 8 8 9 9 6 6
# 2: 3 3 2 2 3 3 5 5
# 3: 5 5 6 6 7 7 2 2
# 4: 6 6 4 4 8 8 1 1
# 5: 2 2 1 1 4 4 8 8

Now melt on the row number, add the G[1-4] column, and dcast the melted df back to wide form.

df[, rid := 1:.N]
df2 <- melt(df, 'rid')
df2[, c := paste0('G', rowid(rid, variable))]
dcast(df2, rid + c ~ variable)[order(c), -'rid']

#      c a b
#  1: G1 5 5
#  2: G1 3 3
#  3: G1 5 5
#  4: G1 6 6
#  5: G1 2 2
#  6: G2 8 8
#  7: G2 2 2
#  8: G2 6 6
#  9: G2 4 4
# 10: G2 1 1
# 11: G3 9 9
# 12: G3 3 3
# 13: G3 7 7
# 14: G3 8 8
# 15: G3 4 4
# 16: G4 6 6
# 17: G4 5 5
# 18: G4 2 2
# 19: G4 1 1
# 20: G4 8 8 
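
The steps above rebuild the G1 to G4 labels from the column position. If the labels should instead come from the key row itself, something along these lines might work. This is a hedged sketch, not the answer's method: the names dt, key, cols, vals and long are made up for illustration, and it assumes the key is in the last row and each group has exactly two columns.

```r
library(data.table)

# Example data from the question as a data.table (all columns are character).
dt <- data.table(a = c(5,3,5,6,2,"G1"),
                 b = c(5,3,5,6,2,"G1"),
                 c = c(8,2,6,4,1,"G2"),
                 d = c(8,2,6,4,1,"G2"),
                 e = c(9,3,7,8,4,"G3"),
                 f = c(9,3,7,8,4,"G3"),
                 g = c(6,5,2,1,8,"G4"),
                 h = c(6,5,2,1,8,"G4"))

key  <- unlist(dt[.N], use.names = FALSE)  # group label per column, from the last row
dt   <- dt[-.N]                            # drop the key row
cols <- split(names(dt), key)              # column names per group
vals <- c("a", "b")                        # target value columns (two per group assumed)

# Subset each group's columns, rename them, and stack with the group name as id.
long <- rbindlist(
  lapply(cols, function(nm) setnames(dt[, nm, with = FALSE], vals)),
  idcol = "c"
)

long[, (vals) := lapply(.SD, as.integer), .SDcols = vals]
setcolorder(long, c(vals, "c"))
long
```

Compared with pasting 'G' onto a row id, this keeps working if the labels in the key row are not literally G1 to G4.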
