
How to stack multiple columns using tidyverse

I have a data frame like this in wide format:

set.seed(1)
df = data.frame(item=letters[1:6], field1a=sample(6,6),field1b=sample(60,6),
                field1c=sample(200,6),field2a=sample(6,6),field2b=sample(60,6),
                field2c=sample(200,6))

What would be the best way to stack all the "a" columns together, all the "b" columns together and all the "c" columns together, like this:

items fielda fieldb fieldc
    a     2      52    121
    a     1      44     57

Using base R:

cbind(item=df$item,unstack(transform(stack(df,-1),ind=sub("\\d+","",ind))))
      item fielda fieldb fieldc
1        a      2     57    138
2        b      6     39     77
3        c      3     37    153
4        d      4      4     99
5        e      1     12    141
6        f      5     10    194
7        a      3     17     97
8        b      4     23    120
9        c      5      1     98
10       d      1     22     37
11       e      2     49    163
12       f      6     19    131
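The one-liner above packs three steps into a single call. The sketch below spells them out with the same set.seed(1) data; the intermediate names long, wide and res are only illustrative.

```r
set.seed(1)
df <- data.frame(item = letters[1:6],
                 field1a = sample(6, 6), field1b = sample(60, 6), field1c = sample(200, 6),
                 field2a = sample(6, 6), field2b = sample(60, 6), field2c = sample(200, 6))

# 1. Stack every column except 'item' into two columns:
#    'values' (the numbers) and 'ind' (a factor naming the source column)
long <- stack(df, select = -1)

# 2. Strip the digits so that field1a and field2a both become "fielda"
long$ind <- sub("\\d+", "", long$ind)

# 3. Unstack: one column per distinct 'ind' value (fielda, fieldb, fieldc)
wide <- unstack(long, values ~ ind)

# 4. Bind the item labels back on; the 6 labels are recycled over the 12 rows
res <- cbind(item = df$item, wide)
res
```

Within each output column the values keep their stacking order, so all of set 1 (field1x) comes before all of set 2 (field2x), which is why the items repeat a to f twice.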

Or you can use the reshape function in base R:

reshape(df,varying = split(names(df)[-1],rep(1:3,2)),idvar = "item",direction = "long")
    item time field1a field1b field1c
a.1    a    1       2      57     138
b.1    b    1       6      39      77
c.1    c    1       3      37     153
d.1    d    1       4       4      99
e.1    e    1       1      12     141
f.1    f    1       5      10     194
a.2    a    2       3      17      97
b.2    b    2       4      23     120
c.2    c    2       5       1      98
d.2    d    2       1      22      37
e.2    e    2       2      49     163
f.2    f    2       6      19     131
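By default the stacked columns keep the first name of each varying group (field1a, field1b, field1c) and the set number goes into a generic time column. A minimal sketch, assuming you would rather have clean names: pass v.names to name the stacked columns and timevar to rename the time column ("set" is an arbitrary choice here).

```r
set.seed(1)
df <- data.frame(item = letters[1:6],
                 field1a = sample(6, 6), field1b = sample(60, 6), field1c = sample(200, 6),
                 field2a = sample(6, 6), field2b = sample(60, 6), field2c = sample(200, 6))

# Group the six columns by their trailing letter:
# list(c("field1a", "field2a"), c("field1b", "field2b"), c("field1c", "field2c"))
groups <- split(names(df)[-1], rep(1:3, 2))

long <- reshape(df, direction = "long",
                varying = groups,
                v.names = c("fielda", "fieldb", "fieldc"),  # names of the stacked columns
                timevar = "set",                            # instead of the default "time"
                idvar   = "item")
long
```

The order of v.names must match the order of the groups in varying, since reshape pairs them up by position.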

You can also rename the columns yourself first, so that reshape can split each name at the "." into a variable name and a time (field1a becomes fielda.1, and so on). Note that this overwrites the names of df:

names(df)=sub("(\\d)(.)","\\2.\\1",names(df))
reshape(df,varying= -1,idvar = "item",direction = "long")

If you are using the tidyverse, gather into 'long' format, split the column name into the field letter (key1) and the set number (key2), and then spread back out:

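library(tidyverse)
out <- df %>% 
         gather(key, val, -item) %>%                  # one row per item/column pair
         mutate(key1 = gsub("\\d+", "", key),         # "field1a" -> "fielda"
                key2 = gsub("\\D+", "", key)) %>%     # "field1a" -> "1"
         select(-key) %>%
         spread(key1, val) %>%                        # one column each for fielda, fieldb, fieldc
         select(-key2)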
head(out, 2)
#   item fielda fieldb fieldc
#1    a      2     57    138
#2    a      3     17     97
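pivot_longer() and pivot_wider() were introduced in tidyr 1.0.0 as the recommended replacements for gather() and spread(). A minimal sketch of the same reshape with pivot_longer(), assuming tidyr 1.0.0 or later: the special ".value" entry in names_to turns that part of each column name into an output column, so no spread step is needed. Renaming to "fielda_1" and the set column name are choices made for this sketch only.

```r
library(tidyr)

set.seed(1)
df <- data.frame(item = letters[1:6],
                 field1a = sample(6, 6), field1b = sample(60, 6), field1c = sample(200, 6),
                 field2a = sample(6, 6), field2b = sample(60, 6), field2c = sample(200, 6))

# Move the set number to the end of each name: "field1a" -> "fielda_1"
names(df)[-1] <- sub("(\\d+)(\\D)$", "\\2_\\1", names(df)[-1])

out <- pivot_longer(df, cols = -item,
                    names_to = c(".value", "set"),  # "fielda" becomes a column, "1" goes to set
                    names_sep = "_")
out <- out[names(out) != "set"]  # drop the helper column, as in the gather/spread version
head(out, 2)
```

The rows come out in the same order as the gather/spread result: both sets for item a, then both for item b, and so on.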

A similar option is melt/dcast from data.table: melt into 'long' format, strip the digits from 'variable', and then dcast back to 'wide' format. rowid(variable) numbers the repeated values of each field, so every item gets one row per set:

library(data.table)
dcast(melt(setDT(df), id.vars = "item")[, variable := sub("\\d+", "", variable)
      ], item  + rowid(variable) ~ variable, value.var = 'value')[
        , variable := NULL][]
#     item fielda fieldb fieldc
# 1:    a      2     57    138
# 2:    a      3     17     97
# 3:    b      6     39     77
# 4:    b      4     23    120
# 5:    c      3     37    153
# 6:    c      5      1     98
# 7:    d      4      4     99
# 8:    d      1     22     37
# 9:    e      1     12    141
#10:    e      2     49    163
#11:    f      5     10    194
#12:    f      6     19    131
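data.table can also do this in a single melt() call, without a dcast step, by giving measure.vars a patterns() call with one regular expression per output column, and value.name one name per pattern. A sketch, assuming a data.table version whose melt() supports patterns(); the regexes "a$", "b$" and "c$" are tailored to this column naming and would also catch any id column that happened to end in those letters.

```r
library(data.table)

set.seed(1)
df <- data.frame(item = letters[1:6],
                 field1a = sample(6, 6), field1b = sample(60, 6), field1c = sample(200, 6),
                 field2a = sample(6, 6), field2b = sample(60, 6), field2c = sample(200, 6))

long <- melt(as.data.table(df), id.vars = "item",
             measure.vars = patterns("a$", "b$", "c$"),     # one pattern per output column
             value.name = c("fielda", "fieldb", "fieldc"))  # one name per pattern
long
```

Here 'variable' holds the set index (1 or 2) rather than a column name, and the rows are grouped by set rather than by item.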

NOTE: This should also work when the cases are unbalanced, i.e. when not every field has the same number of columns.
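To illustrate the note, here is a sketch with a hypothetical extra column field3a that has no field3b or field3c partner. It follows the melt/rowid/dcast idea of the data.table answer, but stores rowid(variable) in an explicit id column (a name chosen for this sketch) before casting; cells with no source column come out as NA.

```r
library(data.table)

set.seed(1)
df <- data.frame(item = letters[1:6],
                 field1a = sample(6, 6), field1b = sample(60, 6), field1c = sample(200, 6),
                 field2a = sample(6, 6), field2b = sample(60, 6), field2c = sample(200, 6),
                 field3a = sample(6, 6))   # hypothetical column with no b/c partner

long <- melt(as.data.table(df), id.vars = "item")
long[, variable := sub("\\d+", "", variable)]   # "field3a" -> "fielda"
long[, id := rowid(variable)]                   # running count of each field's values

res <- dcast(long, item + id ~ variable, value.var = "value")
res[, id := NULL]
res
```

The result has three rows per item, with fieldb and fieldc missing in the third. By contrast, unstack() would return a list rather than a data frame here, and reshape() requires every varying group to have the same length.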

Data

set.seed(1)
df = data.frame(item = letters[1:6], 
                field1a=sample(6,6),
                field1b=sample(60,6),
                field1c=sample(200,6),
                field2a=sample(6,6),
                field2b=sample(60,6),
                field2c=sample(200,6))
