How to reshape data from long to wide format

Question

I'm having trouble rearranging the following data frame:

set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each=4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
    )

dat1
       name  numbers      value
1  firstName       1  0.3407997
2  firstName       2 -0.7033403
3  firstName       3 -0.3795377
4  firstName       4 -0.7460474
5 secondName       1 -0.8981073
6 secondName       2 -0.3347941
7 secondName       3 -0.5013782
8 secondName       4 -0.1745357

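I want to reshape it so that each unique "name" variable is a row name, with the "value"s as observations along that row and the "numbers" as column names. Sort of like this: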

     name          1          2          3         4
1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

I've looked at melt and cast and a few other things, but none seem to do the job.

Answer 1

Using the reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
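
The call above names the new columns value.1 through value.4, whereas the question asked for columns named 1 through 4. A minimal sketch of one way to tidy the names afterwards, assuming the value column is called value as in the question; the sub() pattern is an illustrative addition, not part of the original answer:

```r
set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each = 4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
)

# reshape() prefixes each new column with the value column's name:
# value.1, value.2, value.3, value.4
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

# Strip the "value." prefix so the columns are named after the levels of `numbers`
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

The row names of the result (1 and 5) are the row numbers of the first occurrence of each name in dat1, which matches the layout shown in the question.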

Answer 2

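The new (in 2014) tidyr package also does this simply, with gather() / spread() being its terms for melt / cast.

Edit: Now, in 2019, tidyr v1.0 has launched and put spread and gather on a deprecation path, preferring pivot_wider and pivot_longer instead, which are described in another answer to this question. Read on if you want a brief glimpse into the short life of spread/gather.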

library(tidyr)
spread(dat1, key = numbers, value = value)

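From GitHub:

tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.

Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.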

Answer 3

You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is

library(reshape)
cast(dat1, name ~ numbers)

Or using reshape2:

library(reshape2)
dcast(dat1, name ~ numbers)

Answer 4

If performance is a concern, another option is to use data.table's extension of reshape2's melt & dcast functions

(Reference: Efficient reshaping using data.tables)

library(data.table)

setDT(dat1)
dcast(dat1, name ~ numbers, value.var = "value")

#          name          1          2         3         4
# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078
# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814

And, as of data.table v1.9.6, we can cast on multiple columns

## add an extra column
dat1[, value2 := value * 2]

## cast multiple value columns
dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

#          name    value_1    value_2   value_3   value_4   value2_1   value2_2 value2_3  value2_4
# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078  0.3672866 -1.6712572 3.190562 0.6590155
# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814 -1.6409368  0.9748581 1.476649 1.1515627

Answer 5

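With the devel version of tidyr '0.8.3.9000', there are pivot_wider and pivot_longer, which are generalized to do the reshaping (long -> wide and wide -> long, respectively) from one to multiple columns. Using the OP's data

- single column long -> wide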

library(dplyr)
library(tidyr)
dat1 %>% 
    pivot_wider(names_from = numbers, values_from = value)
# A tibble: 2 x 5
#  name          `1`    `2`    `3`    `4`
#  <fct>       <dbl>  <dbl>  <dbl>  <dbl>
#1 firstName   0.341 -0.703 -0.380 -0.746
#2 secondName -0.898 -0.335 -0.501 -0.175

-> created another column to show the functionality

dat1 %>% 
    mutate(value2 = value * 2) %>% 
    pivot_wider(names_from = numbers, values_from = c("value", "value2"))
# A tibble: 2 x 9
#  name       value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
#  <fct>        <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#1 firstName    0.341  -0.703  -0.380  -0.746    0.682   -1.41    -0.759   -1.49 
#2 secondName  -0.898  -0.335  -0.501  -0.175   -1.80    -0.670   -1.00    -0.349
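
The columns produced above are named `1`, `2`, and so on, which are non-syntactic names and need backticks when referenced. A small sketch of one way around that, using the names_prefix argument of pivot_wider; the prefix "n_" is an arbitrary choice for illustration:

```r
library(tidyr)

set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each = 4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
)

# names_prefix is prepended to every new column name, giving n_1 ... n_4
wide <- pivot_wider(
    dat1,
    names_from = numbers,
    values_from = value,
    names_prefix = "n_"
)
names(wide)
```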

Answer 6

Using your example data frame, we could:

xtabs(value ~ name + numbers, data = dat1)
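
One caveat worth knowing: xtabs builds a contingency table, so it sums values that share the same name/numbers pair and fills absent pairs with 0 rather than NA. The toy data below is made up purely to show that behaviour:

```r
# Hypothetical data: ("a", 1) appears twice, ("a", 2) and ("b", 1) never appear
d <- data.frame(
    name = c("a", "a", "b"),
    numbers = c(1, 1, 2),
    value = c(1.5, 2.5, 4)
)

tab <- xtabs(value ~ name + numbers, data = d)
tab
# The ("a", 1) cell holds 1.5 + 2.5 = 4; the two missing cells hold 0
```

With the question's data every pair occurs exactly once, so neither effect shows up there.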

Answer 7

Two other options:

Base package:

df <- unstack(dat1, form = value ~ numbers)
rownames(df) <- unique(dat1$name)
df
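
The unstack approach relies on every level of numbers having the same number of rows, and on the rows being ordered consistently by name. A short sketch with made-up, unbalanced data shows what happens otherwise: unstack returns a plain list instead of a data frame, so the rownames() step would no longer apply.

```r
# Hypothetical unbalanced data: two rows for numbers == 1, one row for numbers == 2
d <- data.frame(numbers = c(1, 1, 2), value = c(10, 20, 30))

res <- unstack(d, form = value ~ numbers)

# With unequal group sizes the result is a list of vectors, not a data frame
is.data.frame(res)
lengths(res)
```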

sqldf package:

library(sqldf)
sqldf('SELECT name,
      MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1, 
      MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
      MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
      MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
      FROM dat1
      GROUP BY name')

Answer 8

Using the base R aggregate function:

aggregate(value ~ name, dat1, I)

# name           value.1  value.2  value.3  value.4
#1 firstName      0.4145  -0.4747   0.0659   -0.5024
#2 secondName    -0.8259   0.1669  -0.8962    0.1681
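
Two details are easy to miss here. The result has only two columns, name and a matrix column called value, and the order of the values follows the row order within each name, not the numbers column. A sketch of how the matrix column could be flattened into ordinary columns, assuming balanced groups as in the question:

```r
set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each = 4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
)

agg <- aggregate(value ~ name, dat1, I)
dim(agg)             # 2 rows, 2 columns: `value` is a 2 x 4 matrix column

# data.frame() expands a matrix argument into one column per matrix column
flat <- do.call(data.frame, agg)
names(flat)
```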

Answer 9

The base reshape function works perfectly fine:

df <- data.frame(
  year   = c(rep(2000, 12), rep(2001, 12)),
  month  = rep(1:12, 2),
  values = rnorm(24)
)
df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
df_wide

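Where

idvar is the column of classes that separates rows
timevar is the column of classes to cast wide
v.names is the column containing the numeric values
direction specifies wide or long format
the optional sep argument is the separator placed between the v.names and the timevar class names in the column names of the output data.frame.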

If no idvar exists, create one before using the reshape() function:

df$id   <- c(rep("year1", 12), rep("year2", 12))
df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
df_wide

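Remember that idvar is required! The timevar and v.names parts are easy. The output of this function is more predictable than that of some of the others, because everything is explicitly defined.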
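
Because everything is defined explicitly, reshape() also records how the wide table was built (in a "reshapeWide" attribute), so the same object can be turned back into long format without repeating the arguments. A minimal sketch; the seed is an assumption added only to make the example reproducible:

```r
set.seed(1)  # assumed seed, not part of the original answer
df <- data.frame(
    year = rep(c(2000, 2001), each = 12),
    month = rep(1:12, 2),
    values = rnorm(24)
)

df_wide <- reshape(df, idvar = "year", timevar = "month",
                   v.names = "values", direction = "wide", sep = "_")
dim(df_wide)   # 2 rows: one per year; 13 columns: year plus values_1 ... values_12

# The stored attribute lets reshape() undo the operation
df_long <- reshape(df_wide, direction = "long")
dim(df_long)
```

This only works while the attribute is still attached, so subsetting or rebuilding df_wide in between may require passing varying, idvar and timevar again.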

Answer 10

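There is a very powerful new package from the genius data scientists at Win-Vector (the folks who made vtreat, seplyr and replyr) called cdata. It implements the "coordinatized data" principles described in this document and also in this blog post. The idea is that, regardless of how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here is an excerpt from a recent blog post by John Mount:

The whole system is based on two primitives or operators, cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These operators have pivot, un-pivot, one-hot encode, transpose, moving multiple rows and columns, and many other transforms as simple special cases.

It is easy to write many different operations in terms of the cdata primitives. These operators can work in memory or at big data scale (with databases and Apache Spark; for big data use the cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN() variants). The transforms are controlled by a control table that is itself a diagram (or picture) of the transform.

We will first build the control table (see the blog post for details) and then move the data from rows to columns.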

library(cdata)
# first build the control table
pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                        columnToTakeKeysFrom = 'numbers', # this will become column headers
                        columnToTakeValuesFrom = 'value', # this contains data
                        sep="_")                          # optional for making column names

# perform the move of data to columns
dat_wide <- moveValuesToColumnsD(tallTable =  dat1, # reference to dataset
                    keyColumns = c('name'),         # this(these) column(s) should stay untouched 
                    controlTable = pivotControlTable# control table above
                    ) 
dat_wide

#>         name  numbers_1  numbers_2  numbers_3  numbers_4
#> 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

Answer 11

A much easier way!

devtools::install_github("yikeshu0611/onetree") #install onetree package

library(onetree)
widedata=reshape_toWide(data = dat1,id = "name",j = "numbers",value.var.prefix = "value")
widedata

        name     value1     value2     value3     value4
   firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
  secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

If you want to go back from wide to long, just change Wide to Long; the arguments stay the same.

reshape_toLong(data = widedata,id = "name",j = "numbers",value.var.prefix = "value")

        name numbers      value
   firstName       1  0.3407997
  secondName       1 -0.8981073
   firstName       2 -0.7033403
  secondName       2 -0.3347941
   firstName       3 -0.3795377
  secondName       3 -0.5013782
   firstName       4 -0.7460474
  secondName       4 -0.1745357

Answer 12

This works even if you have missing pairs, and it doesn't require sorting (as.matrix(dat1)[,1:2] can be replaced with cbind(dat1[,1],dat1[,2])):

> set.seed(45);dat1=data.frame(name=rep(c("firstName","secondName"),each=4),numbers=rep(1:4,2),value=rnorm(8))
> u1=unique(dat1[,1]);u2=unique(dat1[,2])
> m=matrix(nrow=length(u1),ncol=length(u2),dimnames=list(u1,u2))
> m[as.matrix(dat1)[,1:2]]=dat1[,3]
> m
                    1          2          3          4
firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

This doesn't work if you have missing pairs, and it requires sorting, but it's a bit shorter when the pairs are already sorted:

> u1=unique(dat1[,1]);u2=unique(dat1[,2])
> dat1=dat1[order(dat1[,1],dat1[,2]),] # not actually needed in this case
> matrix(dat1[,3],length(u1),,T,list(u1,u2))
                    1          2          3          4
firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

Here's a function version of the first approach (add as.data.frame to make it work with tibbles):

l2w=function(x,row=1,col=2,val=3,sort=F){
  u1=unique(x[,row])
  u2=unique(x[,col])
  if(sort){u1=sort(u1);u2=sort(u2)}
  out=matrix(nrow=length(u1),ncol=length(u2),dimnames=list(u1,u2))
  out[cbind(x[,row],x[,col])]=x[,val]
  out
}
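
A short usage sketch for the function, repeated here so the snippet runs on its own. The three-row data frame is made up to illustrate the missing-pair case: the ("b", 2) pair is absent, so that cell stays NA.

```r
# Function copied from the answer above
l2w=function(x,row=1,col=2,val=3,sort=F){
  u1=unique(x[,row])
  u2=unique(x[,col])
  if(sort){u1=sort(u1);u2=sort(u2)}
  out=matrix(nrow=length(u1),ncol=length(u2),dimnames=list(u1,u2))
  out[cbind(x[,row],x[,col])]=x[,val]
  out
}

# Hypothetical long data with one missing pair: there is no row for ("b", 2)
x <- data.frame(
  name = c("a", "a", "b"),
  numbers = c(1, 2, 1),
  value = c(10, 20, 30)
)

m <- l2w(x)
m   # the cell in row "b", column "2" was never assigned and remains NA
```

The assignment works because cbind() of a character column and a numeric column yields a two-column character matrix, which R treats as (row name, column name) pairs when indexing a matrix that has dimnames.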

Or, if you only have the values of the lower triangle, you can do this:

> euro=as.matrix(eurodist)[1:3,1:3]
> lower=data.frame(V1=rownames(euro)[row(euro)[lower.tri(euro)]],V2=colnames(euro)[col(euro)[lower.tri(euro)]],V3=euro[lower.tri(euro)])
> lower
         V1        V2   V3
1 Barcelona    Athens 3313
2  Brussels    Athens 2963
3  Brussels Barcelona 1318
> n=unique(c(lower[,1],lower[,2]))
> full=rbind(lower,setNames(lower[,c(2,1,3)],names(lower)),data.frame(V1=n,V2=n,V3=0))
> full
         V1        V2   V3
1 Barcelona    Athens 3313
2  Brussels    Athens 2963
3  Brussels Barcelona 1318
4    Athens Barcelona 3313
5    Athens  Brussels 2963
6 Barcelona  Brussels 1318
7    Athens    Athens    0
8 Barcelona Barcelona    0
9  Brussels  Brussels    0
> l2w(full,sort=T)
          Athens Barcelona Brussels
Athens         0      3313     2963
Barcelona   3313         0     1318
Brussels    2963      1318        0

Or here's another approach:

> rc=as.matrix(lower[-3])
> n=sort(unique(c(rc)))
> m=matrix(0,length(n),length(n),,list(n,n))
> m[rc]=lower[,3]
> m[rc[,2:1]]=lower[,3]
> m
          Athens Barcelona Brussels
Athens         0      3313     2963
Barcelona   3313         0     1318
Brussels    2963      1318        0

Another simple method in base R is to use xtabs. The result of xtabs is basically just a matrix with a fancy class name, but you can make it look like a regular matrix with class(x)=NULL;attr(x,"call")=NULL;dimnames(x)=unname(dimnames(x)):

> x=xtabs(value~name+numbers,dat1);x
            numbers
name                  1          2          3          4
  firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
  secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
> str(x)
 'xtabs' num [1:2, 1:4] 0.341 -0.898 -0.703 -0.335 -0.38 ...
 - attr(*, "dimnames")=List of 2
  ..$ name   : chr [1:2] "firstName" "secondName"
  ..$ numbers: chr [1:4] "1" "2" "3" "4"
 - attr(*, "call")= language xtabs(formula = value ~ name + numbers, data = dat1)
> class(x)
[1] "xtabs" "table"
> class(as.matrix(x)) # `as.matrix` has no effect because `x` is already a matrix
[1] "xtabs" "table"
> class(x)=NULL;class(x)
[1] "matrix" "array"
> attr(x,"call")=NULL;dimnames(x)=unname(dimnames(x))
> x # now it looks like a regular matrix
                    1          2          3          4
firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
> str(x)
 num [1:2, 1:4] 0.341 -0.898 -0.703 -0.335 -0.38 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "firstName" "secondName"
  ..$ : chr [1:4] "1" "2" "3" "4"

Normally as.data.frame(x) converts the result of xtabs back to long format, but you can avoid that with class(x)=NULL:

> x=xtabs(value~name+numbers,dat1);as.data.frame(x)
        name numbers       Freq
1  firstName       1  0.3407997
2 secondName       1 -0.8981073
3  firstName       2 -0.7033403
4 secondName       2 -0.3347941
5  firstName       3 -0.3795377
6 secondName       3 -0.5013782
7  firstName       4 -0.7460474
8 secondName       4 -0.1745357
> class(x)=NULL;as.data.frame(x)
                    1          2          3          4
firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

This converts data in wide format to long format (unlist converts a data frame to a vector, and c converts a matrix to a vector):

w2l=function(x)data.frame(V1=rownames(x)[row(x)],V2=colnames(x)[col(x)],V3=unname(c(unlist(x))))
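
A quick usage sketch, with the function repeated so the snippet is self-contained. The small matrix is invented for illustration; the result lists the cells column by column, because R stores matrices in column-major order.

```r
# Function copied from the answer above
w2l=function(x)data.frame(V1=rownames(x)[row(x)],V2=colnames(x)[col(x)],V3=unname(c(unlist(x))))

# Hypothetical 2 x 2 wide matrix with row and column names
m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("x", "y")))

long <- w2l(m)
long   # one row per cell: row name, column name, value
```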

Answer 13

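I came here via the linked question, Reshape three column data frame to matrix ("long" to "wide" format). That question is closed, so I'm writing an alternative solution here.

I found an alternative solution that may be useful for anyone looking to convert three columns into a matrix. I am referring to the decoupleR (2.3.2) package. The following is copied from their site:

Generates a kind of table where the rows come from id_cols, the columns from names_from, and the values from values_from.

Usage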

pivot_wider_profile(
data,
id_cols,
names_from,
values_from,
values_fill = NA,
to_matrix = FALSE,
to_sparse = FALSE,
...
)
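
For readers who only need the three-column-to-matrix step and would rather not add a dependency, a rough base R analogue can be sketched with tapply. This is not how decoupleR implements pivot_wider_profile; it only mirrors the idea of rows, columns and values taken from three columns. Dropping row 8 is an arbitrary choice to show that an absent pair comes out as NA:

```r
set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each = 4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
)

# Drop the last row so that the ("secondName", 4) pair is missing
incomplete <- dat1[-8, ]

# Rows come from `name`, columns from `numbers`, cells from `value`;
# sum() is applied per cell, and empty cells are filled with NA
m <- with(incomplete, tapply(value, list(name, numbers), sum))
m
```

Unlike xtabs, which fills empty cells with 0, tapply leaves them as NA by default.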

Answer 14

Using only dplyr and map.

library(dplyr)
library(purrr)
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each=4),
  numbers = rep(1:4, 2), value = rnorm(8)
)
longer_to_wider <- function(data, name_from, value_from){
  group <- colnames(data)[!(colnames(data) %in% c(name_from,value_from))]
  data %>% group_by(.data[[group]]) %>%
    summarise( name = list(.data[[name_from]]), 
               value = list(.data[[value_from]])) %>%
    {
      d <- data.frame(
        name = .[[name_from]] %>% unlist() %>% unique()
      )
      e <- map_dfc(.[[group]],function(x){
          y <- data_frame(
            x = data %>% filter(.data[[group]] == x) %>% pull(value_from)
          )
          colnames(y) <- x
          y
      })
      cbind(d,e)
    }
}
longer_to_wider(dat1, "name", "value")
#    name          1          2          3          4
# 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
# 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

14 solutions

Solution 1
329 Accepted 2011-05-04 23:20:03

Solution 2
157 2014-07-29 19:37:09

Solution 3
83 2011-05-04 22:42:14

Solution 4
59 2016-03-27 22:35:51

Solution 5
40 2019-07-12 20:14:34

Solution 6
29 2011-05-04 22:58:48

Solution 7
24 2015-07-14 17:44:08

Solution 8
17 2016-09-02 07:52:19

Solution 9
14 2018-08-02 23:50:52

Solution 10
10 2017-12-23 23:01:37

Solution 11
3 2019-07-26 05:47:41

Solution 12
0 2022-08-10 15:05:36

Solution 13
0 2022-10-31 23:40:17

Solution 14
-1 2021-11-02 15:36:29
