簡體   English   中英

R 基於列值聚合數據框

[英]R Aggregate data frame based on column values

我有一個看起來像這樣的數據集:

> newex
        Name Volume Period
1        oil  29000 Jun 21
2       gold    800 Mar 22
3        oil  21000 Jul 21
4       gold   1100 Sep 21
5       gold   3000 Feb 21
6    depower      3  Q1 21
7        oil  23000 Apr 22
8    czpower     26  Q1 23
9        oil  17000  Q1 21
10      gold   2400 May 21
11       oil  12000  Q2 21
12      gold   1800 Jan 22
13   czpower     21 Oct 21
14  api2coal   6000  Q1 22
15  api2coal  11000  Q1 21
16   depower     11 Jan 22
17  api2coal  16000 Jul 21
18      gold   1300 Mar 21
19   depower      3  Q1 22
20       oil  17000 Cal 21

我想重塑數據集以獲得具有以下特征的數據框:

  • Name中的值將成為新的變量(列);
  • Period中的值將成為索引(應該是唯一的);
  • Volume中的值是NamePeriod的每個組合的值的總和。

有人可以給我一個關於如何實現這一目標的提示嗎? 先感謝您。

我們可以使用pivot_wider

library(dplyr)
library(tidyr)

newex %>% 
  pivot_wider(
    names_from = Name,
    values_from = Volume
  )
   Period   oil  gold depower czpower api2coal
   <chr>  <int> <int>   <int>   <int>    <int>
 1 Jun 21 29000    NA      NA      NA       NA
 2 Mar 22    NA   800      NA      NA       NA
 3 Jul 21 21000    NA      NA      NA    16000
 4 Sep 21    NA  1100      NA      NA       NA
 5 Feb 21    NA  3000      NA      NA       NA
 6 Q1  21 17000    NA       3      NA    11000
 7 Apr 22 23000    NA      NA      NA       NA
 8 Q1  23    NA    NA      NA      26       NA
 9 May 21    NA  2400      NA      NA       NA
10 Q2  21 12000    NA      NA      NA       NA
11 Jan 22    NA  1800      11      NA       NA
12 Oct 21    NA    NA      NA      21       NA
13 Q1  22    NA    NA       3      NA     6000
14 Mar 21    NA  1300      NA      NA       NA
15 Cal 21 17000    NA      NA      NA       NA

數據:

newex <- structure(list(Name = c("oil", "gold", "oil", "gold", "gold", 
"depower", "oil", "czpower", "oil", "gold", "oil", "gold", "czpower", 
"api2coal", "api2coal", "depower", "api2coal", "gold", "depower", 
"oil"), Volume = c(29000L, 800L, 21000L, 1100L, 3000L, 3L, 23000L, 
26L, 17000L, 2400L, 12000L, 1800L, 21L, 6000L, 11000L, 11L, 16000L, 
1300L, 3L, 17000L), Period = c("Jun 21", "Mar 22", "Jul 21", 
"Sep 21", "Feb 21", "Q1  21", "Apr 22", "Q1  23", "Q1  21", "May 21", 
"Q2  21", "Jan 22", "Oct 21", "Q1  22", "Q1  21", "Jan 22", "Jul 21", 
"Mar 21", "Q1  22", "Cal 21")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20"))
library(data.table)
dcast(setDT(newex),Period~Name, value.var="Volume",fun.aggregate = sum)

Output:

    Period api2coal czpower depower gold   oil
 1: Apr 22        0       0       0    0 23000
 2: Cal 21        0       0       0    0 17000
 3: Feb 21        0       0       0 3000     0
 4: Jan 22        0       0      11 1800     0
 5: Jul 21    16000       0       0    0 21000
 6: Jun 21        0       0       0    0 29000
 7: Mar 21        0       0       0 1300     0
 8: Mar 22        0       0       0  800     0
 9: May 21        0       0       0 2400     0
10: Oct 21        0      21       0    0     0
11: Q1  21    11000       0       3    0 17000
12: Q1  22     6000       0       3    0     0
13: Q1  23        0      26       0    0     0
14: Q2  21        0       0       0    0 12000
15: Sep 21        0       0       0 1100     0

我在嘗試之前的答案后找到了答案:

newex %>% 
 pivot_wider(
     names_from = Name,
     values_from = Volume, values_fn = sum)

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM