Summarise and aggregate column values into rows in R

Question

My data frame consists mostly of categorical columns plus one numeric column. The df looks like this (simplified):

**Home_type**     **Garden_type**       **NaighbourhoOd**    **Rent** 
Vila                big                  brooklyn             5000
Vila                small                bronx                7000
Condo               shared               Sillicon valley      2000 
Appartment          none                 brooklyn             500
Condo               none                 bronx                1700
Appartment          none                 Sillicon Valley      800
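
To make the sample reproducible, it can be built roughly as follows. This is a minimal sketch that assumes the categorical columns are plain character vectors; the spellings and letter case are copied from the table as typed.

```r
# Sample data copied from the table above, spellings and letter case kept as-is.
df <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Appartment", "Condo", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley",
                    "brooklyn", "bronx", "Sillicon Valley"),
  Rent          = c(5000, 7000, 2000, 500, 1700, 800),
  stringsAsFactors = FALSE  # assumption: keep the categorical columns as character
)
```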

For each categorical column, I want to show all of its distinct values, how often each one occurs, and the sum of rent for each.

The result should look like this:

**Variable**     **Distinct_values**      **No_of-Occurences**     **SUM_RENT**
  Home_type        Vila                     2                        12000
  Home_type        Condo                    2                        3700
  Home_type        Appartment               2                        1300
  Garden_type      big                      1                        5000
  Garden_type      small                    1                        7000
  Garden_type      shared                   1                        2000 
  Garden_type      none                     3                        3000 
  Naighbourhood    brooklyn                 2                        5500
  Naighbourhood    Bronx                    2                        8700 
  Naighbourhood    Sillicon Valley          2                        2800

I am new to R and tried to do this with melt from reshape2, but without much success. Any help would be greatly appreciated.

Answer 1

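I tend to prefer tidyr over reshape2, though that is mostly because its syntax is closer to dplyr's. dplyr also makes this task easier, thanks to the magrittr pipe ( %>% ) it loads and its data summarising tools.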

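First, gather (from tidyr) all of the non-Rent columns into long form (run just the first two lines to see the intermediate result). Then group_by the columns you want to aggregate over. Finally, summarise within each group to get the desired metrics.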

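library(dplyr)
library(tidyr)

df %>%
  gather(Variable, Distinct_Values, -Rent) %>%   # every column except Rent goes to long form
  group_by(Variable, Distinct_Values) %>%
  summarise(
    `No_of-Occurences` = n(),   # number of rows in each group
    SUM_RENT = sum(Rent)        # total rent in each group
  )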

This gives:

        Variable Distinct_Values `No_of-Occurences` SUM_RENT
           <chr>           <chr>              <int>    <int>
1    Garden_type             big                  1     5000
2    Garden_type            none                  3     3000
3    Garden_type          shared                  1     2000
4    Garden_type           small                  1     7000
5      Home_type      Appartment                  2     1300
6      Home_type           Condo                  2     3700
7      Home_type            Vila                  2    12000
8  NaighbourhoOd           bronx                  2     8700
9  NaighbourhoOd        brooklyn                  2     5500
10 NaighbourhoOd Sillicon valley                  1     2000
11 NaighbourhoOd Sillicon Valley                  1      800

(Note that your data has both "V" and "v" in "Sillicon Valley", which leads to two separate rows.)
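
If the two spellings are meant to be the same neighbourhood, one option is to normalise the letter case before reshaping. The sketch below rebuilds the question's sample as `df` and assumes that lower-casing the NaighbourhoOd column is acceptable. It also uses pivot_longer(), which newer tidyr versions recommend in place of gather(); `result` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, spellings and case kept as typed.
df <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Appartment", "Condo", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley",
                    "brooklyn", "bronx", "Sillicon Valley"),
  Rent          = c(5000, 7000, 2000, 500, 1700, 800),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Assumption: case differences in this column are typos, so fold to lower case.
  mutate(NaighbourhoOd = tolower(NaighbourhoOd)) %>%
  # Same reshaping step as gather(), written with pivot_longer().
  pivot_longer(-Rent, names_to = "Variable", values_to = "Distinct_Values") %>%
  group_by(Variable, Distinct_Values) %>%
  summarise(
    `No_of-Occurences` = n(),
    SUM_RENT = sum(Rent),
    .groups = "drop"   # return an ungrouped tibble
  )
```

With the case folded, the two "Sillicon" rows collapse into one, which matches the layout asked for in the question.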

Answer 2

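We can use data.table. Convert the 'data.frame' to a 'data.table' ( setDT(df1) ) and melt it from 'wide' to 'long' format. Grouped by 'variable' and 'value' (the columns created by melt), we create two columns, 'No_of_occur' and 'SUM_RENT', as the number of rows ( .N ) and the sum of the 'Rent' column. Then, grouped by 'variable', 'No_of_occur' and 'SUM_RENT', we take the unique elements of 'value' as the 'Distinct_values' column.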

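library(data.table)
# setDT() converts df1 to a data.table by reference; melt() reshapes it to long form
melt(setDT(df1), id.vars = "Rent")[, c("No_of_occur", "SUM_RENT") :=
      .(.N, sum(Rent)), .(variable, value)][,   # row count and rent total per variable/value
    .(Distinct_values = unique(value)), .(variable, No_of_occur, SUM_RENT)]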
 #         variable No_of_occur SUM_RENT Distinct_values
 #1:     Home_type           2    12000            Vila
 #2:     Home_type           2     3700           Condo
 #3:     Home_type           2     1300      Appartment
 #4:   Garden_type           1     5000             big
 #5:   Garden_type           1     7000           small
 #6:   Garden_type           1     2000          shared
 #7:   Garden_type           3     3000            none
 #8: NaighbourhoOd           2     5500        brooklyn
 #9: NaighbourhoOd           2     8700           bronx
 #10:NaighbourhoOd           2     2800 Sillicon Valley
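
The same figures can probably be obtained with a single grouped aggregation after melt, rather than adding columns and then regrouping. This is a sketch rather than the answer's own method. It assumes `df1` holds the question's sample exactly as typed, so the two "Sillicon" spellings stay separate, and it uses as.data.table() so that `df1` itself is not modified by reference. The names `long` and `result` are illustrative.

```r
library(data.table)

# Sample data from the question, spellings and case kept as typed.
df1 <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Appartment", "Condo", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley",
                    "brooklyn", "bronx", "Sillicon Valley"),
  Rent          = c(5000, 7000, 2000, 500, 1700, 800),
  stringsAsFactors = FALSE
)

# Wide to long: one row per (original column, value) pair, Rent carried along.
long <- melt(as.data.table(df1), id.vars = "Rent",
             variable.name = "Variable", value.name = "Distinct_values")

# One grouped step: row count and rent total per variable/value pair.
result <- long[, .(No_of_occur = .N, SUM_RENT = sum(Rent)),
               by = .(Variable, Distinct_values)]
```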

Answer 1: 2 (accepted), 2016-11-04 12:36:29

Answer 2: 1, 2016-11-04 12:38:04