Summarising and aggregating column values as rows in R
My data frame consists mostly of categorical columns plus one numeric column. The df looks like this (simplified):
**Home_type**  **Garden_type**  **NaighbourhoOd**  **Rent**
Vila           big              brooklyn           5000
Vila           small            bronx              7000
Condo          shared           Sillicon valley    2000
Appartment     none             brooklyn           500
Condo          none             bronx              1700
Appartment     none             Sillicon Valley    800
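For anyone who wants to reproduce this, the sample can be rebuilt roughly as follows. This is only a sketch: the name `df` comes from the text above, while storing the categorical columns as character and `Rent` as integer is an assumption about the real data.

```r
# Rebuild the simplified sample as a data frame. Values are copied from the
# table above, including the original spellings and the "valley"/"Valley"
# case difference.
df <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Appartment", "Condo", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley",
                    "brooklyn", "bronx", "Sillicon Valley"),
  Rent          = c(5000L, 7000L, 2000L, 500L, 1700L, 800L),
  stringsAsFactors = FALSE  # assumption: keep categorical columns as character
)

# Inspect the column types
str(df)
```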
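For each categorical column, I want to show all of its distinct values, together with the frequency of each value and the sum of the rent associated with it.
The result should look like this: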
**Variable**   **Distinct_values**  **No_of-Occurences**  **SUM_RENT**
Home_type      Vila                 2                     12000
Home_type      Condo                2                     3700
Home_type      Appartment           2                     1300
Garden_type    big                  1                     5000
Garden_type    small                1                     7000
Garden_type    shared               1                     2000
Garden_type    none                 3                     3000
Naighbourhood  brooklyn             2                     5500
Naighbourhood  Bronx                2                     8700
Naighbourhood  Sillicon Valley      2                     2800
I'm new to R and tried to do this with melt from reshape2, but without much success. Any help would be appreciated.
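I tend to prefer tidyr over reshape2, though that is largely because its syntax is closer to dplyr, which also makes this task easier: loading dplyr gives you the magrittr pipe ( %>% ) and its data-summarising tools.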
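First, gather (from tidyr) all of the non-Rent columns into long form (run just the first two lines of the pipeline to see the intermediate result). Then group_by the columns you want to aggregate over. Finally, summarise within each group to get the desired metrics.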
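library(dplyr)
library(tidyr)

df %>%
  # reshape every column except Rent into Variable / Distinct_Values pairs
  gather(Variable, Distinct_Values, -Rent) %>%
  group_by(Variable, Distinct_Values) %>%
  summarise(
    `No_of-Occurences` = n(),   # number of rows in each group
    SUM_RENT = sum(Rent)        # total rent in each group
  )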
This gives:
   Variable      Distinct_Values `No_of-Occurences` SUM_RENT
   <chr>         <chr>                        <int>    <int>
 1 Garden_type   big                              1     5000
 2 Garden_type   none                             3     3000
 3 Garden_type   shared                           1     2000
 4 Garden_type   small                            1     7000
 5 Home_type     Appartment                       2     1300
 6 Home_type     Condo                            2     3700
 7 Home_type     Vila                             2    12000
 8 NaighbourhoOd bronx                            2     8700
 9 NaighbourhoOd brooklyn                         2     5500
10 NaighbourhoOd Sillicon valley                  1     2000
11 NaighbourhoOd Sillicon Valley                  1      800
(Note that your data spells "Sillicon Valley" with both "V" and "v", which results in two separate rows.)
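If the two spellings are meant to be the same neighbourhood, one option is to normalise case before reshaping. The following is a sketch rather than part of the original answer: it assumes that values differing only in case really are the same category, it rebuilds the sample data by hand, and it uses `pivot_longer()`, the newer tidyr replacement for `gather()`, so it needs tidyr and dplyr at version 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical copy of the sample data from the question
df <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Appartment", "Condo", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley",
                    "brooklyn", "bronx", "Sillicon Valley"),
  Rent          = c(5000L, 7000L, 2000L, 500L, 1700L, 800L),
  stringsAsFactors = FALSE
)

result <- df %>%
  # assumption: lower-casing every categorical value is acceptable
  mutate(across(-Rent, tolower)) %>%
  # long form: one row per (original row, categorical column) pair
  pivot_longer(-Rent, names_to = "Variable", values_to = "Distinct_Values") %>%
  group_by(Variable, Distinct_Values) %>%
  summarise(
    `No_of-Occurences` = n(),
    SUM_RENT = sum(Rent),
    .groups = "drop"   # return an ungrouped tibble
  )

print(result)
```

Lower-casing changes every value (for example "Vila" becomes "vila"), so if the original capitalisation matters, it may be better to fix only the inconsistent entries in the source data.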
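We can use data.table. Convert the 'data.frame' to a 'data.table' ( setDT(df1) ) and melt it from 'wide' to 'long' format. Grouped by 'variable' and 'value' (the columns created by melt), we create two columns, 'No_of_occur' and 'SUM_RENT', as the number of rows ( .N ) and the sum of the 'Rent' column. Then, grouped by 'variable', 'No_of_occur' and 'SUM_RENT', we take the unique elements of 'value' as the 'Distinct_values' column.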
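library(data.table)

# melt to long format, add the count and rent sum per (variable, value) group,
# then keep one row per distinct value
melt(setDT(df1), id.vars = "Rent")[
  , c("No_of_occur", "SUM_RENT") := .(.N, sum(Rent)), .(variable, value)][
  , .(Distinct_values = unique(value)), .(variable, No_of_occur, SUM_RENT)]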
#         variable No_of_occur SUM_RENT Distinct_values
#  1:    Home_type           2    12000            Vila
#  2:    Home_type           2     3700           Condo
#  3:    Home_type           2     1300      Appartment
#  4:  Garden_type           1     5000             big
#  5:  Garden_type           1     7000           small
#  6:  Garden_type           1     2000          shared
#  7:  Garden_type           3     3000            none
#  8: NaighbourhoOd          2     5500        brooklyn
#  9: NaighbourhoOd          2     8700           bronx
# 10: NaighbourhoOd          2     2800 Sillicon Valley
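The same summary can probably be written more directly by aggregating in `j` instead of adding columns with `:=` and then de-duplicating. This is a sketch, not the answerer's code: the `df1` built here is a hand-made copy of the sample exactly as printed in the question, so the two spellings of "Sillicon Valley" stay on separate rows, and `as.data.table()` is used instead of `setDT()` so that `df1` itself is not modified.

```r
library(data.table)

# Hypothetical copy of the sample data, exactly as printed in the question
df1 <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Appartment", "Condo", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley",
                    "brooklyn", "bronx", "Sillicon Valley"),
  Rent          = c(5000L, 7000L, 2000L, 500L, 1700L, 800L),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy; setDT() would convert df1 in place
long <- melt(as.data.table(df1), id.vars = "Rent",
             variable.name = "Variable", value.name = "Distinct_values")

# One row per (Variable, Distinct_values) with its count and rent total
res <- long[, .(No_of_occur = .N, SUM_RENT = sum(Rent)),
            by = .(Variable, Distinct_values)]

print(res)
```

Grouping with `by` keeps groups in order of first appearance, so the rows follow the column order of the input rather than being sorted alphabetically.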