
R: ddply - Aggregate data by assigning string as variable name

I have a large data set with several columns. As an example:

set.seed(1)
x <- 1:15
y <- letters[1:3][sample(1:3, 15, replace = TRUE)]
z <- letters[10:13][sample(1:3, 15, replace = TRUE)]
# Ratings are the upper-case labels "A", "B", "C"
r <- LETTERS[1:3][sample(1:3, 15, replace = TRUE)]
df <- data.frame("Number" = x, "Area" = y, "Section" = z, "Rating" = r)
dput(df)

structure(list(Number = 1:15, Area = structure(c(1L, 2L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 1L, 1L, 1L, 3L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"), Section = structure(c(2L, 3L, 3L, 2L, 3L, 3L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 3L, 2L), .Label = c("j", "k", "l"), class = "factor"), Rating = structure(c(2L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 3L, 2L, 3L, 2L, 3L, 2L, 2L), .Label = c("A", "B", "C"), class = "factor")), class = "data.frame", row.names = c(NA,-15L))

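I now want to create frequency tables and plots by Rating and a selected category, where the category is chosen via a string, for example: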

library(plyr)  # provides ddply(), .() and count()
Category <- "Section"
data_count <- ddply(df, .(get(Category), Rating), 'count')
data_rel_freq <- ddply(data_count, .(Rating), transform, rel_freq = freq/sum(freq))
dput(data_rel_freq)

structure(list(get.Category. = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), .Label = c("j", "k","l"), class = "factor"), Number = c(4L, 8L, 10L, 12L, 1L, 15L, 2L, 3L, 14L, 7L, 9L, 11L, 13L, 5L, 6L), Area = structure(c(3L, 2L, 1L, 1L, 1L, 3L, 2L, 2L, 2L, 3L, 2L, 1L, 3L, 1L, 3L), .Label = c("a", "b", "c"), class = "factor"), Section = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), .Label = c("j", "k", "l"), class = "factor"), Rating = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), freq = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), rel_freq = c(0.5, 0.5, 0.142857142857143, 0.142857142857143, 0.142857142857143, 0.142857142857143, 0.142857142857143, 0.142857142857143, 0.142857142857143, 0.166666666666667, 0.166666666666667, 0.166666666666667, 0.166666666666667, 0.166666666666667, 0.166666666666667)), class = "data.frame", row.names = c(NA, -15L))

Using ggplot:

library(ggplot2)
library(scales)  # provides percent_format()
ggplot(data_rel_freq, aes(x = Rating, y = rel_freq, fill = get(Category))) +
  geom_bar(position = "fill", stat = "identity", color = "black") +
  scale_y_continuous(labels = percent_format()) +
  labs(x = "Rating", y = "Relative Frequency")


The problem is that "get(Category)" is now treated as a new column:

    get.Category. Number Area Section Rating freq  rel_freq
1              k      4    c       k      A    1 0.5000000
2              k      8    b       k      A    1 0.5000000
3              j     10    a       j      B    1 0.1428571
4              j     12    a       j      B    1 0.1428571
5              k      1    a       k      B    1 0.1428571
6              k     15    c       k      B    1 0.1428571
7              l      2    b       l      B    1 0.1428571

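In addition, the Number column should be summed up, the other categories (here: Area) should be dropped, and there should be only one row for Section "k" with Rating "A".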

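We can use count to get the frequency of the column 'Section' by converting the object identifier 'Category' to a symbol (sym) and then evaluating it (!!). In ggplot syntax, aes can also take a symbol, which can be evaluated in the same way.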

library(tidyverse)  # loads dplyr and ggplot2
library(scales)     # provides percent_format()
df %>% 
    # turn the string in Category into a symbol and unquote it
    count(!! rlang::sym(Category), Rating) %>%
    group_by(Rating) %>% 
    # share of each category within a Rating
    mutate(rel_freq = n/sum(n)) %>%
    ggplot(., aes(x = Rating, y = rel_freq, fill = !! rlang::sym(Category))) + 
    geom_bar(position = "fill", stat = "identity", color = "black") + 
    scale_y_continuous(labels = percent_format()) + 
    labs(x = "Rating", y = "Relative Frequency")

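If staying with plyr is preferred, one possible sketch is to pass the grouping variables to ddply as a character vector instead of wrapping the string in get(), so the grouping column keeps its real name and no extra "get.Category." column appears. The small data frame below is an illustrative stand-in, not the data from the question, and the column name Number_sum is a made-up name for the summed Number column.

```r
library(plyr)

# Illustrative stand-in data with the same column names as the question
# (the values are made up, not the question's data).
df <- data.frame(
  Number  = 1:6,
  Area    = c("a", "b", "a", "c", "b", "a"),
  Section = c("k", "k", "j", "j", "k", "l"),
  Rating  = c("A", "A", "B", "B", "B", "B")
)

Category <- "Section"

# ddply accepts a character vector of grouping variables, so the string
# stored in Category can be used directly, without get().
data_count <- ddply(df, c(Category, "Rating"), summarise,
                    freq = length(Number),     # rows per group
                    Number_sum = sum(Number))  # hypothetical column name

# Relative frequency of each category within a Rating.
data_rel_freq <- ddply(data_count, "Rating", transform,
                       rel_freq = freq / sum(freq))
print(data_rel_freq)
```

Because summarise keeps only the grouping columns and the newly computed ones, the Area column is dropped and each Section/Rating combination ends up on a single row.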
