Frequency table with several variables in R

Question

I am trying to replicate a table that is often used in official statistics, but so far without success. Given a data frame like this one:

d1 <- data.frame( StudentID = c("x1", "x10", "x2", 
                          "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
             StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
             ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
             Exam          = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'),
             participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),  
             passed      = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
             stringsAsFactors = FALSE)

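I would like to create a table that shows, for each year, the number of all students (All) and, of those, how many are female, how many participated, and how many passed. Note that "ofwhich" below refers to all students.

The table I have in mind would look like this: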

cbind(All = table(d1$ExamenYear),
  participated      = table(d1$ExamenYear, d1$participated)[,2],
  ofwhichFemale     = table(d1$ExamenYear, d1$StudentGender)[,1],
  ofwhichpassed     = table(d1$ExamenYear, d1$passed)[,2])

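I am sure there is a better way to do this kind of thing in R.

Note: I have seen LaTeX solutions, but they do not work for me because I need to export the table to Excel.

Thanks in advance.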

Answer 1

Using plyr:

require(plyr)
ddply(d1, .(ExamenYear), summarize,
      All=length(ExamenYear),
      participated=sum(participated=="yes"),
      ofwhichFemale=sum(StudentGender=="F"),
      ofWhichPassed=sum(passed=="yes"))

This gives:

  ExamenYear All participated ofwhichFemale ofWhichPassed
1       2007   3            2             2             2
2       2008   4            3             2             3
3       2009   3            3             0             2
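
Since the question asks for a table that can be opened in Excel, one option is to write the result to a CSV file. The following is a minimal sketch in base R and is not part of the original answer: it rebuilds the same counts with `aggregate` so that it runs without plyr, and the file name `exam_summary.csv` is a hypothetical placeholder written to a temporary directory.

```r
# Only the columns of the question's d1 that are needed for the counts
d1 <- data.frame(
  StudentGender = c("F", "M", "F", "M", "F", "M", "F", "M", "M", "M"),
  ExamenYear    = c("2007", "2007", "2007", "2008", "2008",
                    "2008", "2008", "2009", "2009", "2009"),
  participated  = c("no", "yes", "yes", "yes", "no",
                    "yes", "yes", "yes", "yes", "yes"),
  passed        = c("no", "yes", "yes", "yes", "no",
                    "yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE)

# Turn each condition into a 0/1 column so that a per-year sum is a count
flags <- data.frame(
  All           = rep(1L, nrow(d1)),
  participated  = as.integer(d1$participated == "yes"),
  ofwhichFemale = as.integer(d1$StudentGender == "F"),
  ofwhichpassed = as.integer(d1$passed == "yes"))

# Sum every flag column within each year
res <- aggregate(flags, by = list(ExamenYear = d1$ExamenYear), FUN = sum)

# Hypothetical output path; Excel can open CSV files directly
out_file <- file.path(tempdir(), "exam_summary.csv")
write.csv(res, out_file, row.names = FALSE)
```

A native `.xlsx` file would need an additional package, so CSV is used here only because it requires nothing beyond base R.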

Answer 2

The plyr package is well suited to this kind of task. First load the package:

library(plyr)

Then we use the ddply function:

ddply(d1, "ExamenYear", summarise, 
      All = length(passed),  # any column works here; this just counts the rows
      participated = sum(participated=="yes"),
      ofwhichFemale = sum(StudentGender=="F"),
      ofwhichpassed = sum(passed=="yes"))

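Essentially, ddply expects a data frame as input and returns a data frame. Here we split the input data frame by ExamenYear, and on each sub-table we compute some summary statistics. Note that inside ddply we do not have to use the $ notation when referring to columns.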
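
The split, summarise, and recombine steps described above can be sketched in base R. This is only an illustration of the idea, not how plyr is implemented; the helper name `summarise_year` is made up for this example.

```r
# Only the columns of the question's d1 that are needed for the counts
d1 <- data.frame(
  StudentGender = c("F", "M", "F", "M", "F", "M", "F", "M", "M", "M"),
  ExamenYear    = c("2007", "2007", "2007", "2008", "2008",
                    "2008", "2008", "2009", "2009", "2009"),
  participated  = c("no", "yes", "yes", "yes", "no",
                    "yes", "yes", "yes", "yes", "yes"),
  passed        = c("no", "yes", "yes", "yes", "no",
                    "yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE)

# Hypothetical helper: summary statistics for one year's sub-table
summarise_year <- function(sub) {
  data.frame(
    ExamenYear    = sub$ExamenYear[1],
    All           = nrow(sub),
    participated  = sum(sub$participated == "yes"),
    ofwhichFemale = sum(sub$StudentGender == "F"),
    ofwhichpassed = sum(sub$passed == "yes"))
}

pieces    <- split(d1, d1$ExamenYear)        # 1. split the data frame by year
summaries <- lapply(pieces, summarise_year)  # 2. summarise each sub-table
res       <- do.call(rbind, summaries)       # 3. combine into one data frame
rownames(res) <- NULL
print(res)
```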

Answer 3

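A couple of modifications to your code (using with to reduce the number of d1$ calls, and using character indices to make it more self-documenting) would make it easier to read and a worthy competitor to the ddply solutions: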

with( d1, cbind(All = table(ExamenYear),
  participated      = table(ExamenYear, participated)[,"yes"],
  ofwhichFemale     = table(ExamenYear, StudentGender)[,"F"],
  ofwhichpassed     = table(ExamenYear, passed)[,"yes"])
     )

     All participated ofwhichFemale ofwhichpassed
2007   3            2             2             2
2008   4            3             2             3
2009   3            3             0             2

I would expect this to be much faster than the ddply solution, although that will only become apparent when working with larger data sets.
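
The speed claim is easy to check on your own data. Below is a rough timing sketch that assumes a synthetic data set of 200,000 rows; the size, seed, and random column values are invented for illustration. It times the `table`-based approach and, if plyr happens to be installed, the `ddply` approach as well. No timings are quoted here because they depend on the machine and the package versions.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
n <- 200000   # assumed size of the synthetic data set

big <- data.frame(
  ExamenYear    = sample(c("2007", "2008", "2009"), n, replace = TRUE),
  StudentGender = sample(c("F", "M"), n, replace = TRUE),
  participated  = sample(c("yes", "no"), n, replace = TRUE),
  passed        = sample(c("yes", "no"), n, replace = TRUE),
  stringsAsFactors = FALSE)

# Time the table-based approach from this answer
t_table <- system.time(
  res_table <- with(big, cbind(
    All           = table(ExamenYear),
    participated  = table(ExamenYear, participated)[, "yes"],
    ofwhichFemale = table(ExamenYear, StudentGender)[, "F"],
    ofwhichpassed = table(ExamenYear, passed)[, "yes"]))
)
print(t_table)

# Time the ddply approach only when plyr is available
if (requireNamespace("plyr", quietly = TRUE)) {
  t_ddply <- system.time(
    res_ddply <- plyr::ddply(big, "ExamenYear", plyr::summarise,
      All           = length(passed),
      participated  = sum(participated == "yes"),
      ofwhichFemale = sum(StudentGender == "F"),
      ofwhichpassed = sum(passed == "yes"))
  )
  print(t_ddply)
}
```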

Answer 4

You may also want to take a look at the next iteration of plyr: dplyr

It uses a ggplot-like syntax and provides fast performance by writing key pieces in C++.

library(dplyr)

# %>% is the pipe operator in current dplyr releases
d1 %>%
  group_by(ExamenYear) %>%
  summarise(ALL = length(ExamenYear),
            participated = sum(participated == "yes"),
            ofwhichFemale = sum(StudentGender == "F"),
            ofWhichPassed = sum(passed == "yes"))

4 answers

Answer 1
9 accepted 2012-08-07 19:13:18

Answer 2
4 2012-08-07 19:14:21

Answer 3
4 2012-08-07 19:28:11

Answer 4
1 2014-01-26 07:24:42
