Create frequency tables for multiple factor columns in R
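I am new to R. I am compiling a separate manual on the syntax of the functions and features I commonly use at work. My sample data frame is as follows: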
x.sample <-
structure(list(Q9_A = structure(c(5L, 3L, 5L, 3L, 5L, 3L, 1L,
5L, 5L, 5L), .Label = c("Impt", "Neutral", "Not Impt at all",
"Somewhat Impt", "Very Impt"), class = "factor"), Q9_B = structure(c(5L,
5L, 5L, 3L, 5L, 5L, 3L, 5L, 3L, 3L), .Label = c("Impt", "Neutral",
"Not Impt at all", "Somewhat Impt", "Very Impt"), class = "factor"),
Q9_C = structure(c(3L, 5L, 5L, 3L, 5L, 5L, 3L, 5L, 5L, 3L
), .Label = c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt",
"Very Impt"), class = "factor")), .Names = c("Q9_A", "Q9_B",
"Q9_C"), row.names = c(NA, 10L), class = "data.frame")
> x.sample
Q9_A Q9_B Q9_C
1 Very Impt Very Impt Not Impt at all
2 Not Impt at all Very Impt Very Impt
3 Very Impt Very Impt Very Impt
4 Not Impt at all Not Impt at all Not Impt at all
5 Very Impt Very Impt Very Impt
6 Not Impt at all Very Impt Very Impt
7 Impt Not Impt at all Not Impt at all
8 Very Impt Very Impt Very Impt
9 Very Impt Not Impt at all Very Impt
10 Very Impt Not Impt at all Not Impt at all
My original data frame has 21 columns.
If I want to find the mean (treating each column as an ordinal variable):
> sapply(x.sample,function(x) mean(as.numeric(x), na.rm=TRUE))
Q9_A Q9_B Q9_C
4.0 4.2 4.2
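One caveat about the means above: as.numeric() on a factor returns the internal level codes, and the levels here are stored alphabetically, so code 1 is "Impt" rather than "Not Impt at all". A minimal sketch of the difference, assuming the intended scale runs from "Not Impt at all" up to "Very Impt" (the three-response vector `resp` is a made-up illustration, not the asker's data):

```r
# Intended low-to-high order of the scale (an assumption about the survey).
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Hypothetical responses; default levels are sorted alphabetically.
resp <- factor(c("Very Impt", "Not Impt at all", "Impt"))
levels(resp)

# Codes follow the alphabetical levels, so this mean ignores the scale.
mean_alpha <- mean(as.numeric(resp))

# Rebuild the factor with explicit levels so codes 1..5 follow the scale.
resp_ord <- factor(resp, levels = levs, ordered = TRUE)
mean_ord <- mean(as.numeric(resp_ord))
```

With the levels set explicitly, the numeric codes line up with the scale, so the mean is at least computed on the intended ordering.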
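I want to make frequency tables for all the variables in my data frame. I searched the internet and many forums, and the closest thing I found uses sapply. But when I try it, I get all 0s.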
> sapply(x.sample,function(x) table(factor(x.sample, levels=c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt"), ordered=TRUE)))
Q9_A Q9_B Q9_C
Not Impt at all 0 0 0
Somewhat Impt 0 0 0
Neutral 0 0 0
Impt 0 0 0
Very Impt 0 0 0
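Question: how can I make one frequency table covering all the columns (which are factors) in the data frame, laid out like the table above?
PS: Sorry if this seems trivial, but I have searched for 2 days without finding an answer and have tried every combination I could think of. Maybe I did not search hard enough =(
Thank you very much.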
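You were nearly there. Just one small change to your function would have got you there. The x in function(x) ... needs to be passed through to the table() call (your version passed x.sample instead):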
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
sapply(x.sample, function(x) table(factor(x, levels=levs, ordered=TRUE)))
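To see why the original call returned only zeros, it may help to look at what factor() does with a whole data frame. The sketch below uses a small hypothetical data frame `df` in place of x.sample; the names `df`, `whole`, and `per_col` are illustrative only.

```r
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Hypothetical two-column data frame standing in for x.sample.
df <- data.frame(
  a = factor(c("Very Impt", "Impt", "Very Impt")),
  b = factor(c("Not Impt at all", "Very Impt", "Very Impt"))
)

# Passing the whole data frame: factor() coerces it with as.character(),
# which yields one string per column, and none of them is a valid level.
whole <- factor(df, levels = levs)
length(whole)      # one element per column, not one per row
all(is.na(whole))  # nothing matched, hence the zero counts

# Passing the column that sapply() hands to the function gives real counts.
per_col <- sapply(df, function(x) table(factor(x, levels = levs)))
per_col
```

Because every element of `whole` is NA, table() has nothing to count in any level, which matches the all-zero output in the question.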
A small rearrangement of the code might make it a bit easier to read, too:
sapply(lapply(x.sample,factor,levels=levs,ordered=TRUE), table)
# Q9_A Q9_B Q9_C
#Not Impt at all 3 4 4
#Somewhat Impt 0 0 0
#Neutral 0 0 0
#Impt 1 0 0
#Very Impt 6 6 6
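A bit late to the party, but here is a possible reshape2 solution. It could have been very simple with recast, but we need to handle empty factor levels here, so we have to specify both factorsAsStrings = FALSE in melt and drop = FALSE in dcast, and recast cannot pass arguments to melt (only to dcast), so here goes: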
library(reshape2)
x.sample$indx <- 1
dcast(melt(x.sample, "indx", factorsAsStrings = FALSE), value ~ variable, drop = FALSE)
# value Q9_A Q9_B Q9_C
# 1 Impt 1 0 0
# 2 Neutral 0 0 0
# 3 Not Impt at all 3 4 4
# 4 Somewhat Impt 0 0 0
# 5 Very Impt 6 6 6
If we don't care about empty levels, a quick solution would be just:
recast(x.sample, value ~ variable, id.var = "indx")
# value Q9_A Q9_B Q9_C
# 1 Impt 1 0 0
# 2 Not Impt at all 3 4 4
# 3 Very Impt 6 6 6
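Or, if speed is a concern, we can do the same with data.table (judging by the output, this starts from the original x.sample, without the indx helper column added above):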
library(data.table)
dcast(melt(setDT(x.sample), measure.vars = names(x.sample), value.factor = TRUE),
value ~ variable, drop = FALSE)
# value Q9_A Q9_B Q9_C
# 1: Impt 1 0 0
# 2: Neutral 0 0 0
# 3: Not Impt at all 3 4 4
# 4: Somewhat Impt 0 0 0
# 5: Very Impt 6 6 6
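The melt-then-cast idea in this answer can also be sketched in base R, with no extra packages: stack the columns into a long format by hand, then cross-tabulate. This is an illustrative equivalent rather than what reshape2 or data.table do internally; `df`, `long`, and `counts` are hypothetical names, and `levs` is the ordering used in the first answer.

```r
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Hypothetical two-column data frame standing in for x.sample.
df <- data.frame(
  a = factor(c("Very Impt", "Impt", "Very Impt")),
  b = factor(c("Not Impt at all", "Very Impt", "Very Impt"))
)

# "Melt" by hand: one row per (column name, answer) pair.
long <- data.frame(
  variable = factor(rep(names(df), each = nrow(df)), levels = names(df)),
  value = factor(unlist(lapply(df, as.character), use.names = FALSE),
                 levels = levs)
)

# "Cast" by cross-tabulating; unused levels of `value` remain as zero rows.
counts <- table(long$value, long$variable)
counts
```

Since `value` is a factor carrying all five levels, table() keeps the empty ones, which plays the role of drop = FALSE in the dcast calls above.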
Why not just:
> sapply(x.sample, table)
Q9_A Q9_B Q9_C
Impt 1 0 0
Neutral 0 0 0
Not Impt at all 3 4 4
Somewhat Impt 0 0 0
Very Impt 6 6 6
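Let's call that result 'tbl' (for example, tbl <- sapply(x.sample, table)); its rows can then be put in the desired order: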
tbl[ order(match(rownames(tbl), c("Not Impt at all", "Somewhat Impt",
"Neutral", "Impt", "Very Impt")) ) , ]
Q9_A Q9_B Q9_C
Not Impt at all 3 4 4
Somewhat Impt 0 0 0
Neutral 0 0 0
Impt 1 0 0
Very Impt 6 6 6
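As a possible shorthand for the order(match(...)) step, the rows of such a matrix can also be selected by name. A minimal sketch, assuming every entry of `levs` appears among the row names (indexing by a missing name raises an error); `df` and `tbl_ordered` are hypothetical names:

```r
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Hypothetical columns sharing the same (alphabetically stored) levels,
# as in x.sample, so that sapply() simplifies the tables to a matrix.
df <- data.frame(
  a = factor(c("Very Impt", "Impt", "Very Impt"), levels = sort(levs)),
  b = factor(c("Not Impt at all", "Very Impt", "Very Impt"), levels = sort(levs))
)

tbl <- sapply(df, table)

# Index rows by name in the desired order; drop = FALSE keeps a matrix
# even if the data frame has only one column.
tbl_ordered <- tbl[levs, , drop = FALSE]
tbl_ordered
```

The match() version in the answer is more forgiving when some names are absent, so the name-indexing form is best kept for cases where the levels are known in advance.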