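Create frequency tables for multiple factor columns in R
I am new to R. I am writing a separate manual on the syntax of the functions and features I use most often at work. My sample data frame is as follows: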
x.sample <-
structure(list(Q9_A = structure(c(5L, 3L, 5L, 3L, 5L, 3L, 1L,
5L, 5L, 5L), .Label = c("Impt", "Neutral", "Not Impt at all",
"Somewhat Impt", "Very Impt"), class = "factor"), Q9_B = structure(c(5L,
5L, 5L, 3L, 5L, 5L, 3L, 5L, 3L, 3L), .Label = c("Impt", "Neutral",
"Not Impt at all", "Somewhat Impt", "Very Impt"), class = "factor"),
Q9_C = structure(c(3L, 5L, 5L, 3L, 5L, 5L, 3L, 5L, 5L, 3L
), .Label = c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt",
"Very Impt"), class = "factor")), .Names = c("Q9_A", "Q9_B",
"Q9_C"), row.names = c(NA, 10L), class = "data.frame")
> x.sample
Q9_A Q9_B Q9_C
1 Very Impt Very Impt Not Impt at all
2 Not Impt at all Very Impt Very Impt
3 Very Impt Very Impt Very Impt
4 Not Impt at all Not Impt at all Not Impt at all
5 Very Impt Very Impt Very Impt
6 Not Impt at all Very Impt Very Impt
7 Impt Not Impt at all Not Impt at all
8 Very Impt Very Impt Very Impt
9 Very Impt Not Impt at all Very Impt
10 Very Impt Not Impt at all Not Impt at all
My original data frame has 21 columns.
If I want to find the mean (treating each column as an ordinal variable):
> sapply(x.sample,function(x) mean(as.numeric(x), na.rm=TRUE))
Q9_A Q9_B Q9_C
4.0 4.2 4.2
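One caveat about the means above: as.numeric() on a factor returns the internal level codes, and the levels in x.sample are stored alphabetically ("Impt" = 1, "Neutral" = 2, "Not Impt at all" = 3, and so on), so 4.0 and 4.2 are not on the intended importance scale. A minimal sketch of one way to get ordinal means, assuming the intended order runs from "Not Impt at all" up to "Very Impt"; the names alpha_levels, ordinal_levels and ordinal_means are illustrative only:

```r
# Rebuild the sample data compactly (same values as the structure() call above).
alpha_levels <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
x.sample <- data.frame(
  Q9_A = factor(alpha_levels[c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)], levels = alpha_levels),
  Q9_B = factor(alpha_levels[c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)], levels = alpha_levels),
  Q9_C = factor(alpha_levels[c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3)], levels = alpha_levels)
)

# Assumed ordinal order, lowest to highest importance.
ordinal_levels <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Re-level each column first, so that the integer codes follow the ordinal order.
ordinal_means <- sapply(x.sample, function(x) {
  mean(as.numeric(factor(x, levels = ordinal_levels)), na.rm = TRUE)
})
ordinal_means
# Q9_A Q9_B Q9_C
#  3.7  3.4  3.4
```

With the levels in this order, "Not Impt at all" counts as 1 and "Very Impt" as 5, which is why the means drop compared with the alphabetical coding.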
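I would like to make frequency tables for all the variables in my data frame. I searched the internet and many forums, and the closest approach I found was to use sapply. But when I try it, I get all zeros.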
> sapply(x.sample,function(x) table(factor(x.sample, levels=c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt"), ordered=TRUE)))
Q9_A Q9_B Q9_C
Not Impt at all 0 0 0
Somewhat Impt 0 0 0
Neutral 0 0 0
Impt 0 0 0
Very Impt 0 0 0
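Question: how can I use table() to make a frequency table like the one above for all columns (which are factors) in a data frame?
PS: Sorry if this seems trivial, but I have searched for 2 days without finding an answer and have tried every combination I could think of. Maybe I did not search hard enough =(
Thanks very much.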
You were nearly there. One small change to your function would have got you there: the x in function(x) ... needs to be passed through to the table() call:
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
sapply(x.sample, function(x) table(factor(x, levels=levs, ordered=TRUE)))
Rearranging the code slightly might also make it easier to read:
sapply(lapply(x.sample,factor,levels=levs,ordered=TRUE), table)
# Q9_A Q9_B Q9_C
#Not Impt at all 3 4 4
#Somewhat Impt 0 0 0
#Neutral 0 0 0
#Impt 1 0 0
#Very Impt 6 6 6
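To make the diagnosis concrete, the sketch below puts the original call next to the corrected one. In the original, the anonymous function ignores its argument x and hands the whole data frame x.sample to factor(), so nothing matches the requested levels and every count is zero. The names alpha_levels, bad and good are illustrative only, and the data frame is rebuilt compactly with the same values as in the question:

```r
# Rebuild the sample data compactly (same values as in the question).
alpha_levels <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
x.sample <- data.frame(
  Q9_A = factor(alpha_levels[c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)], levels = alpha_levels),
  Q9_B = factor(alpha_levels[c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)], levels = alpha_levels),
  Q9_C = factor(alpha_levels[c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3)], levels = alpha_levels)
)
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Original call: factor() receives the whole data frame, not the column x.
bad <- sapply(x.sample, function(x) table(factor(x.sample, levels = levs)))

# Corrected call: factor() receives the current column x.
good <- sapply(x.sample, function(x) table(factor(x, levels = levs)))

sum(bad)       # 0: no value matched any level
colSums(good)  # 10 for every column: each of the 10 rows is counted once
```

Checking that every column sums to the number of rows is a cheap sanity test when applying this to all 21 columns of the real data.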
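Coming a bit late, but here is a possible reshape2 solution. It could have been very simple with recast, but we need to handle empty factor levels here, so we have to specify both factorsAsStrings = FALSE in melt and drop = FALSE in dcast, and recast cannot pass arguments to melt (only to dcast), so here goes: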
library(reshape2)
x.sample$indx <- 1
dcast(melt(x.sample, "indx", factorsAsStrings = FALSE), value ~ variable, drop = FALSE)
# value Q9_A Q9_B Q9_C
# 1 Impt 1 0 0
# 2 Neutral 0 0 0
# 3 Not Impt at all 3 4 4
# 4 Somewhat Impt 0 0 0
# 5 Very Impt 6 6 6
If we do not care about empty levels, a quick solution would be just:
recast(x.sample, value ~ variable, id.var = "indx")
# value Q9_A Q9_B Q9_C
# 1 Impt 1 0 0
# 2 Not Impt at all 3 4 4
# 3 Very Impt 6 6 6
Alternatively, if speed is a concern, we can do the same with data.table:
library(data.table)
dcast(melt(setDT(x.sample), measure.vars = names(x.sample), value.factor = TRUE),
value ~ variable, drop = FALSE)
# value Q9_A Q9_B Q9_C
# 1: Impt 1 0 0
# 2: Neutral 0 0 0
# 3: Not Impt at all 3 4 4
# 4: Somewhat Impt 0 0 0
# 5: Very Impt 6 6 6
Why not just:
> sapply(x.sample, table)
Q9_A Q9_B Q9_C
Impt 1 0 0
Neutral 0 0 0
Not Impt at all 3 4 4
Somewhat Impt 0 0 0
Very Impt 6 6 6
Let's call it 'tbl'; then reorder its rows:
tbl[ order(match(rownames(tbl), c("Not Impt at all", "Somewhat Impt",
"Neutral", "Impt", "Very Impt")) ) , ]
Q9_A Q9_B Q9_C
Not Impt at all 3 4 4
Somewhat Impt 0 0 0
Neutral 0 0 0
Impt 1 0 0
Very Impt 6 6 6
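Since sapply(x.sample, table) returns a matrix whose row names are the factor levels, the same reordering can also be sketched by indexing the rows by name. This is only an alternative spelling of the order(match(...)) idea above, not a different method; the names alpha_levels, levs and by_name are illustrative:

```r
# Rebuild the sample data compactly (same values as in the question).
alpha_levels <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
x.sample <- data.frame(
  Q9_A = factor(alpha_levels[c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)], levels = alpha_levels),
  Q9_B = factor(alpha_levels[c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)], levels = alpha_levels),
  Q9_C = factor(alpha_levels[c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3)], levels = alpha_levels)
)
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

tbl <- sapply(x.sample, table)  # rows come out in alphabetical level order

# Pick the rows by name in the desired order.
by_name <- tbl[levs, ]

# Same result as the order(match(...)) version above.
identical(by_name, tbl[order(match(rownames(tbl), levs)), ])
```

Indexing by name fails with a "subscript out of bounds" error if a label in levs is misspelled, which can be a useful check; the order(match(...)) form does not raise an error in that case.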