Frequency count of two columns in R
I have two columns in a data frame:
2010 1
2010 1
2010 2
2010 2
2010 3
2011 1
2011 2
I want to count how often each combination of the two columns occurs and get the result in this format:
y m Freq
2010 1 2
2010 2 2
2010 3 1
2011 1 1
2011 2 1
If your data is a data frame df with columns y and m:
library(plyr)
# group by y and m; nrow() counts the rows in each group
counts <- ddply(df, .(y, m), nrow)
names(counts) <- c("y", "m", "Freq")
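plyr also has a count() helper that does the grouping and tallying in one call. A minimal sketch, assuming plyr is installed and the data frame has columns y and m; plyr::count() names its tally column freq (lower case), so it is renamed to the requested Freq:

```r
library(plyr)

# Sample data matching the question (columns y and m)
df <- data.frame(y = c(2010, 2010, 2010, 2010, 2010, 2011, 2011),
                 m = c(1, 1, 2, 2, 3, 1, 2))

# plyr::count() tallies each unique (y, m) combination into a "freq" column
counts <- plyr::count(df, vars = c("y", "m"))
names(counts)[names(counts) == "freq"] <- "Freq"
counts
```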
I haven't seen a dplyr answer yet. The code is rather simple:
library(dplyr)
rename(count(df, y, m), Freq = n)
# Source: local data frame [5 x 3]
# Groups: V1 [?]
#
# y m Freq
# (int) (int) (int)
# 1 2010 1 2
# 2 2010 2 2
# 3 2010 3 1
# 4 2011 1 1
# 5 2011 2 1
Data:
df <- structure(list(y = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L,
2011L), m = c(1L, 1L, 2L, 2L, 3L, 1L, 2L)), .Names = c("y", "m"
), class = "data.frame", row.names = c(NA, -7L))
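In current dplyr releases, count() accepts a name argument, so the rename() step can be folded into the call. A sketch assuming a recent dplyr version that supports name:

```r
library(dplyr)

df <- data.frame(y = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L),
                 m = c(1L, 1L, 2L, 2L, 3L, 1L, 2L))

# count() groups by y and m and names the tally column "Freq" directly
counts <- count(df, y, m, name = "Freq")
counts
```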
A more idiomatic data.table version of @ugh's answer would be:
library(data.table)  # load package
df <- data.frame(y = c(rep(2010, 5), rep(2011, 2)), m = c(1, 1, 2, 2, 3, 1, 2))  # set up data
dt <- data.table(df)  # convert to data.table
dt[, list(Freq = .N), by = list(y, m)]  # use list() to name the new column directly
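by keeps groups in order of first appearance; if the result should come back sorted by y and m, keyby can be used instead. A sketch assuming data.table is installed, with the rows shuffled to show the sorting (setDT(df) would convert the data frame in place instead of copying it):

```r
library(data.table)

# Same counts as the question, but rows in a shuffled order
df <- data.frame(y = c(2010, 2011, 2010, 2010, 2011, 2010, 2010),
                 m = c(2, 1, 1, 3, 2, 2, 1))

dt <- as.data.table(df)  # makes a copy; setDT(df) converts df in place
# keyby groups like by, then sorts the result and sets (y, m) as its key
counts <- dt[, .(Freq = .N), keyby = .(y, m)]
counts
```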
Using sqldf:
library(sqldf)
sqldf("SELECT y, m, COUNT(*) AS Freq
       FROM df
       GROUP BY y, m")
If you had a very big data frame with many columns or didn't know the column names in advance, something like this might be useful:
library(reshape2)
df_counts <- melt(table(df))               # one row per combination, count in "value"
names(df_counts) <- c(names(df), "count")  # original column names plus "count"
df_counts
y m count
1 2010 1 2
2 2011 1 1
3 2010 2 2
4 2011 2 1
5 2010 3 1
6 2011 3 0
Here is a simple base R solution using table() and as.data.frame():
df2 <- as.data.frame(table(df1))
# df2
y m Freq
1 2010 1 2
2 2011 1 1
3 2010 2 2
4 2011 2 1
5 2010 3 1
6 2011 3 0
df2[df2$Freq != 0, ]
# output
y m Freq
1 2010 1 2
2 2011 1 1
3 2010 2 2
4 2011 2 1
5 2010 3 1
Data:
df1 <- structure(list(y = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L,
2011L), m = c(1L, 1L, 2L, 2L, 3L, 1L, 2L)), .Names = c("y", "m"
), class = "data.frame", row.names = c(NA, -7L))
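Note that table() turns y and m into factors, so in df2 they are factors rather than numbers. A base R sketch that drops the empty combinations and converts the factor labels back to integers (going through as.character() so the labels, not the internal level codes, are kept):

```r
df1 <- data.frame(y = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L),
                  m = c(1L, 1L, 2L, 2L, 3L, 1L, 2L))

df2 <- as.data.frame(table(df1))  # y and m are factors here
df2 <- df2[df2$Freq != 0, ]       # drop combinations that never occur

# Convert factor labels back to integers
df2$y <- as.integer(as.character(df2$y))
df2$m <- as.integer(as.character(df2$m))
rownames(df2) <- NULL
str(df2)
```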
library(data.table)
oldformat <- data.table(oldformat)  ## your original data frame
newformat <- oldformat[, list(Freq = length(m)), by = list(y, m)]
Here is another approach that I found here:
df<- structure(list(y = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L,
2011L), m = c(1L, 1L, 2L, 2L, 3L, 1L, 2L)), .Names = c("y", "m"
), class = "data.frame", row.names = c(NA, -7L))
Two options:
# add a dummy column and count its rows for every (y, m) group
aggregate(Freq ~ y + m,
          data = transform(df, Freq = 1),
          FUN = function(x){NROW(x)})
or
aggregate(Freq ~ y + m,
          data = transform(df, Freq = 1),
          FUN = length)
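Another base R route is xtabs(), which builds the same contingency table as table() from a one-sided formula. Like table(), it also lists the combination that never occurs (2011, 3) with a count of 0, so that row is filtered out. A sketch:

```r
df <- data.frame(y = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L),
                 m = c(1L, 1L, 2L, 2L, 3L, 1L, 2L))

# One-sided formula: count rows for every y x m combination
tab <- xtabs(~ y + m, data = df)
counts <- as.data.frame(tab)         # columns y, m, Freq
counts <- counts[counts$Freq > 0, ]  # drop combinations with zero rows
counts
```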