How to get frequency counts on two variables in R?
I'm looking for a way to get a frequency count out of an R data frame based on two values. I've tried a few different syntaxes, and I'm fairly new to R.
> table(frequency.data.frame$value,frequency.data.frame$value_x)[!is.na(frequency.data.frame$id),]
Error in `[.default`(table(frequency.data.frame$value, frequency.data.frame$value_x), :
(subscript) logical subscript too long
> table(frequency.data.frame$value,frequency.data.frame$value_x[!is.na(frequency.data.frame$id),])
Error in frequency.data.frame$value_x[!is.na(frequency.data.frame$id), :
incorrect number of dimensions
Given:
First dimension:
as.data.frame(table(frequency.data.frame[!is.na(frequency.data.frame$id),]$value))
Var1 Freq
1 2 2
2 3 2
3 4 5
4 5 21
5 6 8
6 7 19
7 8 52
8 9 33
9 10 56
10 11 1
11 12 1
Second dimension:
as.data.frame(table(frequency.data.frame[!is.na(frequency.data.frame$id),]$value_x))
Var1 Freq
1 1 50
2 2 17
3 3 12
4 4 7
5 6 18
6 8 6
7 9 1
8 10 19
9 14 1
10 15 1
11 16 11
12 17 2
13 18 2
14 96 3
15 97 4
16 98 46
Sample extract of the data frame:
> frequency.data.frame
id name factor value value_x
1 <NA> OSuppl=1 - Ardex | Imp_1=1 - 1 1 1
2 <NA> OSuppl=1 - Ardex | Imp_1=2 - 2 2 1
3 e7f0940c64001d4ab9d43ebd1e361292 OSuppl=1 - Ardex | Imp_1=3 - 3 3 1
4 <NA> OSuppl=1 - Ardex | Imp_1=4 - 4 4 1
5 2de771a03f49ce72eb721159933d4827 OSuppl=1 - Ardex | Imp_1=5 - 5 5 1
6 307ad612c3cc9fe5741c1fe75d1bc217 OSuppl=1 - Ardex | Imp_1=5 - 5 5 1
7 522f594612678f13f9dd5ee8f4f24df7 OSuppl=1 - Ardex | Imp_1=5 - 5 5 1
8 c1c32ac37f572fb259fe4e454bbdf743 OSuppl=1 - Ardex | Imp_1=5 - 5 5 1
9 d5b784d8f9508da7ac9573b535fe7147 OSuppl=1 - Ardex | Imp_1=5 - 5 5 1
10 e07439cdc15377d209413b31d9f80056 OSuppl=1 - Ardex | Imp_1=6 - 6 6 1
11 878a67dbbb428c65c83602fc112a24a0 OSuppl=1 - Ardex | Imp_1=6 - 6 6 1
12 5f7c27fb104685c26e53fc3267024539 OSuppl=1 - Ardex | Imp_1=7 - 7 7 1
13 6b12a3591d89f7b70587406a0c4f92bb OSuppl=1 - Ardex | Imp_1=7 - 7 7 1
14 7fb2f98867e0e100187f0b4f13baac46 OSuppl=1 - Ardex | Imp_1=7 - 7 7 1
15 99a0ffaa2066e5c4806f2e30a446a31f OSuppl=1 - Ardex | Imp_1=7 - 7 7 1
16 9d214544e8eaf3ea9c416a3dfbddb9f6 OSuppl=1 - Ardex | Imp_1=7 - 7 7 1
17 b36f990b1e0d8c5f04a47d23b70c1022 OSuppl=1 - Ardex | Imp_1=7 - 7 7 1
18 f2f9395bd9ddc16acd2253bd114aca64 OSuppl=1 - Ardex | Imp_1=7 - 7 7 1
19 4420e8499ab32631b389111935314468 OSuppl=1 - Ardex | Imp_1=8 - 8 8 1
...
Extract of the desired result:
Var2 Var1 Freq
...
6 5 1 5
7 6 1 2
8 7 1 7
9 8 1 1
...
What syntax would I need to get the desired output shown above?
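library(plyr)
# Count the rows in each (value_x, value) group; column names are resolved inside the data frame
counts <- ddply(frequency.data.frame, .(value_x, value), nrow)
# ddply names the count column V1, so rename it
names(counts) <- c("value_x", "value", "Freq")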
value_x value Freq
1 1 1 1
2 1 2 1
3 1 3 1
4 1 4 1
5 1 5 5
6 1 6 2
7 1 7 7
8 1 8 10
9 1 9 9
10 1 10 15
11 1 11 1
12 1 12 1
13 2 1 1
...
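Note that the ddply() call above counts every row, including those where id is NA, which the question wanted to exclude. As a minimal sketch of filtering first and then counting, here is a base R version using aggregate() on made-up data. The `toy` data frame and its values are hypothetical stand-ins for frequency.data.frame, and aggregate() is swapped in for ddply() so that the example needs no extra packages.

```r
# Hypothetical toy data standing in for frequency.data.frame
toy <- data.frame(
  id      = c(NA, "a", "b", "c", NA, "d"),
  value   = c(1, 5, 5, 6, 6, 5),
  value_x = c(1, 1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Drop rows with a missing id before counting
kept <- toy[!is.na(toy$id), ]

# Sum a column of ones within each (value_x, value) group
counts <- aggregate(
  data.frame(Freq = rep(1L, nrow(kept))),
  by  = list(value_x = kept$value_x, value = kept$value),
  FUN = sum
)
print(counts)
```

The same filtering step (subsetting on !is.na(id)) could be applied to the data frame before passing it to ddply().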
Since we only want the frequency of 'value' and 'value_x' for rows with a non-NA 'id', subset the rows where 'id' is not NA, select the columns of interest, build the table, and convert it to a data.frame:
as.data.frame(table(subset(frequency.data.frame,
select = c('value', 'value_x'), !is.na(id))))
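To try this without the original data, here is a minimal sketch on a made-up data frame (the `toy` object and its values are illustrative assumptions, not the asker's data). It also suggests why the two attempts in the question failed: in the first, the logical filter has one element per data frame row but was applied to the table, which has only one row per distinct value; in the second, a two-index subscript [rows, ] was applied to a plain vector. Filtering the rows before calling table() avoids both problems.

```r
# Hypothetical toy data standing in for frequency.data.frame
toy <- data.frame(
  id      = c(NA, "a", "b", "c", NA, "d"),
  value   = c(1, 5, 5, 6, 6, 5),
  value_x = c(1, 1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Keep only rows with a non-NA id, and only the two columns to cross-tabulate
kept <- subset(toy, !is.na(id), select = c(value, value_x))

# table() on a two-column data frame builds the two-way contingency table;
# as.data.frame() reshapes it to long form with a Freq column
freq <- as.data.frame(table(kept))
print(freq)
```

The result has one row per combination of the observed levels, with columns named value, value_x and Freq.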
The tidyverse syntax for the above solution would be:
library(dplyr)
frequency.data.frame %>%
filter(!is.na(id)) %>%
count(var1 = value, var2 = value_x)
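One difference between the base R and dplyr versions is worth keeping in mind: table() reports every combination of the observed levels, including combinations with a count of zero, whereas count() returns only the combinations that actually occur. A minimal base R sketch of dropping the zero rows so the two outputs line up, again on made-up data (the `toy` data frame is a hypothetical stand-in), might look like this:

```r
# Hypothetical toy data standing in for frequency.data.frame
toy <- data.frame(
  id      = c(NA, "a", "b", "c", "d"),
  value   = c(1, 5, 5, 6, 5),
  value_x = c(1, 1, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Filter on id, then cross-tabulate the two columns of interest
kept <- subset(toy, !is.na(id), select = c(value, value_x))
all_combos <- as.data.frame(table(kept))

# Keep only the combinations that were actually observed
observed <- subset(all_combos, Freq > 0)
print(observed)
```

In the as.data.frame(table(...)) output the value and value_x columns are factors; if numeric columns are needed, as.numeric(as.character(x)) converts them back.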