I have this data frame and I want the frequency of each event expressed as a fraction of all events in its cluster. For example, since E2 occurs 2 times in C2 and C2 has 4 events in total, the fraction would be 0.5.
data <- data.frame(Event=c("E1", "E2", "E2","E3", "E4"), Cluster=c("C1", "C2", "C2", "C2", "C2"))
Event Cluster
E1 C1
E2 C2
E2 C2
E3 C2
E4 C2
This is the output I want:
Event Cluster Freq
E1 C1 1
E2 C2 0.5
E3 C2 0.25
E4 C2 0.25
Using dplyr, we can count each level of Cluster and Event and then calculate the ratio for each Cluster.
library(dplyr)
data %>%
  count(Cluster, Event, name = "Freq") %>%
  group_by(Cluster) %>%
  mutate(Freq = Freq/sum(Freq))
# Cluster Event Freq
# <fct> <fct> <dbl>
#1 C1 E1 1
#2 C2 E2 0.5
#3 C2 E3 0.25
#4 C2 E4 0.25
In base R we can use table and prop.table, which will have the same information but a different output format.
prop.table(table(data), 2)
#     Cluster
#Event   C1   C2
#   E1 1.00 0.00
#   E2 0.00 0.50
#   E3 0.00 0.25
#   E4 0.00 0.25
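If the long layout from the question is preferred, one possible follow-up (a sketch, not part of the original answer) is to convert the proportion table with as.data.frame and drop the cells that are zero. The variable name `res` is only illustrative.

```r
data <- data.frame(Event = c("E1", "E2", "E2", "E3", "E4"),
                   Cluster = c("C1", "C2", "C2", "C2", "C2"))

# Column-wise proportions (margin 2 = Cluster), reshaped to one row per table cell;
# as.data.frame() on a table names the value column "Freq" by default
res <- as.data.frame(prop.table(table(data), 2))

# Keep only the Event/Cluster pairs that actually occur in the data
res <- res[res$Freq > 0, ]
rownames(res) <- NULL
res
```

This should leave one row per observed Event/Cluster pair, with columns Event, Cluster and Freq as in the requested output.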
Here is another solution, using base R:
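# Attach the cluster size to every row, then restore the original row order
data2 = unsplit(lapply(split(data, data$Cluster), function(df) {
  df$Freq = nrow(df)
  df
}), data$Cluster)
# For each (Event, Cluster) pair: number of rows divided by the cluster size
aggregate(data2[,"Freq", drop=FALSE], data2[c("Event","Cluster")],
          function(x) length(x)/x[1])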
## Event Cluster Freq
## 1 E1 C1 1.00
## 2 E2 C2 0.50
## 3 E3 C2 0.25
## 4 E4 C2 0.25
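The same idea (rows per Event/Cluster pair divided by rows per Cluster) can also be sketched with ave, which is a different base R technique from the split/unsplit approach above. The helper names `pair_n`, `cluster_n` and `out` are assumptions made for illustration.

```r
data <- data.frame(Event = c("E1", "E2", "E2", "E3", "E4"),
                   Cluster = c("C1", "C2", "C2", "C2", "C2"))

idx <- seq_len(nrow(data))

# Number of rows sharing the same Event and Cluster, repeated on every row
pair_n <- ave(idx, data$Event, data$Cluster, FUN = length)

# Number of rows in the same Cluster, repeated on every row
cluster_n <- ave(idx, data$Cluster, FUN = length)

# Fraction of the cluster taken up by each Event/Cluster pair
data$Freq <- pair_n / cluster_n

# Duplicate rows carry identical fractions, so keep one row per pair
out <- unique(data)
out
```

Because ave returns a value for every input row, the result keeps the original row order, and unique is only needed to collapse the repeated E2 row.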