I am trying to figure out how to generate single values from a large data set so that I can input the values into a table in R Markdown.
For example, my original data set looks something like this:
ID Occupation OnTime
1 1 A Y
2 2 B N
3 3 B N
4 4 A Y
5 5 D Y
6 6 C Y
7 7 C N
and I would like a table like this that gives a count by occupation:
Occupation Total OnTime Percent
1 A 2 2 100
2 B 2 0 0
3 C 2 1 50
4 D 1 1 100
The Total column in the second data frame counts the rows for each occupation, and the OnTime column counts how many of those rows (for example, the rows in occupation A) have a Y in OnTime.
Because I am creating this table manually in R Markdown, I need to generate each value individually and enter it into an R Markdown table like this:
Occupation | Total | OnTime | Percent
-----------|--------|---------|--------
A | TotalA | OnTimeA | PercentOnTimeA
B | TotalB | OnTimeB | PercentOnTimeB
C | TotalC | OnTimeC | PercentOnTimeC
D | TotalD | OnTimeD | PercentOnTimeD
How do I do this efficiently using a loop? So far I have come up with this:
# Create one data frame per occupation, named df1.A, df1.B, ...
for (i in unique(df1$Occupation)) {
  df2names <- paste("df1", i, sep = ".")
  assign(df2names, df1[df1$Occupation == i, ])
}
I need an extra line in the code above that counts the length of each of the data frames I've just produced so that I have values to input for TotalA, TotalB, TotalC and TotalD. I would then use similar for loops to generate the OnTime and Percent columns in the R Markdown table.
How would I go about doing this? I would also appreciate other approaches to this problem. Thank you!
We can use group_by/summarise to get the summarised values:
library(dplyr)
df1 %>%
  group_by(Occupation) %>%
  summarise(Total = n(),
            OnTime = sum(OnTime == "Y"),   # number of "Y" rows per occupation
            Percent = 100 * OnTime / n())  # OnTime here is the count computed on the line above
# A tibble: 4 x 4
# Occupation Total OnTime Percent
# <chr> <int> <int> <dbl>
#1 A 2 2 100
#2 B 2 0 0
#3 C 2 1 50
#4 D 1 1 100
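Or using base R:
# Cross-tabulate Occupation by OnTime (the ID column is dropped)
tbl <- table(df1[-1])
# Keep the Y count, the row total (Sum) and the Y percentage
cbind(addmargins(tbl, 2), Percent = 100 * prop.table(tbl, 1))[, -c(1, 4)]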
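If individual values such as TotalA or PercentOnTimeC are still wanted, one possible alternative to the assign() loop in the question is to split the data once and keep the results in named vectors. This is only a sketch in base R; the names groups, total, ontime and percent are made up for illustration and do not come from the question.

```r
# Sample data from the question
df1 <- data.frame(
  ID = 1:7,
  Occupation = c("A", "B", "B", "A", "D", "C", "C"),
  OnTime = c("Y", "N", "N", "Y", "Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# One data frame per occupation, held in a named list instead of
# separate variables created with assign()
groups <- split(df1, df1$Occupation)

# Named vectors: one value per occupation
total   <- sapply(groups, nrow)                               # rows per occupation
ontime  <- sapply(groups, function(g) sum(g$OnTime == "Y"))   # "Y" rows per occupation
percent <- 100 * ontime / total

# Pull out single values by occupation name
total[["A"]]
ontime[["B"]]
percent[["C"]]
```

Values looked up this way can be used wherever a single number is needed, and adding a new occupation to the data requires no change to the code.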
Data:
df1 <- structure(list(ID = 1:7, Occupation = c("A", "B", "B", "A", "D",
    "C", "C"), OnTime = c("Y", "N", "N", "Y", "Y", "Y", "N")),
    class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7"))
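To avoid typing the values into the R Markdown table by hand, the table rows can also be generated from a summary data frame. The following is a rough sketch in base R, assuming it runs in a chunk with the knitr option results='asis' so that the printed lines are rendered as a table; res and md_lines are illustrative names. If knitr is available, knitr::kable(res) should give a similar result with less code.

```r
# Sample data from the question
df1 <- data.frame(
  ID = 1:7,
  Occupation = c("A", "B", "B", "A", "D", "C", "C"),
  OnTime = c("Y", "N", "N", "Y", "Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# Per-occupation counts (groups come back sorted: A, B, C, D)
total  <- tapply(df1$OnTime, df1$Occupation, length)
ontime <- tapply(df1$OnTime == "Y", df1$Occupation, sum)

res <- data.frame(
  Occupation = names(total),
  Total      = as.vector(total),
  OnTime     = as.vector(ontime),
  Percent    = 100 * as.vector(ontime) / as.vector(total),
  stringsAsFactors = FALSE
)

# Build the pipe-table lines: header, separator, then one line per row
md_lines <- c(
  paste(names(res), collapse = " | "),
  paste(rep("---", ncol(res)), collapse = " | "),
  do.call(paste, c(res, sep = " | "))
)

# Print the lines as-is so R Markdown can render them as a table
cat(md_lines, sep = "\n")
```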