
Creating single values from large dataset using for loop in R

I am trying to figure out how to generate single values from a large data set so that I can input the values into a table in R Markdown.

For example, my original data set looks something like this:

  ID Occupation OnTime
1  1          A      Y
2  2          B      N
3  3          B      N
4  4          A      Y
5  5          D      Y
6  6          C      Y
7  7          C      N

and I would like a table like this that gives a count by occupation:

  Occupation Total OnTime Percent
1          A     2      2     100
2          B     2      0       0
3          C     2      1      50
4          D     1      1     100

The Total column in the second data frame counts the rows for each occupation, the OnTime column counts how many of those rows have a Y in OnTime, and Percent is OnTime as a percentage of Total.

Because I am creating this table manually in R Markdown, I need to create each of the values individually so that I can put them into an R Markdown table like this:

Occupation |  Total | OnTime  | Percent
-----------|--------|---------|--------
A          | TotalA | OnTimeA | PercentOnTimeA
B          | TotalB | OnTimeB | PercentOnTimeB
C          | TotalC | OnTimeC | PercentOnTimeC
D          | TotalD | OnTimeD | PercentOnTimeD

How do I do this efficiently using a loop? So far I have come up with this:

for (i in unique(df1$Occupation)) {
  df2names <- paste("df1", i, sep = ".")
  assign(df2names, df1[df1$Occupation == i, ])
}

I need an extra line in the code above that counts the rows of each of the data frames I've just produced, so that I have values to input for TotalA, TotalB, TotalC and TotalD. I would then use similar for loops to generate the OnTime and Percent columns in the R Markdown table.

How would I go about doing this? I would also appreciate other approaches to this problem. Thank you!

We can use group_by() and summarise() from dplyr to get the summarised values:

library(dplyr)
df1 %>%
   group_by(Occupation) %>%
   # Percent uses the OnTime count created just before it in the same call
   summarise(Total = n(), OnTime = sum(OnTime == "Y"), Percent = 100 * OnTime / n())
# A tibble: 4 x 4
#  Occupation Total OnTime Percent
#  <chr>      <int>  <int>   <dbl>
#1 A              2      2     100
#2 B              2      0       0
#3 C              2      1      50
#4 D              1      1     100

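Or using base R:

# Cross-tabulate Occupation by OnTime (df1[-1] drops the ID column)
tbl <- table(df1[-1])
# Append row totals and row percentages, then drop the N count and N percent columns
cbind(addmargins(tbl, 2), Percent = 100 * prop.table(tbl, 1))[, -c(1, 4)]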
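
Staying closer to the loop in the question, the following is a minimal sketch rather than part of the original answer. It assumes the same sample data, and the names by_occ, total, ontime and percent are only illustrative. split() returns a named list of data frames, so there is no need for assign(), and each single value can be looked up by occupation name.

```r
# Sample data from the question
df1 <- data.frame(
  ID = 1:7,
  Occupation = c("A", "B", "B", "A", "D", "C", "C"),
  OnTime = c("Y", "N", "N", "Y", "Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# One data frame per occupation, stored in a named list
by_occ <- split(df1, df1$Occupation)

# Named vectors with one element per occupation
total   <- sapply(by_occ, nrow)
ontime  <- sapply(by_occ, function(d) sum(d$OnTime == "Y"))
percent <- 100 * ontime / total

# Single values, e.g. for the TotalA and PercentOnTimeC cells
total[["A"]]
percent[["C"]]
```

Because the results are named vectors, a new occupation in the data adds a new element without any extra variables being created.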

Data:

df1 <- structure(list(ID = 1:7, Occupation = c("A", "B", "B", "A", "D", 
 "C", "C"), OnTime = c("Y", "N", "N", "Y", "Y", "Y", "N")),
 class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7"))
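
To get the summary into the R Markdown table itself, one possible approach is sketched below. It is an illustration under assumptions, not the original answer's method: res, total_A and md are hypothetical names, and the column widths in the sprintf() format are arbitrary. It shows both how to pull out a single value and how to build the pipe-table rows in one go.

```r
# Sample data from the question
df1 <- data.frame(
  ID = 1:7,
  Occupation = c("A", "B", "B", "A", "D", "C", "C"),
  OnTime = c("Y", "N", "N", "Y", "Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# Summary with one row per occupation (table() sorts the occupations)
tbl <- table(df1$Occupation, df1$OnTime)
res <- data.frame(
  Occupation = rownames(tbl),
  Total      = as.integer(rowSums(tbl)),
  OnTime     = as.integer(tbl[, "Y"]),  # assumes at least one "Y" in the data
  stringsAsFactors = FALSE
)
res$Percent <- 100 * res$OnTime / res$Total

# A single value for one cell of a hand-written table
total_A <- res$Total[res$Occupation == "A"]

# Or generate the whole Markdown table as text
md <- c(
  "Occupation | Total | OnTime | Percent",
  "-----------|-------|--------|--------",
  sprintf("%-10s | %5d | %6d | %7.0f",
          res$Occupation, res$Total, res$OnTime, res$Percent)
)
cat(md, sep = "\n")
```

In an R Markdown document the cat() call would go in a chunk with the option results = 'asis' so that the output is rendered as a table. If the knitr package is available, knitr::kable(res) is the usual shortcut for the same result.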
