Using dplyr to collapse rows based on a condition on another numeric column

An example df:

experiment = c("A", "A", "A", "A", "A", "B", "B", "B")
count = c(1,2,3,4,5,1,2,1)
df = cbind.data.frame(experiment, count)

Desired output:

experiment_1 = c("A", "A", "A", "B", "B")
freq = c(1,1,3,2,1) # frequency
freq_per = c(20,20,60,66.6,33.3) # frequency percent
df_1 = cbind.data.frame(experiment_1, freq, freq_per)

I want to do the following:

    1. Group df using experiment
    2. Calculate freq using the count column
    3. Calculate freq_per
    4. Calculate sum of freq_per for all observations with count >= 3

I have the following code. How do I do step 4?

library(dplyr)

freq_count = df %>%
  group_by(experiment, count) %>%
  summarize(freq = n()) %>%
  na.omit() %>%
  mutate(freq_per = freq / sum(freq) * 100)
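This pipeline gives per-experiment percentages because `summarize()` drops the last grouping variable (`count`), so the following `mutate()` computes `freq_per` within each `experiment`. Here is a minimal sketch that confirms this. It assumes dplyr 1.0.0 or later, which is where the `.groups` argument was added, and it leaves out `na.omit()` because the example data has no missing values:

```r
library(dplyr)

df <- data.frame(
  experiment = c("A", "A", "A", "A", "A", "B", "B", "B"),
  count      = c(1, 2, 3, 4, 5, 1, 2, 1),
  stringsAsFactors = FALSE
)

freq_count <- df %>%
  group_by(experiment, count) %>%
  summarize(freq = n(), .groups = "drop_last") %>%  # explicit: keep grouping by experiment only
  mutate(freq_per = freq / sum(freq) * 100)          # percent within each experiment

group_vars(freq_count)
#> [1] "experiment"
```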

Thank you very much.

There may be a more concise approach, but I would suggest collapsing your count into a new column using mutate() and ifelse() and then summarising:

freq_count %>%
  mutate(collapsed_count = ifelse(count >= 3, 3, count)) %>%
  group_by(collapsed_count, .add = TRUE) %>%      # adds a 2nd grouping var (.add replaces add in dplyr >= 1.0.0)
  summarise(freq = sum(freq), freq_per = sum(freq_per)) %>% 
  select(-collapsed_count)       # dropped to match your df_1. 
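Here is one way to combine the question's pipeline and this answer into a self-contained version. It is a sketch that assumes dplyr 1.0.0 or later, which provides both `.add` and `.groups`. I added `.groups = "drop"` and the final `select()` so that the result has the same columns as `df_1`, with `experiment` in place of `experiment_1`:

```r
library(dplyr)

df <- data.frame(
  experiment = c("A", "A", "A", "A", "A", "B", "B", "B"),
  count      = c(1, 2, 3, 4, 5, 1, 2, 1),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(experiment, count) %>%
  summarise(freq = n(), .groups = "drop_last") %>%            # steps 1-2: still grouped by experiment
  mutate(freq_per = freq / sum(freq) * 100) %>%               # step 3: percent within experiment
  mutate(collapsed_count = ifelse(count >= 3, 3, count)) %>%  # step 4: lump counts >= 3 together
  group_by(collapsed_count, .add = TRUE) %>%                  # group by experiment and collapsed_count
  summarise(freq = sum(freq), freq_per = sum(freq_per), .groups = "drop") %>%
  select(experiment, freq, freq_per)

result
```

Summing `freq_per` over the collapsed rows works because each percentage was already computed within its `experiment`. For A, this gives 20 + 20 + 20 = 60.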

Also, just FYI: for step 2 you might consider the count() function if you're keen to save some keystrokes. And tibble() or data.frame() is likely a better option than explicitly calling cbind's data frame method, cbind.data.frame(), to create a data frame.
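Below is a sketch of both suggestions. It assumes a recent dplyr (1.0.0 or later) in which `count()` accepts a `name` argument. The second pipeline is an alternative I'm adding, not something from the answer above. It collapses `count` with `pmin()` before counting, so no second `summarise()` step is needed:

```r
library(dplyr)

# data.frame() instead of cbind.data.frame(); tibble::tibble() would work similarly
df <- data.frame(
  experiment = c("A", "A", "A", "A", "A", "B", "B", "B"),
  count      = c(1, 2, 3, 4, 5, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Step 2 via count(): name = "freq" names the tally column (the default name is n)
freq_count <- df %>%
  count(experiment, count, name = "freq") %>%
  group_by(experiment) %>%
  mutate(freq_per = freq / sum(freq) * 100) %>%  # step 3: percent within experiment
  ungroup()

# Alternative (not from the answer): cap count at 3 first, so all counts >= 3
# fall into one level before counting
collapsed <- df %>%
  mutate(count = pmin(count, 3)) %>%
  count(experiment, count, name = "freq") %>%
  group_by(experiment) %>%
  mutate(freq_per = freq / sum(freq) * 100) %>%
  ungroup()
```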

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM