Using dplyr to collapse rows based on a condition on another numeric column

Question

An example df:

experiment = c("A", "A", "A", "A", "A", "B", "B", "B")
count = c(1,2,3,4,5,1,2,1)
df = cbind.data.frame(experiment, count)

Desired output:

experiment_1 = c("A", "A", "A", "B", "B")
freq = c(1,1,3,2,1) # frequency
freq_per = c(20,20,60,66.6,33.3) # frequency percent
df_1 = cbind.data.frame(experiment_1, freq, freq_per)

I want to do the following:

1. Group df using experiment
2. Calculate freq using the count column
3. Calculate freq_per
4. Calculate sum of freq_per for all observations with count >= 3

I have the following code. How do I do step 4?

library(dplyr)

freq_count = df %>%
  group_by(experiment, count) %>%
  summarize(freq = n()) %>%
  na.omit() %>%
  mutate(freq_per = freq / sum(freq) * 100)

Thank you very much.

Answer 1

There may be a more concise approach, but I would suggest collapsing your count into a new column using mutate() and ifelse(), then summarising:

freq_count %>%
  mutate(collapsed_count = ifelse(count >= 3, 3, count)) %>%
  group_by(collapsed_count, .add = TRUE) %>%   # adds a 2nd grouping var (`add = TRUE` before dplyr 1.0.0)
  summarise(freq = sum(freq), freq_per = sum(freq_per)) %>%
  select(-collapsed_count)                     # dropped to match your df_1
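
To check the approach end to end, here is a self-contained sketch that rebuilds the example data and runs steps 1 to 4 in one go. It assumes dplyr 1.0.0 or later, where the grouping argument is spelled `.add` and `summarise()` accepts `.groups`; the name `result` is only illustrative.

```r
library(dplyr)

# Rebuild the question's example data.
df <- data.frame(
  experiment = c("A", "A", "A", "A", "A", "B", "B", "B"),
  count = c(1, 2, 3, 4, 5, 1, 2, 1)
)

# Steps 1-3: frequency of each count within an experiment, then its percentage.
# "drop_last" keeps the result grouped by experiment, so sum(freq) is per experiment.
freq_count <- df %>%
  group_by(experiment, count) %>%
  summarise(freq = n(), .groups = "drop_last") %>%
  mutate(freq_per = freq / sum(freq) * 100)

# Step 4: pool every row with count >= 3 into a single row per experiment.
result <- freq_count %>%
  mutate(collapsed_count = ifelse(count >= 3, 3, count)) %>%
  group_by(collapsed_count, .add = TRUE) %>%
  summarise(freq = sum(freq), freq_per = sum(freq_per), .groups = "drop") %>%
  select(-collapsed_count)

print(result)
```

For experiment A, the three rows with count 3, 4 and 5 (20% each) are pooled into one row with freq 3 and freq_per 60, which matches the third row of df_1.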

Also, for step 2, you might consider the count() function if you're keen to save some keystrokes. In addition, tibble() or data.frame() are likely better options for creating a data frame than calling the data-frame method of cbind() explicitly.
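
As a rough illustration of those two suggestions, the sketch below builds the example with data.frame() and uses count() for step 2. It assumes a dplyr version in which count() has the `name` argument; without it, the new column would be called `n`.

```r
library(dplyr)

# data.frame() instead of cbind.data.frame()
df <- data.frame(
  experiment = c("A", "A", "A", "A", "A", "B", "B", "B"),
  count = c(1, 2, 3, 4, 5, 1, 2, 1)
)

# count() replaces group_by() + summarise(n()); it returns an ungrouped result
# here, so regroup by experiment before computing the per-experiment percentage.
freq_count <- df %>%
  count(experiment, count, name = "freq") %>%
  group_by(experiment) %>%
  mutate(freq_per = freq / sum(freq) * 100)

print(freq_count)
```

The result stays grouped by experiment, like the freq_count from the question, so the collapsing step shown earlier in this answer can be applied to it.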

solution1
1 ACCEPTED 2018-06-29 07:13:39
