R: count times per column a condition is met and row names appear in a list

Question

I have a dataframe with count information (df1)

rownames	sample1	sample2	sample3
m1	0	5	1
m2	1	7	5
m3	6	2	0
m4	3	1	0

and a second with sample information (df2)

rownames	batch	total count
sample1	a	10
sample2	b	15
sample3	a	6

I also have two lists with information about the m values. (They could easily be turned into another data frame if necessary, but I would rather not add to the count information, as it is quite large.) The lists follow no pattern such as even and odd; this is just a very simplified example.

x <- c("m1", "m3") and y <- c("m2", "m4")
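For anyone who wants to try the answers, the example data can be rebuilt roughly as follows. This is only a sketch: the column names `rownames` and `total_count` are assumptions chosen to match how the answers refer to the data, not something stated in the question.

```r
# Count data from the question; the identifier column is named `rownames`
# (an assumption that matches how the answers reference it).
df1 <- data.frame(
  rownames = c("m1", "m2", "m3", "m4"),
  sample1  = c(0, 1, 6, 3),
  sample2  = c(5, 7, 2, 1),
  sample3  = c(1, 5, 0, 0)
)

# Sample information; `total_count` is an assumed spelling of "total count".
df2 <- data.frame(
  rownames    = c("sample1", "sample2", "sample3"),
  batch       = c("a", "b", "a"),
  total_count = c(10, 15, 6)
)

# The two lists of m identifiers.
x <- c("m1", "m3")
y <- c("m2", "m4")
```

The totals in df2 agree with the column sums of df1, which is a quick way to check the data was typed in correctly.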

What I would like to do is add another two columns to the sample information: for each sample, the number of m values that appear in list x (and, separately, in list y) and have a count of 5 or more.

rownames	batch	total count	x	y
sample1	a	10	1	0
sample2	b	15	1	1
sample3	a	6	0	1

My current strategy is to make a list of values for both x and y and then append them to df2. Here are my attempts so far:

numX <- colSums(df1[sum(rownames(df1)>10 %in% x),]) and numX <- colSums(df1[sum(rownames(df1)>10 %in% x),]) both return a list of 0s

numX <- colSums(df1[rownames(df1)>10 %in% x,]) returns a list of the sum of count values meeting the conditions for each column

numX <- length(df1[rownames(df1)>10 %in% novel,]) returns the number of times the condition is met (in this example 2L)
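One likely reason these attempts misbehave is operator precedence: R evaluates `%in%` before `>`, so `a > 10 %in% x` is read as `a > (10 %in% x)`. On top of that, the left-hand side is the row names rather than the count values, so the counts are never tested. The sketch below only illustrates the parsing, then writes the two conditions separately on a toy vector; `ids` and `values` are hypothetical names.

```r
# How R parses `a > 10 %in% x`: `%in%` is applied first, then `>`.
e <- quote(a > 10 %in% x)
e[[1]]  # the outermost call is `>`
e[[3]]  # its right-hand side is the whole `10 %in% x` call

# Writing the two conditions separately on a toy vector (hypothetical names):
ids    <- c("m1", "m2", "m3")
values <- c(6, 7, 2)
x      <- c("m1", "m3")

# TRUE only where the id is in x AND the value is at least 5.
keep <- ids %in% x & values >= 5
sum(keep)  # number of elements meeting both conditions
```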

I am not really sure how to approach this so I have just been throwing around attempts. I've tried looking for answers but maybe I am just struggling to find the proper wording.

Answer 1

How about using dplyr and reshape2::melt?

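library(dplyr)
library(reshape2)

# Reshape to long format, keep counts of at least 5, then tally per sample
df3 <- df1 %>%
  melt(id.vars = "rownames") %>%
  filter(value >= 5) %>%
  mutate(x = as.numeric(rownames %in% c("m1", "m3")),
         y = as.numeric(rownames %in% c("m2", "m4"))) %>%
  select(-rownames, -value) %>%
  group_by(variable) %>%
  summarise(x = sum(x), y = sum(y))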

df2 %>% left_join(df3, by = c("rownames" = "variable"))

  rownames batch total_count x y
1  sample1     a          10 1 0
2  sample2     b          15 1 1
3  sample3     a           6 0 1

Answer 2

You can create a named list of vectors and, for each value of rownames in df2, count how many of the x and y entries in the respective sample are >= 5.

Base R option -

list_vec <- list(x = x, y = y)

# samp is a sample name, ids is one of the id vectors in list_vec
cbind(df2, do.call(rbind, lapply(df2$rownames, function(samp)
  sapply(list_vec, function(ids) {
    sum(df1[[samp]][df1$rownames %in% ids] >= 5)
  }))))

#  rownames batch total.count x y
#1  sample1     a          10 1 0
#2  sample2     b          15 1 1
#3  sample3     a           6 0 1
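A more vectorised base R variant is also possible: compare the whole count matrix against the threshold once, then take column sums over the rows belonging to each list. This is a sketch rather than part of the original answer, and it rebuilds the example data with an assumed `rownames` identifier column.

```r
# Example data mirroring the question (column names are assumptions).
df1 <- data.frame(
  rownames = c("m1", "m2", "m3", "m4"),
  sample1  = c(0, 1, 6, 3),
  sample2  = c(5, 7, 2, 1),
  sample3  = c(1, 5, 0, 0)
)
x <- c("m1", "m3")
y <- c("m2", "m4")

# Logical matrix: TRUE where a count is at least 5.
hit <- as.matrix(df1[-1]) >= 5

# For each list, keep the matching rows and count the TRUEs per sample column.
# drop = FALSE keeps a matrix even if only one row matches.
counts <- sapply(
  list(x = x, y = y),
  function(ids) colSums(hit[df1$rownames %in% ids, , drop = FALSE])
)
counts
```

The result has one row per sample and one column per list, so `cbind(df2, counts)` should attach it, provided the rows of df2 are in the same order as the sample columns of df1.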

Using tidyverse -

library(dplyr)
library(purrr)

list_vec <- lst(x, y)

df2 %>%
  bind_cols(map_df(df2$rownames, function(samp) 
    map(list_vec, ~sum(df1[[samp]][df1$rownames %in% .x] >= 5))))

Answer 3

We may do this with rowwise

library(dplyr)
df2 %>% 
   rowwise %>%
    mutate(x = sum(df1[[rownames]][df1$rownames %in% x] >= 5), 
           y = sum(df1[[rownames]][df1$rownames %in% y] >= 5)) %>%
    ungroup

-output

# A tibble: 3 × 5
  rownames batch totalcount     x     y
  <chr>    <chr>      <int> <int> <int>
1 sample1  a             10     1     0
2 sample2  b             15     1     1
3 sample3  a              6     0     1
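When adapting this, keep in mind that counting the values that reach the threshold and testing whether their total reaches it are different questions. On the example data they happen to give the same numbers. A minimal illustration with a hypothetical vector `v`:

```r
v <- c(3, 3)   # hypothetical counts for the ids in one list

sum(v >= 5)    # how many values are at least 5: 0
sum(v) >= 5    # whether the total is at least 5: TRUE
```

The question asks for the former, a count per sample.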


solution1
0 2021-10-19 01:11:27

solution2
0 2021-10-19 01:29:37

solution3
0 2021-10-19 02:10:55
