
Using a vector as a grep pattern

I am new to R. I am trying to select columns with grep repeatedly inside an apply loop: for each ID in the vector individuals, grep picks the columns whose names contain that ID, and those columns are summed row by row.

individuals <-c("ID1","ID2".....n)
bcdata_total <- sapply(individuals, function(x) {
  apply(bcdata_clean[,grep(individuals, colnames(bcdata_clean))], 1, sum)
})

bcdata can have any number of rows and columns and holds arbitrary values, but every column name contains one of the IDs in individuals as part of the string

>head(bcdata)
  ID1-4 ID1-3 ID2-5
A   3     2    1
B   2     2    3
C   4     5    5

grep(individuals[1],colnames(bcdata_clean)) returns a vector like [1] 1 2 , the indices of the columns whose names contain ID1 . That vector is used to select the columns of bcdata_clean to be summed. This should happen n times, once for each element of individuals

However, this produces the warning

In grep(individuals, colnames(bcdata)) :
  argument 'pattern' has length > 1 and only the first element will be used

As a result, all the columns of bcdata_total are identical

Ideally, the function would use the next element of individuals on each iteration, like this

 apply(bcdata_clean[,grep(individuals[1,2....n], colnames(bcdata_clean))], 1, sum)

and would result in something like this

>head(bcdata_total)
  ID1 ID2
A  5   1
B  4   3 
C  9   5

But I'm not sure how to increment individuals . What is the best way to do this within the function?

You can use split.default to split data on similarly named columns and sum them row-wise.

sapply(split.default(df, sub('-.*', '', names(df))), rowSums, na.rm = TRUE)

#  ID1 ID2
#A   5   1
#B   4   3
#C   9   5
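The one-liner can be easier to follow when broken into steps. This is only a sketch of the same idea, assuming every column name has the form ID-number as in the sample data; the intermediate names keys, groups and totals are made up for illustration.

```r
# Sample data, same values as in the question
df <- data.frame(
  `ID1-4` = c(3L, 2L, 4L), `ID1-3` = c(2L, 2L, 5L), `ID2-5` = c(1L, 3L, 5L),
  row.names = c("A", "B", "C"), check.names = FALSE
)

# Step 1: drop everything from the first "-" onward, giving one key per column
keys <- sub("-.*", "", names(df))

# Step 2: split.default() treats the data frame as a list of columns,
# so columns sharing a key end up in the same sub data frame
groups <- split.default(df, keys)

# Step 3: sum each sub data frame row by row; sapply() binds the results
# into a matrix with one column per key
totals <- sapply(groups, rowSums, na.rm = TRUE)
totals
```

Because the keys are derived from the column names themselves, this version does not need the individuals vector at all.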

data

df <- structure(list(`ID1-4` = c(3L, 2L, 4L), `ID1-3` = c(2L, 2L, 5L
), `ID2-5` = c(1L, 3L, 5L)), class = "data.frame", row.names = c("A", "B", "C"))

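Naming the function's argument individuals instead of x fixed my issue. Inside the function, grep now receives the single ID passed in by sapply on each iteration rather than the whole vector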

bcdata_total <- sapply(individuals, function(individuals) {
  # drop = FALSE keeps a data frame even when only one column matches
  apply(bcdata_clean[, grep(individuals, colnames(bcdata_clean)), drop = FALSE], 1, sum)
})
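One caveat with matching on the bare ID: the pattern "ID1" also matches a column such as ID10-2. A possible variant is sketched below, assuming every column name has the form ID-number and that the IDs contain no regex special characters. The ID10-2 column and the argument name id are invented for illustration.

```r
# Hypothetical data: the question's sample plus an invented "ID10-2" column
bcdata_clean <- data.frame(
  `ID1-4` = c(3L, 2L, 4L), `ID1-3` = c(2L, 2L, 5L),
  `ID2-5` = c(1L, 3L, 5L), `ID10-2` = c(7L, 7L, 7L),
  row.names = c("A", "B", "C"), check.names = FALSE
)
individuals <- c("ID1", "ID2", "ID10")

bcdata_total <- sapply(individuals, function(id) {
  # Anchored pattern: "^ID1-" matches "ID1-4" but not "ID10-2"
  cols <- grep(paste0("^", id, "-"), colnames(bcdata_clean))
  # drop = FALSE keeps a data frame even when a single column matches
  rowSums(bcdata_clean[, cols, drop = FALSE])
})
bcdata_total
```

Using a differently named argument (id) also avoids shadowing the outer individuals vector, which some readers find easier to follow.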

An option with tidyverse

library(dplyr)
library(tidyr)
library(tibble)
df %>%
    rownames_to_column('rn') %>%
    pivot_longer(cols = -rn, names_to = c(".value", "grp"), names_sep="-") %>%
    group_by(rn) %>% 
    summarise(across(starts_with('ID'), ~ sum(.x, na.rm = TRUE)), .groups = 'drop') %>%
    column_to_rownames('rn')
#  ID1 ID2
#A   5   1
#B   4   3
#C   9   5

data

df <- structure(list(`ID1-4` = c(3L, 2L, 4L), `ID1-3` = c(2L, 2L, 5L
), `ID2-5` = c(1L, 3L, 5L)), class = "data.frame", row.names = c("A", "B", "C"))


 