
Using a vector as a grep pattern

I am new to R. I am trying to search column names with grep repeatedly inside an apply loop. I use grep to specify which columns are summed, based on the vector individuals:

individuals <-c("ID1","ID2".....n)
bcdata_total <- sapply(individuals, function(x) {
  apply(bcdata_clean[,grep(individuals, colnames(bcdata_clean))], 1, sum)
})

bcdata is of random size and contains random data, but each of its column names contains one of the individuals as part of the string:

>head(bcdata)
  ID1-4 ID1-3 ID2-5
A   3     2    1
B   2     2    3
C   4     5    5

grep(individuals[1], colnames(bcdata_clean)) returns a vector that looks like [1] 1 2, the indices of the columns whose names contain ID1. That vector is used to select the columns of bcdata_clean to be summed. This should happen n times, once for each element of individuals.

However, this produces the warning

In grep(individuals, colnames(bcdata)) :
  argument 'pattern' has length > 1 and only the first element will be used

and all the columns of the result come out identical.

Ideally, individuals would advance to its next element on each iteration of the function, like this:

 apply(bcdata_clean[,grep(individuals[1,2....n], colnames(bcdata_clean))], 1, sum)

and would result in something like this:

>head(bcdata_total)
  ID1 ID2
A  5   1
B  4   3 
C  9   5

But I'm not sure how to increment individuals. What is the best way to do this within the function?
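The warning in the question arises because the anonymous function receives each ID as x, but its body passes the whole individuals vector to grep(), which accepts a single pattern and uses only the first element. A minimal sketch of the per-element lookup, assuming toy data shaped like the example (the name matches is illustrative and not from the original post):

```r
# Toy data mirroring the question's example.
bcdata_clean <- data.frame(
  `ID1-4` = c(3L, 2L, 4L),
  `ID1-3` = c(2L, 2L, 5L),
  `ID2-5` = c(1L, 3L, 5L),
  row.names = c("A", "B", "C"),
  check.names = FALSE  # keep the hyphenated column names unchanged
)
individuals <- c("ID1", "ID2")

# grep() takes one pattern at a time, so hand it one ID per call.
matches <- lapply(individuals, function(id) grep(id, colnames(bcdata_clean)))
names(matches) <- individuals

# matches$ID1 holds the column indices for ID1, matches$ID2 those for ID2.
print(matches)
```

Each list element then holds the column indices for one individual, which is what the summing step needs.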

You can use split.default to split the data on similarly named columns and sum each group row-wise.

sapply(split.default(df, sub('-.*', '', names(df))), rowSums, na.rm = TRUE)

#  ID1 ID2
#A   5   1
#B   4   3
#C   9   5
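To make the one-liner easier to follow, here is the same idea broken into its intermediate steps. This is only a sketch; the names groups, pieces and totals are illustrative and do not appear in the answer.

```r
df <- structure(list(`ID1-4` = c(3L, 2L, 4L), `ID1-3` = c(2L, 2L, 5L),
                     `ID2-5` = c(1L, 3L, 5L)),
                class = "data.frame", row.names = c("A", "B", "C"))

# Strip everything from the first "-" onward to get one group key per column.
groups <- sub("-.*", "", names(df))

# split.default() treats the data frame as a list of columns and
# returns one smaller data frame per distinct key.
pieces <- split.default(df, groups)

# Sum each piece across its columns, one total per row.
totals <- sapply(pieces, rowSums, na.rm = TRUE)
print(totals)
```

Because the grouping key is derived from the column names themselves, this version does not need the individuals vector at all.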

data

df <- structure(list(`ID1-4` = c(3L, 2L, 4L), `ID1-3` = c(2L, 2L, 5L
), `ID2-5` = c(1L, 3L, 5L)), class = "data.frame", row.names = c("A", "B", "C"))

Passing individuals as the argument of the anonymous function, in place of function(x), fixed my issue:

bcdata_total <- sapply(individuals, function(individuals) {
  apply(bcdata_clean[,grep(individuals, colnames(bcdata_clean))], 1, sum)
})
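The fix works because the parameter named individuals now shadows the outer vector, so grep() sees one ID per call. A possible variant, offered as a sketch, keeps the parameter name x and adds two guards that are not in the original answer: drop = FALSE, since selecting a single matching column otherwise returns a plain vector that apply() cannot handle, and an anchored pattern, which assumes column names follow the form "<ID>-<suffix>".

```r
# Toy data mirroring the question's example.
bcdata_clean <- data.frame(
  `ID1-4` = c(3L, 2L, 4L),
  `ID1-3` = c(2L, 2L, 5L),
  `ID2-5` = c(1L, 3L, 5L),
  row.names = c("A", "B", "C"),
  check.names = FALSE  # keep the hyphenated column names unchanged
)
individuals <- c("ID1", "ID2")

bcdata_total <- sapply(individuals, function(x) {
  # Anchor the pattern so "ID1" does not also match a name like "ID10-2"
  # (assumption: names look like "<ID>-<suffix>").
  cols <- grep(paste0("^", x, "-"), colnames(bcdata_clean))
  # drop = FALSE keeps a data frame even when only one column matches.
  rowSums(bcdata_clean[, cols, drop = FALSE])
})
print(bcdata_total)
```

With the example data, ID2 matches only one column, which is exactly the case the drop = FALSE guard is meant to cover.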

An option with tidyverse:

library(dplyr)
library(tidyr)
library(tibble)
df %>%
    rownames_to_column('rn') %>%
    pivot_longer(cols = -rn, names_to = c(".value", "grp"), names_sep="-") %>%
    group_by(rn) %>% 
    summarise(across(starts_with('ID'), sum, na.rm = TRUE), .groups = 'drop') %>%
    column_to_rownames('rn')
#  ID1 ID2
#A   5   1
#B   4   3
#C   9   5

data

df <- structure(list(`ID1-4` = c(3L, 2L, 4L), `ID1-3` = c(2L, 2L, 5L
), `ID2-5` = c(1L, 3L, 5L)), class = "data.frame", row.names = c("A", "B", "C"))
