R subset data.frame by column names using partial string match from another list

I have a data frame (called "myfile") like this:

      P3170.Tp2  P3189.Tn10 C453.Tn7 F678.Tc23 P3170.Tn10
gene1 0.3035130  0.5909081 0.8918271 0.2623648 0.13392672
gene2 0.2542919  0.5797730 0.4226669 0.9091961 0.96056308
gene3 0.9923911  0.4318736 0.7020107 0.1936181 0.58723105
gene4 0.4113318  0.1239206 0.4091794 0.8196982 0.54791214
gene5 0.4095719  0.6392045 0.4416208 0.8853356 0.01008299

I have a list of interesting strings (called "interesting.list") like this:

interesting.list <- c("P3170", "C453")

I would like to use this interesting.list to subset myfile by partial string match of the column headers.

ss.file <- NULL
for (i in 1:length(interesting.list)){
    ss.file[[i]] <- myfile[,colnames(myfile) %like% interesting.list[[i]]]
}

However, this loop doesn't keep the column headers after running. Since I have a huge dataset (more than 30000 rows), it would be hard to set the colnames manually. Is there a better way to do it?

# Specify `interesting.list` items manually
df[,grep("P3170|C453", x=names(df))]
#>   P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1         1        3          5

# Use paste to create pattern from lots of items in `interesting.list`
il <- c("P3170", "C453")
df[,grep(paste(il, collapse = "|"), x=names(df))]
#>   P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1         1        3          5

Example data:

n <- c("P3170.Tp2" , "P3189.Tn10" ,"C453.Tn7" ,"F678.Tc23" ,"P3170.Tn10")
df <- data.frame(1,2,3,4,5)
names(df) <- n
Created on 2021-10-20 by the reprex package (v2.0.1)
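
One likely reason the loop in the question loses its headers: when a pattern matches exactly one column (as "C453" does here), `[` simplifies the result to a bare vector and the column name goes with it. A minimal sketch of a per-pattern version that keeps the names, assuming the toy `df` from the example data and plain (non-regex) substring matching:

```r
# Same toy data as in the example above
df <- data.frame(1, 2, 3, 4, 5)
names(df) <- c("P3170.Tp2", "P3189.Tn10", "C453.Tn7", "F678.Tc23", "P3170.Tn10")

interesting.list <- c("P3170", "C453")

# One data frame per pattern; drop = FALSE stops a single matching
# column from being simplified to an unnamed vector
ss.file <- lapply(interesting.list, function(p) {
  df[, grepl(p, names(df), fixed = TRUE), drop = FALSE]
})
names(ss.file) <- interesting.list

names(ss.file$P3170)
#> [1] "P3170.Tp2"  "P3170.Tn10"
names(ss.file$C453)
#> [1] "C453.Tn7"
```

If a single combined data frame is wanted instead of a list, the `grep(paste(...))` call above already gives that; adding `drop = FALSE` to it guards against the one-column case in the same way.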

There are several things you need to think about on top of this question: what if an item in interesting.list returns more than one match, what if no matches are found, etc.

Here's one approach, given your data:

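nms <- colnames(myfile)

matchIdx <- unlist(lapply(interesting.list, function(pattern) {
  # fixed = TRUE treats the pattern as a literal string, not a regex
  matches <- which(grepl(pattern, nms, fixed = TRUE))

  # If more than one match is found, only return the first;
  # if none is found, integer(0) is returned and dropped by unlist()
  if (length(matches) > 1) matches[1] else matches
}))

# drop = FALSE keeps a data frame (and its column names) even for one column
myfile[, matchIdx, drop = FALSE]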
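
The approach above keeps only the first match per pattern and silently skips patterns that match nothing. If every matching column should be kept and the unmatched patterns reported, a variant could look like the sketch below. The toy data and the extra pattern "Z999" are made up for illustration, and the pattern is still matched as a literal substring:

```r
# Toy stand-in for the real data (assumption: same column names as the question)
myfile <- data.frame(1, 2, 3, 4, 5)
names(myfile) <- c("P3170.Tp2", "P3189.Tn10", "C453.Tn7", "F678.Tc23", "P3170.Tn10")

# "Z999" is a hypothetical pattern that matches no column
interesting.list <- c("P3170", "C453", "Z999")

nms <- colnames(myfile)

# All matching column indices for each pattern
hits <- lapply(interesting.list, function(pattern) {
  which(grepl(pattern, nms, fixed = TRUE))
})
names(hits) <- interesting.list

# Report the patterns that matched nothing
unmatched <- names(hits)[lengths(hits) == 0]
if (length(unmatched) > 0) {
  message("No column matched: ", paste(unmatched, collapse = ", "))
}

# Keep each column once, in its original order
matchIdx <- sort(unique(unlist(hits, use.names = FALSE)))
result <- myfile[, matchIdx, drop = FALSE]

names(result)
#> [1] "P3170.Tp2"  "C453.Tn7"   "P3170.Tn10"
```

Sorting the indices keeps the columns in the order they have in `myfile`; dropping the `sort()` call would instead group them by pattern.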
