R: subset a data.frame by column names that partially match strings from another list

Question

I have a data frame (called "myfile") like this:

      P3170.Tp2 P3189.Tn10  C453.Tn7 F678.Tc23 P3170.Tn10
gene1 0.3035130  0.5909081 0.8918271 0.2623648 0.13392672
gene2 0.2542919  0.5797730 0.4226669 0.9091961 0.96056308
gene3 0.9923911  0.4318736 0.7020107 0.1936181 0.58723105
gene4 0.4113318  0.1239206 0.4091794 0.8196982 0.54791214
gene5 0.4095719  0.6392045 0.4416208 0.8853356 0.01008299

I have a vector of interesting strings (called "interesting.list") like this:

interesting.list <- c("P3170", "C453")

I would like to use interesting.list to subset myfile by partial string match on the column headers.

ss.file <- NULL
for (i in 1:length(interesting.list)){
    ss.file[[i]] <- myfile[,colnames(myfile) %like% interesting.list[[i]]]
}

However, this loop doesn't keep the column headers after running. Since I have a huge dataset (more than 30000 rows), it would be hard to set the colnames manually. Is there a better way to do it?

Answer 1

# Specify `interesting.list` items manually
df[,grep("P3170|C453", x=names(df))]
#>   P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1         1        3          5

# Use paste to create pattern from lots of items in `interesting.list`
il <- c("P3170", "C453")
df[,grep(paste(il, collapse = "|"), x=names(df))]
#>   P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1         1        3          5

Example data:

n <- c("P3170.Tp2" , "P3189.Tn10" ,"C453.Tn7" ,"F678.Tc23" ,"P3170.Tn10")
df <- data.frame(1,2,3,4,5)
names(df) <- n

Created on 2021-10-20 by the reprex package (v2.0.1)
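One caveat with the `grep()` approach: the pasted pattern is treated as a regular expression and matches anywhere in the column name, so an id such as "P317" would also pick up "P3170.Tp2". If the column names always have the form `<id>.<suffix>` (an assumption based on the sample data, not something the question states), a minimal sketch that matches the id part exactly could look like this:

```r
# Example data from the answer above
df <- data.frame(1, 2, 3, 4, 5)
names(df) <- c("P3170.Tp2", "P3189.Tn10", "C453.Tn7", "F678.Tc23", "P3170.Tn10")
il <- c("P3170", "C453")

# Assumption: every column name looks like "<id>.<suffix>"
ids <- sub("\\..*$", "", names(df))  # keep only the text before the first dot

# Exact match on the id part; drop = FALSE keeps a data.frame
# even if only one column is selected
res <- df[, ids %in% il, drop = FALSE]
names(res)
#> [1] "P3170.Tp2"  "C453.Tn7"   "P3170.Tn10"
```

If the ids can appear elsewhere in the name, the `grep()` version above is the more flexible choice.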

Answer 2

There are several things you need to think about on top of this question: what if an item in interesting.list returns more than one match, what if no matches are found, etc.

Here's one approach, given your data:

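nms <- colnames(myfile)

# For each pattern, find the indices of the matching column names
matchIdx <- unlist(lapply(interesting.list, function(pattern) {
  # fixed = TRUE treats the pattern as a literal string, not a regex
  matches <- which(grepl(pattern, nms, fixed = TRUE))

  # If more than one match is found, only return the first
  if (length(matches) > 1) matches[1] else matches
}))

# drop = FALSE keeps the result a data.frame even with a single column
myfile[, matchIdx, drop = FALSE]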
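If you would rather keep every match and get one subset per item, as in the loop from the question, here is a rough sketch. One likely reason the original loop loses headers is that `[` with a single selected column returns a plain vector unless `drop = FALSE` is given. The data below is a made-up stand-in for `myfile`, and "X999" is a hypothetical id added only to show the no-match case:

```r
# Small stand-in for the question's data (values are made up)
myfile <- data.frame(matrix(1:10, nrow = 2))
colnames(myfile) <- c("P3170.Tp2", "P3189.Tn10", "C453.Tn7", "F678.Tc23", "P3170.Tn10")
rownames(myfile) <- c("gene1", "gene2")
interesting.list <- c("P3170", "C453", "X999")  # "X999" is hypothetical: no match

nms <- colnames(myfile)

# Number of matching columns per item (0 = no match, > 1 = several matches)
n.matches <- vapply(interesting.list,
                    function(p) sum(grepl(p, nms, fixed = TRUE)),
                    integer(1))

# One data.frame per item; drop = FALSE keeps the column header
# even when only a single column matches
ss.file <- lapply(interesting.list, function(p) {
  myfile[, grepl(p, nms, fixed = TRUE), drop = FALSE]
})
names(ss.file) <- interesting.list

colnames(ss.file$C453)
#> [1] "C453.Tn7"
```

`n.matches` lets you decide up front what to do with items that match nothing or match several columns, instead of silently keeping only the first.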

2 answers

Answer 1
1 2021-10-20 19:03:26

Answer 2
0 2021-10-20 19:10:08
