Subsetting an R data.frame by column names that partially match strings from another list

Question

I have a data frame (called "myfile") that looks like this:

      P3170.Tp2  P3189.Tn10 C453.Tn7 F678.Tc23 P3170.Tn10
gene1 0.3035130  0.5909081 0.8918271 0.2623648 0.13392672
gene2 0.2542919  0.5797730 0.4226669 0.9091961 0.96056308
gene3 0.9923911  0.4318736 0.7020107 0.1936181 0.58723105
gene4 0.4113318  0.1239206 0.4091794 0.8196982 0.54791214
gene5 0.4095719  0.6392045 0.4416208 0.8853356 0.01008299

I have a list of strings of interest (called "interesting.list"), as follows:

interesting.list <- c("P3170", "C453")

I want to use this interesting.list to subset myfile by partial string matching against the column headers.

ss.file <- NULL
for (i in 1:length(interesting.list)){
    ss.file[[i]] <- myfile[,colnames(myfile) %like% interesting.list[[i]]]
}

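However, after running this loop the result has no column headers. Because my dataset is large (more than 30,000 rows), adding the column names by hand is impractical. Is there a better way to do this?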

Answer 1

# Specify `interesting.list` items manually
df[,grep("P3170|C453", x=names(df))]
#>   P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1         1        3          5

# Use paste to create pattern from lots of items in `interesting.list`
il <- c("P3170", "C453")
df[,grep(paste(il, collapse = "|"), x=names(df))]
#>   P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1         1        3          5

Sample data:

n <- c("P3170.Tp2" , "P3189.Tn10" ,"C453.Tn7" ,"F678.Tc23" ,"P3170.Tn10")
df <- data.frame(1,2,3,4,5)
names(df) <- n
Created on 2021-10-20 by the reprex package (v2.0.1)
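
For the per-pattern list that the loop in the question builds, a minimal sketch along the same lines might look like the following. It assumes the missing header comes from R simplifying a single-column selection to a plain vector, which `drop = FALSE` prevents. The small `myfile` here is made-up stand-in data that only reuses the column names from the question, and `ss.file` simply reuses the asker's variable name.

```r
# Stand-in data with the same column names as in the question (values are made up)
myfile <- data.frame(1:2, 3:4, 5:6, 7:8, 9:10)
names(myfile) <- c("P3170.Tp2", "P3189.Tn10", "C453.Tn7", "F678.Tc23", "P3170.Tn10")
interesting.list <- c("P3170", "C453")

# One data.frame per pattern; drop = FALSE keeps a single matching column
# as a data.frame, so its column name is preserved
ss.file <- lapply(interesting.list, function(p) {
  myfile[, grepl(p, names(myfile), fixed = TRUE), drop = FALSE]
})
names(ss.file) <- interesting.list

names(ss.file$C453)
#> [1] "C453.Tn7"
names(ss.file$P3170)
#> [1] "P3170.Tp2"  "P3170.Tn10"
```

Here `fixed = TRUE` treats each entry as a literal substring rather than a regular expression, which is probably the safer choice if the identifiers could ever contain characters such as `.` or `+`.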

Answer 2

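Beyond the question itself, there are several things you need to consider: what if an item in interesting.list returns more than one match, what if no match is found, and so on.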

Given your data, here is one approach:

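# The data frame is called `myfile` in the question (R names are case-sensitive)
nms <- colnames(myfile)

matchIdx <- unlist(lapply(interesting.list, function(pattern) {
  # Literal (non-regex) substring match against the column names
  matches <- which(grepl(pattern, nms, fixed = TRUE))

  # If more than one match is found, only return the first
  if (length(matches) > 1) matches[1] else matches
}))

# drop = FALSE keeps the result a data.frame even if only one column matches
myfile[, matchIdx, drop = FALSE]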
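
To see which of the cases mentioned above actually occur before subsetting, one could count the matches per pattern. This is only a sketch: the column names are those from the question, while `patterns`, `hits`, and the entry "X999" are hypothetical names added here to show the no-match case.

```r
# Column names from the question; "X999" is a hypothetical pattern with no match
nms <- c("P3170.Tp2", "P3189.Tn10", "C453.Tn7", "F678.Tc23", "P3170.Tn10")
patterns <- c("P3170", "C453", "X999")

# Number of columns each pattern matches (literal substring match)
hits <- vapply(patterns, function(p) sum(grepl(p, nms, fixed = TRUE)), integer(1))

names(hits)[hits == 0]  # patterns with no match
#> [1] "X999"
names(hits)[hits > 1]   # patterns matching several columns
#> [1] "P3170"
```

With the question's data, "P3170" matches two columns, so the approach above that keeps only the first match would return P3170.Tp2 and leave out P3170.Tn10.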

