
How to drop columns by name pattern in R?

I have this data frame:

state  county  city  region  mmatrix  X1  X2  X3  A1  A2  A3   B1     B2  B3  C1  C2  C3
1      1       1     1       111010   1   0   0   2   20  200  Push   8   12  NA  NA  NA
1      2       1     1       111010   1   0   0   4   NA  400  Shove  9   NA

Now I want to exclude columns whose names end with a certain string, say "1" (e.g. A1 and B1). I wrote this code:

df_redacted <- df[, -grep("\\1$", colnames(df))]

However, this seems to delete every column. How can I modify the code so that it only deletes the columns that match the pattern (e.g. names ending with "3" or any other string)?

The solution has to handle a data frame that has both numeric and categorical values.

I found a simple answer using dplyr/tidyverse. If your column names contain "This", then every column whose name contains "This" will be dropped.

library(dplyr) 
df_new <- df %>% select(-contains("This"))
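For the question's case (names that *end* with a string), the tidyselect helper `ends_with()` is a closer fit than `contains()`, and `matches()` accepts a regular expression. A minimal sketch on made-up data, assuming dplyr is installed:

```r
library(dplyr)

# Made-up example with numeric and character columns
df <- data.frame(A1 = c(2, 4), A2 = c(20, NA), B1 = c("Push", "Shove"), B2 = c(8, 9))

# ends_with() matches a literal suffix
df_suffix <- df %>% select(-ends_with("1"))

# matches() takes a regular expression; "1$" means "ends in 1"
df_regex <- df %>% select(-matches("1$"))
```

Both calls leave only `A2` and `B2`; `ends_with()` has the advantage that the suffix is not interpreted as a regex.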

Your code works perfectly if I apply it to a minimal example and just search for the string "A":

df <- data.frame(ID = 1:10,
                 A1 = rnorm(10),
                 A2 = rnorm(10),
                 B1 = letters[1:10],
                 B2 = letters[11:20])
df[, -grep("A", colnames(df))]

So your problem is really a regular-expression problem, not a column-dropping problem. If I run your code, I get an error:

df[, -grep("\\3$", colnames(df))]
Error in grep("\\3$", colnames(df)) : 
  invalid regular expression '\3$', reason 'Invalid back reference'

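Update: `"\\3$"` (like `"\\1$"`) is read as a back reference to a capture group, and the pattern defines no groups, hence the error. Why not just use the following expression instead?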

df[, -grep("1$", colnames(df))]
   ID         A2 B2
1   1  2.0957940  k
2   2 -1.7177042  l
3   3 -0.0448357  m
4   4  1.2899925  n
5   5  0.7569659  o
6   6 -0.5048024  p
7   7  0.6929080  q
8   8 -0.5116399  r
9   9 -1.2621066  s
10 10  0.7664955  t
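A caveat with the `-grep()` idiom used in these answers: when no column name matches, `grep()` returns `integer(0)`, and indexing with `-integer(0)` selects zero columns, so every column disappears. Negating a logical vector from `grepl()` does not have this problem. A sketch on made-up data (the `"9$"` pattern is chosen only because nothing matches it):

```r
# Made-up example data with numeric and character columns
df <- data.frame(ID = 1:3,
                 A1 = c(2, 4, 6),
                 A2 = c(20, NA, 60),
                 B1 = c("Push", "Shove", "Nudge"))

# No name ends in "9", so grep() returns integer(0)...
idx <- grep("9$", colnames(df))
# ...and df[, -integer(0)] keeps zero columns
df_empty <- df[, -idx]

# grepl() returns one TRUE/FALSE per column, so negating it is safe
df_all <- df[, !grepl("9$", colnames(df)), drop = FALSE]  # nothing dropped
df_red <- df[, !grepl("1$", colnames(df)), drop = FALSE]  # drops A1 and B1
```

`drop = FALSE` keeps the result a data frame even if only one column is left.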

As an additional answer, since I came across this question while looking for a data.table solution to the same problem:

library(data.table)
dt <- data.table(df)
drop.cols <- grep("1$", colnames(dt))
dt[, (drop.cols) := NULL]
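Note that `:=` removes the columns from `dt` itself, by reference. If you would rather keep the original table, one option is to select the remaining columns with `with = FALSE`. A sketch on made-up data, assuming data.table is installed; the length check is just a precaution for the case where nothing matches:

```r
library(data.table)

# Made-up example data
dt <- data.table(ID = 1:3, A1 = c(2, 4, 6), A2 = c(20, NA, 60),
                 B1 = c("Push", "Shove", "Nudge"))

# Names (not positions) of the columns ending in "1"
drop.cols <- grep("1$", names(dt), value = TRUE)

# Non-destructive: build a new table from the remaining columns
keep.cols <- setdiff(names(dt), drop.cols)
dt_red <- dt[, keep.cols, with = FALSE]

# Destructive: delete the matching columns from dt by reference
if (length(drop.cols) > 0) dt[, (drop.cols) := NULL]
```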

To exclude columns ending in any given string, you can use:

# Suffix to exclude
strng <- "1"
df <- data.frame(matrix(runif(25, max = 10), nrow = 5))
colnames(df) <- paste("EX", 1:5)
df_red <- df[, -grep(paste0(strng, "$"), colnames(df), perl = TRUE)]

df
#         EX 1     EX 2        EX 3     EX 4     EX 5
#   1 7.332913 4.972780 1.175947853 6.428073 8.625763
#   2 2.730271 3.734072 6.031157537 1.305951 8.012606
#   3 9.450122 3.259247 2.856123205 5.067294 7.027795
#   4 9.682430 5.295177 0.002015966 9.322912 7.424568
#   5 1.225359 1.577659 4.013616377 5.092042 5.130887

df_red
#         EX 2        EX 3     EX 4     EX 5
#   1 4.972780 1.175947853 6.428073 8.625763
#   2 3.734072 6.031157537 1.305951 8.012606
#   3 3.259247 2.856123205 5.067294 7.027795
#   4 5.295177 0.002015966 9.322912 7.424568
#   5 1.577659 4.013616377 5.092042 5.130887
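Because `paste0(strng, "$")` builds a regular expression, a suffix containing metacharacters such as `.` can match more than intended. Base R's `endsWith()` (available since R 3.3.0) compares the suffix literally. A sketch with a hypothetical suffix `".x"` and made-up column names:

```r
# Hypothetical suffix containing a regex metacharacter
strng <- ".x"
df <- data.frame(a.x = 1:3, b.y = 4:6, cx = 7:9)

# As a regex, "." matches any character, so "cx" also matches ".x$"
regex_hit <- grepl(paste0(strng, "$"), colnames(df))

# endsWith() compares the suffix literally
literal_hit <- endsWith(colnames(df), strng)

# Keep every column that does not literally end with the suffix
df_red <- df[, !literal_hit, drop = FALSE]
```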

You can extend this with a broader regular expression. I have a data frame with many columns named like "name", "upper_name", and "lower_name", because they represent confidence intervals for a number of series, but I don't need them all. Using a regex, you can do the following:

pattern = "(upper_[a-z]*)|(lower_[a-z]*)"
policyData <- policyData[, -grep(pattern = pattern, colnames(policyData))]

The `|` operator lets me express an "or" in the regex, so a single pattern matches both kinds of columns instead of searching for each pattern separately.
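The two alternatives can also be folded into one group, and anchoring with `^` means only names that *start* with `upper_` or `lower_` are dropped (the trailing `[a-z]*` is not needed, since `grepl()` only has to find a match). A sketch with hypothetical column names:

```r
# Hypothetical columns: an estimate, its interval bounds, and an unrelated column
df <- data.frame(name = 1:2, upper_name = 3:4, lower_name = 5:6, other = 7:8)

# "^" anchors at the start; "(upper|lower)" is the alternation; "_" must follow
is_bound <- grepl("^(upper|lower)_", colnames(df))
df_red <- df[, !is_bound, drop = FALSE]
```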

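Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.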

 
© 2020-2024 STACKOOM.COM