Regex for extracting strings from csv file names

Question

I'm very new to the regex world and would like to know how to use regex to extract strings from a bunch of file names I've imported into R. My files follow the general format of:

testing1_010000.csv
check3_012000.csv
testing_checking_045880.csv
test_check2_350000.csv

And I'd like to extract everything before the "6 numbers.csv" part, including the "_", to get something like:

testing1_
check3_
testing_checking_
test_check2_

If it helps, the pattern I essentially want to remove will always be 6 numbers immediately followed by .csv.

Any help would be great, thank you!

Answer 1

There are a few ways you could go about this. For example, match anything before a string of six digits followed by ".csv". For this one you would want the first capturing group.

/(.*)\d{6}\.csv/

https://regex101.com/r/MPH6mE/1/
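Since the question is about R, here is a minimal sketch of how the capturing-group approach might be applied there. It assumes base R's `sub()`, doubles the backslashes as R string literals require, and adds `^` and `$` anchors that are not in the pattern above; the vector name `files` is just for illustration.

```r
# File names copied from the question.
files <- c("testing1_010000.csv", "check3_012000.csv",
           "testing_checking_045880.csv", "test_check2_350000.csv")

# (.*) captures everything before the six digits; the replacement "\\1"
# swaps the whole file name for that first capturing group.
prefixes <- sub("^(.*)\\d{6}\\.csv$", "\\1", files)
print(prefixes)
```

Names that do not match the pattern are returned unchanged by `sub()`, so it may be worth checking with `grepl()` first if the input is not guaranteed to follow the format.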

Or match everything up to the last underscore character. For this one you would want the whole match.

.*_

https://regex101.com/r/4GFPIA/1
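To take the whole match rather than a group, one option in base R is `regmatches()` combined with `regexpr()`. This is only a sketch under the assumption that every file name contains at least one underscore; `files` is an illustrative name.

```r
# File names copied from the question.
files <- c("testing1_010000.csv", "check3_012000.csv",
           "testing_checking_045880.csv", "test_check2_350000.csv")

# regexpr() locates the match of ".*_" in each name; because .* is greedy,
# the match extends to the last underscore. regmatches() returns the matched text.
prefixes <- regmatches(files, regexpr(".*_", files))
print(prefixes)
```

Note that `regmatches()` silently drops elements with no match, so the result can be shorter than the input if some names lack an underscore.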

Answer 2

Files = c("testing1_010000.csv", "check3_012000.csv",
    "testing_checking_045880.csv", "test_check2_350000.csv")
sub("(.*_)[[:digit:]]{6}.*", "\\1", Files)

 
[1] "testing1_"         "check3_"           "testing_checking_"
[4] "test_check2_"

Answer 3

Using nchar:

Files = c("testing1_010000.csv", "check3_012000.csv",
          "testing_checking_045880.csv", "test_check2_350000.csv")

substr(Files, 1, nchar(Files)-10)

[1] "testing1_"         "check3_"           "testing_checking_"
[4] "test_check2_"
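The `-10` above is the length of the fixed suffix (six digits plus ".csv"). A slightly more defensive sketch, assuming you want `NA` for any name that does not end in that suffix, might look like this; the `grepl()` guard and the names `suffix_len` and `ok` are additions for illustration, not part of the answer.

```r
# File names copied from the question.
files <- c("testing1_010000.csv", "check3_012000.csv",
           "testing_checking_045880.csv", "test_check2_350000.csv")

# Six digits plus ".csv" is 10 characters.
suffix_len <- nchar("000000.csv")

# Guard (an added assumption): only trim names that really end in six digits + ".csv".
ok <- grepl("[0-9]{6}\\.csv$", files)

# Keep everything except the last suffix_len characters; NA where the guard fails.
prefixes <- ifelse(ok, substr(files, 1, nchar(files) - suffix_len), NA_character_)
print(prefixes)
```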

Answer 4

We can use stringr::str_match(). It also works when the number of digits is something other than six.

library(tidyverse)

files <- c("testing1_010000.csv", "check3_012000.csv", "testing_checking_045880.csv", "test_check2_350000.csv")



str_match(files, '(.*_)\\d+\\.csv$')[, 2]
#> [1] "testing1_"         "check3_"           "testing_checking_"
#> [4] "test_check2_"

The regex can be read as: "capture everything up to and including an underscore that is followed by one or more digits and then .csv at the end of the string".

Created on 2021-12-03 by the reprex package (v2.0.1)
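If loading the tidyverse is not desirable, roughly the same result can be obtained in base R. The sketch below assumes `regexec()` with `regmatches()` as a stand-in for `str_match()`; the last two file names are hypothetical additions to show a shorter digit run and a non-matching name.

```r
# Question's file names plus two hypothetical ones for illustration.
files <- c("testing1_010000.csv", "check3_012000.csv",
           "testing_checking_045880.csv", "test_check2_350000.csv",
           "run_42.csv", "notes.txt")

# regexec() records the full match and each capturing group;
# regmatches() turns that into a list of character vectors.
m <- regmatches(files, regexec("(.*_)\\d+\\.csv$", files))

# Element 2 is the first capturing group; names with no match give NA.
prefixes <- vapply(m, function(x) if (length(x) >= 2) x[2] else NA_character_,
                   character(1), USE.NAMES = FALSE)
print(prefixes)
```

Returning `NA` for non-matching names mirrors what `str_match()` does, which keeps the output the same length as the input.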

4 answers

Answer 1 (3, accepted): 2021-12-03 20:26:22
Answer 2 (2): 2021-12-03 20:25:34
Answer 3 (1): 2021-12-03 20:45:21
Answer 4 (0): 2021-12-03 20:55:35
