Regex for extracting string from csv before numbers
I'm very new to the regex world and would like to know how to extract strings using regex from a bunch of file names I've imported into R. My files follow the general format of:
testing1_010000.csv
check3_012000.csv
testing_checking_045880.csv
test_check2_350000.csv
And I'd like to extract everything before the "6 numbers.csv" part, including the "_", to get something like:
testing1_
check3_
testing_checking_
test_check2_
If it helps, the pattern I essentially want to remove will always be 6 numbers immediately followed by .csv.
Any help would be great, thank you!
There are a few ways you could go about this. For example, match anything before a string of six digits followed by ".csv". For this one you would want to get the first capturing group.
/(.*)\d{6}\.csv/
https://regex101.com/r/MPH6mE/1/
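Since the question is about R, here is a minimal sketch of how the capturing-group approach might be applied there with base R's sub(). The variable names are illustrative. The pattern escapes the dot and adds ^ and $ anchors, which is an assumption about how strict you want the match to be.

```r
# File names taken from the question
files <- c("testing1_010000.csv", "check3_012000.csv",
           "testing_checking_045880.csv", "test_check2_350000.csv")

# Replace the whole name with the first capturing group, i.e. everything
# before the six digits. perl = TRUE enables \d in the pattern.
prefixes <- sub("^(.*)\\d{6}\\.csv$", "\\1", files, perl = TRUE)
print(prefixes)
```

Names that do not match the pattern are returned unchanged by sub(), so it may be worth checking them with grepl() first.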
Or match everything up to the last underscore character. For this one you would want the whole match.
.*_
https://regex101.com/r/4GFPIA/1
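To take the whole match in R, one option is base R's regexpr() together with regmatches(). This is only a sketch with illustrative variable names, and it assumes every file name contains at least one underscore.

```r
# File names taken from the question
files <- c("testing1_010000.csv", "check3_012000.csv",
           "testing_checking_045880.csv", "test_check2_350000.csv")

# regexpr() finds the first match in each string; because .* is greedy,
# the match runs up to and including the last underscore.
# regmatches() then extracts the matched text.
prefixes <- regmatches(files, regexpr(".*_", files))
print(prefixes)
```

Note that regmatches() drops elements with no match, so the result can be shorter than the input if some names have no underscore.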
Files = c("testing1_010000.csv", "check3_012000.csv",
"testing_checking_045880.csv", "test_check2_350000.csv")
sub("(.*_)[[:digit:]]{6}.*", "\\1", Files)
[1] "testing1_" "check3_" "testing_checking_"
[4] "test_check2_"
Using nchar:
Files = c("testing1_010000.csv", "check3_012000.csv",
"testing_checking_045880.csv", "test_check2_350000.csv")
substr(Files, 1, nchar(Files)-10)
[1] "testing1_" "check3_" "testing_checking_"
[4] "test_check2_"
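The 10 here is the length of the fixed suffix: six digits plus the four characters of ".csv". Because this approach never looks at the contents, a hedged variant might first confirm that every name really ends that way. The check below is an addition for illustration, not part of the original answer.

```r
# File names taken from the question
files <- c("testing1_010000.csv", "check3_012000.csv",
           "testing_checking_045880.csv", "test_check2_350000.csv")

# Stop early if any name does not end in six digits followed by ".csv"
stopifnot(all(grepl("[0-9]{6}\\.csv$", files)))

# Drop the last 10 characters (6 digits + ".csv")
prefixes <- substr(files, 1, nchar(files) - 10)
print(prefixes)
```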
We can use stringr::str_match(). It will also work when the number of digits is not exactly six.
library(tidyverse)
files <- c("testing1_010000.csv", "check3_012000.csv", "testing_checking_045880.csv", "test_check2_350000.csv")
str_match(files, '(.*_)\\d+\\.csv$')[, 2]
#> [1] "testing1_" "check3_" "testing_checking_"
#> [4] "test_check2_"
The regex can be interpreted as: "capture everything up to and including an underscore that is then followed by one or more digits and .csv at the end of the string".
Created on 2021-12-03 by the reprex package (v2.0.1)
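If stringr is not available, roughly the same behaviour can be sketched in base R. The file names below are hypothetical, chosen to show a different digit count and a non-matching name. The explicit NA for non-matches imitates what str_match() returns, since sub() alone would return the input unchanged.

```r
# Hypothetical names: six digits, four digits, and one that does not match
files <- c("testing1_010000.csv", "check3_0120.csv", "notes.txt")

pattern <- "(.*_)\\d+\\.csv$"

# Keep the captured prefix where the pattern matches, NA otherwise
prefixes <- ifelse(grepl(pattern, files, perl = TRUE),
                   sub(pattern, "\\1", files, perl = TRUE),
                   NA_character_)
print(prefixes)
```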