
Regex for extracting string from csv before numbers

I'm very new to the regex world and would like to know how to use a regex to extract strings from a bunch of file names I've imported into R. My files follow this general format:

testing1_010000.csv
check3_012000.csv
testing_checking_045880.csv
test_check2_350000.csv

I'd like to extract everything before the six digits and the .csv extension, including the "_", to get something like:

testing1_
check3_
testing_checking_
test_check2_

If it helps, the pattern I essentially want to remove will always be six digits immediately followed by .csv.

Any help would be great, thank you!

There are a few ways you could go about this. For example, match anything before a string of six digits followed by ".csv". For this one you would want the first capturing group.

/(.*)\d{6}\.csv/

https://regex101.com/r/MPH6mE/1/
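Since the question is about R, here is a minimal sketch of how the capture-group idea could be applied there. It assumes base R's `sub()`; the vector name `files` and the added `^`/`$` anchors are illustrative choices, not part of the pattern above.

```r
# Example vector copied from the file names in the question
# (the name `files` is an assumption for this sketch).
files <- c("testing1_010000.csv", "check3_012000.csv",
           "testing_checking_045880.csv", "test_check2_350000.csv")

# Replace the whole match with the first capturing group ("\\1").
# Backslashes are doubled inside an R string; the dot before "csv" is
# escaped so it matches a literal dot, and "$" anchors at the end.
prefixes <- sub("^(.*)\\d{6}\\.csv$", "\\1", files)
print(prefixes)
```

Names that do not fit the pattern are returned unchanged by `sub()`, so it may be worth checking them with `grepl()` first if the input is not guaranteed to be clean.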

Or match everything up to the last underscore character. For this one you would want the whole match.

.*_

https://regex101.com/r/4GFPIA/1
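To take the whole match rather than a capturing group in R, one option is base R's `regexpr()` together with `regmatches()`. This is only a sketch under the assumption that every name contains at least one underscore and none after the digits; the vector name `files` is illustrative.

```r
# Example vector copied from the file names in the question
# (the name `files` is an assumption for this sketch).
files <- c("testing1_010000.csv", "check3_012000.csv",
           "testing_checking_045880.csv", "test_check2_350000.csv")

# regexpr() locates the first match in each string; because ".*" is
# greedy, the match runs from the start up to the last underscore.
# regmatches() then extracts the matched text. Note that elements
# without any match are silently dropped from the result.
prefixes <- regmatches(files, regexpr(".*_", files))
print(prefixes)
```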

Files = c("testing1_010000.csv", "check3_012000.csv",
    "testing_checking_045880.csv", "test_check2_350000.csv")
sub("(.*_)[[:digit:]]{6}.*", "\\1", Files)

 
[1] "testing1_"         "check3_"           "testing_checking_"
[4] "test_check2_"

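Using nchar: the part to remove is always 10 characters long (six digits plus ".csv"), so you can simply drop the last 10 characters of each name: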

Files = c("testing1_010000.csv", "check3_012000.csv",
          "testing_checking_045880.csv", "test_check2_350000.csv")

substr(Files, 1, nchar(Files)-10)
[1] "testing1_"         "check3_"           "testing_checking_"
[4] "test_check2_"
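The fixed-width approach silently returns wrong prefixes if a name does not end in exactly six digits plus ".csv". A hedged sketch of how that assumption could be checked before truncating, using only base R (the names `files` and `suffix_ok` are illustrative):

```r
# Example vector copied from the file names in the question.
files <- c("testing1_010000.csv", "check3_012000.csv",
           "testing_checking_045880.csv", "test_check2_350000.csv")

# TRUE where the name ends in exactly the expected 10-character suffix
# shape: six digits followed by a literal ".csv".
suffix_ok <- grepl("[0-9]{6}\\.csv$", files)

# Stop with an error rather than truncate names that break the assumption.
stopifnot(all(suffix_ok))

# Drop the last 10 characters of each name.
prefixes <- substr(files, 1, nchar(files) - 10)
print(prefixes)
```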

We can use stringr::str_match(). It will also work when the number of digits is something other than six.

library(stringr)  # provides str_match(); library(tidyverse) would load it too

files <- c("testing1_010000.csv", "check3_012000.csv", "testing_checking_045880.csv", "test_check2_350000.csv")



str_match(files, '(.*_)\\d+\\.csv$')[, 2]
#> [1] "testing1_"         "check3_"           "testing_checking_"
#> [4] "test_check2_"

The regex can be interpreted as: "capture everything up to and including an underscore that is followed by one or more digits and then .csv at the end of the string".

Created on 2021-12-03 by the reprex package (v2.0.1)
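To illustrate the claim that this pattern is not tied to six digits, here is a small sketch with made-up file names containing three and eight digits. It swaps in base R's `regexec()` and `regmatches()` in place of `stringr::str_match()` so that it runs without extra packages; the second and third file names are hypothetical and do not come from the question.

```r
# First name is from the question; the other two are hypothetical names
# with a different number of digits.
files <- c("testing1_010000.csv", "check3_123.csv", "test_check2_12345678.csv")

# regexec() returns the positions of the full match and of each capturing
# group; regmatches() turns them into a list of character vectors.
m <- regmatches(files, regexec("(.*_)\\d+\\.csv$", files))

# Element 1 of each vector is the full match, element 2 is the first
# capturing group. Names that do not match yield NA here.
prefixes <- vapply(m, function(x) x[2], character(1))
print(prefixes)
```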
