
How to grep a portion of file name using grep or sed based on a pattern in shell script

I need to get a portion of a file name based on a pattern. The pattern here is not for checking whether the file name matches it exactly. The "?"s represent dates, which can be in the format YYYYMMDD or YYYY-MM-DD, and I don't want to get the dates. For now, I will just try to get the letter portion before or after the date portion based on the pattern.

For example, if the file name pattern and the actual file name are:

 *_???????? and file name: ab_cd_20160505_efg.txt

I want to grep the string ab_cd. The efg part is skipped because it's not part of the pattern.

If the file pattern and the actual file name are:

 ????-??-??_* and file name: 2016-05-05_abc_def-ghi.csv

(containing both dashes and underscores), I want to grep the string abc_def-ghi. The .csv is skipped because we don't care about the file extension, which is why .csv is not given in the pattern.

So, can someone let me know how to accomplish this using grep, sed, or another command in a shell script?

A two-step approach:

$ pattern=$(sed 's/*/([^0-9.]+)/;s/?/[0-9]/g' <<< '*_????????');
$ sed -r "s/$pattern.*/\1/" <<< 'ab_cd_12345678_efg.txt'
ab_cd

$ pattern=$(sed 's/*/([^0-9.]+)/;s/?/[0-9]/g' <<< '????-??-??_*');
$ sed -r "s/$pattern.*/\1/" <<< '1234-56-78_abc_def-ghi.csv'
abc_def-ghi

Note the double quotes in the second sed command, which let bash expand the pattern variable.
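The two sed steps above could be wrapped in a small function. This is only a sketch: the name extract_sed is hypothetical, it uses -E instead of -r (both GNU and BSD sed accept -E), and it assumes the pattern contains no slashes, since / is the sed delimiter here. Unlike the commands above, it prints nothing when the file name does not match.

```shell
#!/usr/bin/env bash

# Hypothetical helper: extract_sed PATTERN FILENAME
extract_sed() {
    local glob="$1" fname="$2" regex
    # Step 1: turn the glob-like pattern into an extended regex.
    # [*] and [?] are bracket expressions, so they match the literal characters.
    regex=$(printf '%s\n' "$glob" | sed 's/[*]/([^0-9.]+)/; s/[?]/[0-9]/g')
    # Step 2: with -n and the p flag, sed prints the line only if the
    # substitution actually happened, i.e. only if the file name matched.
    printf '%s\n' "$fname" | sed -n -E "s/${regex}.*/\1/p"
}

extract_sed '*_????????' 'ab_cd_20160505_efg.txt'
extract_sed '????-??-??_*' '2016-05-05_abc_def-ghi.csv'
```

The two calls at the end should print ab_cd and abc_def-ghi, matching the results shown above.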

This does pretty much the same as karakfa's answer, but in Bash:

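extract () {
    local pattern="$1"
    local fname="$2"
    # Replace every ? with a single-digit match
    pattern="${pattern//\?/[[:digit:]]}"
    # Replace the first * with a capture group of non-digit, non-dot characters
    pattern="${pattern/\*/([^[:digit:].]+)}"
    # Print the captured part only if the file name matches the regex
    [[ $fname =~ $pattern ]] && echo "${BASH_REMATCH[1]}"
}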

It uses parameter expansion to build a regex by replacing every ? and the * in the pattern, then matches the file name against that regex and prints the first capture group.

For example, the regex generated from *_???????? looks like

([^[:digit:].]+)_[[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]]
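To inspect the regex for any pattern, the conversion step can be pulled out on its own. A minimal sketch, where to_regex is a hypothetical name and not part of the answer:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the regex built from a ?/* pattern
to_regex() {
    local pattern="$1"
    # Every ? becomes one digit
    pattern="${pattern//\?/[[:digit:]]}"
    # The first * becomes a capture group of non-digit, non-dot characters
    pattern="${pattern/\*/([^[:digit:].]+)}"
    printf '%s\n' "$pattern"
}

to_regex '*_????????'
to_regex '????-??-??_*'
```

The first call should print the regex shown above; the second puts the capture group after the date part instead of before it.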

The function can be used like this:

$ extract '*_????????' 'ab_cd_20160505_efg.txt'
ab_cd
$ extract '????-??-??_*' '2016-05-05_abc_def-ghi.csv'
abc_def-ghi
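When the same pattern is applied to several file names, it may help to have the function report whether the name matched at all. The following is a sketch under that assumption; extract_checked and the list of file names are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical variant of extract that returns non-zero when nothing matches
extract_checked() {
    local pattern="$1" fname="$2"
    pattern="${pattern//\?/[[:digit:]]}"
    pattern="${pattern/\*/([^[:digit:].]+)}"
    # The exit status of the function is the status of the regex match
    [[ $fname =~ $pattern ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

# Made-up file names; the last one contains no date and should be reported
for f in ab_cd_20160505_efg.txt xy_20170101.log notes.txt; do
    if part=$(extract_checked '*_????????' "$f"); then
        printf '%s -> %s\n' "$f" "$part"
    else
        printf '%s: no match\n' "$f" >&2
    fi
done
```

The assignment part=$(...) carries the exit status of the command substitution, so the if branch runs only for names that match the pattern.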
