Extract part of a file name in bash

I have a folder with lots of files whose names follow a pattern: some string followed by a date and time:

BOS_CRM_SUS_20130101_10-00-10.csv (3 strings before date)
SEL_DMD_20141224_10-00-11.csv (2 strings before date)
SEL_DMD_SOUS_20141224_10-00-10.csv (3 strings before date)

I want to loop through the folder, extract only the part of each name before the date, and write the results to a file.

Output
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_

This is my script, but it is not working:

#!/bin/bash

# script variables
FOLDER=/app/list/l088app5304d1/socles/Data/LEMREC/infa_shared/Shell/Check_Header_T24/

LOG_FILE=/app/list/l088app5304d1/socles/Data/LEMREC/infa_shared/Shell/Check_Header_T24/log

echo "Starting the programme at:  $(date)" >> $LOG_FILE

# Getting part of the file name from FOLDER
for file in `ls $FOLDER/*.csv`
do
    mv "${file}" "${file/date +%Y%m%d HH:MM:SS}" 2>&1 | tee -a $LOG_FILE
done #> $LOG_FILE

Assuming you won't have digits in the first part of the name, you could use:

$ for i in *.csv; do str=$(echo "$i" | sed -r 's/[0-9]+.*//'); echo "$str"; done
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_

Or with parameter expansion:

$ for i in *.csv; do echo "${i%_*_*}_"; done
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
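
Put together as a loop over a directory that writes to a file, the parameter-expansion approach might look like the sketch below. The function names `prefix_of` and `list_prefixes`, and the directory and output-file arguments, are illustrative assumptions and do not come from the question.

```bash
#!/bin/bash
# Hypothetical helper: print the part of a file name before the
# trailing _DATE_TIME.csv portion, keeping the final underscore.
prefix_of() {
    local name=${1##*/}           # drop any leading directory
    printf '%s_\n' "${name%_*_*}" # shortest suffix containing two underscores
}

# Hypothetical helper: write the prefix of every .csv file in
# directory $1 to the file $2.
list_prefixes() {
    local dir=$1 out=$2 f
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue   # skip the unexpanded glob when nothing matches
        prefix_of "$f"
    done > "$out"
}
```

It could be called as `list_prefixes /path/to/folder prefixes.txt`. Globbing directly in the `for` loop avoids parsing the output of `ls`, which breaks on names containing whitespace.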

When you use ${var/pattern/replace}, the pattern must be a glob pattern (as used for filename matching), not a command to execute.

Instead of using the substitution operator, use the pattern removal operator:

mv "${file}" "${file%_*_*-*-*.csv}.csv"

% removes the shortest match of the pattern at the end of the variable, and the two underscores in the pattern make that match cover both the date and the time part of the filename.
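
To see how the suffix-removal operators behave, the sketch below applies a few patterns to one of the sample names. The variable names are illustrative only.

```bash
#!/bin/bash
f=BOS_CRM_SUS_20130101_10-00-10.csv

# % removes the shortest matching suffix. With a single underscore in the
# pattern, the shortest match is only the time part.
time_removed=${f%_*-*-*.csv}

# Requiring two underscores forces the match to include the date as well.
stamp_removed=${f%_*_*-*-*.csv}

# %% removes the longest matching suffix: everything from the first underscore.
greedy=${f%%_*}

echo "$time_removed"    # BOS_CRM_SUS_20130101
echo "$stamp_removed"   # BOS_CRM_SUS
echo "$greedy"          # BOS
```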

Use sed with extended regular expressions and a capture group to achieve this.

sed -r 's/(.*)[0-9]{8}_[0-9]{2}-[0-9]{2}-[0-9]{2}\.csv$/\1/' filelist

where filelist is a file with all the names you care about. Of course, this is just a placeholder, because I don't know how you are going to list all eligible files. If a glob will do, for example, you can do:

ls mydir/*.csv | sed -r 's/(.*)[0-9]{8}_[0-9]{2}-[0-9]{2}-[0-9]{2}\.csv$/\1/'
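
As a variation on the sed approach, the sketch below wraps the substitution in a filter and feeds it names with `printf` rather than `ls`. The function name `strip_stamp` is a hypothetical choice, and `-E` is used as the spelling of the extended-regex flag that both GNU and BSD sed accept.

```bash
#!/bin/bash
# Hypothetical filter: read file names on stdin, one per line, and strip
# a trailing YYYYMMDD_HH-MM-SS.csv stamp from each, keeping the prefix.
strip_stamp() {
    sed -E 's/^(.*)[0-9]{8}_[0-9]{2}-[0-9]{2}-[0-9]{2}\.csv$/\1/'
}

# Example: pass the names directly instead of parsing ls output.
printf '%s\n' \
    BOS_CRM_SUS_20130101_10-00-10.csv \
    SEL_DMD_20141224_10-00-11.csv |
    strip_stamp
```

Lines that do not end in the expected stamp pass through unchanged, so unexpected names remain visible in the output.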

The substitution:

"${file/date +%Y%m%d HH:MM:SS}"

is unlikely to do anything, because it doesn't execute date +%Y%m%d HH:MM:SS. It just treats that text as a pattern to search for, and the pattern is not going to be found.

If you did execute the command, though, you would get the current date and time, which is (apparently) also not what appears in the filename.
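
A small sketch of both points, using one of the sample names; the variable names are illustrative.

```bash
#!/bin/bash
file=BOS_CRM_SUS_20130101_10-00-10.csv

# The text after the first slash is a pattern, not a command. Nothing in
# the name matches it, so the value comes back unchanged.
unchanged=${file/date +%Y%m%d HH:MM:SS}

# Running a command needs an explicit $(...). Even then the result is
# today's date, which does not occur in this file name.
today=$(date +%Y%m%d)
still_unchanged=${file/$today}

echo "$unchanged"
echo "$still_unchanged"
```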

If the file names follow that pattern exactly, then you can do the following:

echo "${file%????????_??-??-??.csv}" >> "$LOG_FILE"
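
Because `?` matches any character, the expansion above relies on the names really having that layout. A minimal sketch of one way to check the layout first follows; the function name `strict_prefix` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the prefix only when the name ends in
# eight digits, an underscore, and a NN-NN-NN time, followed by .csv.
strict_prefix() {
    local name=${1##*/}   # drop any leading directory
    case $name in
        *[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9]-[0-9][0-9]-[0-9][0-9].csv)
            printf '%s\n' "${name%????????_??-??-??.csv}"
            ;;
        *)
            return 1      # name does not follow the expected layout
            ;;
    esac
}
```

Inside the question's loop this could be used as `strict_prefix "$file" >> "$LOG_FILE"`, with a non-zero return status marking names that were skipped.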

Using grep:

ls *.csv | grep -Po '^([A-Za-z]+_)+'

Output:

BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
