
Copy the latest updated file based on substring from filename in bash

I have to archive some files from a folder, based on a date that appears in the file name, but there can be multiple files with the same name (substring). I have to copy only the latest one to a separate folder.

For example:
20180730.abc.xyz2.jkl.20180729.164918.csv.gz

Here, 20180730 and 20180729 are dates, and I have to search by the first one (20180730). This part is done.

The search part that I wrote is:

for FILE in "$SOURCE_DIR/$BUSINESS_DT"*
do
    # Here I have to check whether a file with this name exists and, if so, copy the latest one
    cp "${FILE}" "$TARGET_DIR/"
done

Now I have to check whether the same SOURCE_DIR contains a file with a name similar to 20180730.abc.xyz2.jkl. and, if it exists, copy it. So basically I have to extract the portion abc.xyz2.jkl. I can't use cut with fields, because the portion could be either abc.xyz2.jkl or abc.xyz. The portion is variable and can also contain digits; the last two numbers are also variable and can change. Some examples:

20180730.abc.xyz2.jkl.20170729.890789.csv.gz
20180730.abc.xyz2.20180729.121212.csv.gz
20180730.ab.xy.20180729.11111.csv.gz

Can anybody please help me with this? I tried find and cut but didn't get the required results.

Many thanks

Python might be a better choice for implementing something like this, but here is a bash example. You can use a sed capture group to extract the portion of the filename that you want, then use an associative array to store the filename of the newest file containing that substring. Once that's done, you can go back and do the copy operations. Here is an example which extracts the string between the two 8-digit numbers and the periods around them. This sed expression may not work for your complete data set, but it works for the 3 examples you gave. Also, this won't handle cases where one unique identifier is a subset of another unique identifier.

declare -A LATEST
for FILE in "$SOURCE_DIR/$BUSINESS_DT"*
do
    # Extract the unique identifier from the file name (directory part stripped)
    HASH=$(echo "${FILE##*/}" | sed 's/^[0-9]\{8\}\.\(.*\)\.[0-9]\{8\}\..*$/\1/')

    # If this is the first time we see this unique identifier,
    # then find the most recently modified matching file
    if [ -z "${LATEST[${HASH}]}" ]
    then
        # Double quotes so that the variables are expanded inside the pattern
        LATEST[${HASH}]=$(find "$SOURCE_DIR" -type f -name "${BUSINESS_DT}.${HASH}.*" -printf '%T@ %p\n' | sort -n | tail -1 | cut -f2- -d" ")
    fi
done

# Iterate over the stored file names (the values), not the identifiers (the keys)
for FILE in "${LATEST[@]}"
do
    cp "${FILE}" "$TARGET_DIR/"
done
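
As a variation on the same idea, here is a minimal sketch that avoids the GNU-only find -printf and keys the array on the exact identifier, so one identifier being a prefix of another should not matter. It assumes bash 4 or later and assumes that "latest" means the newest modification time, as in the find command above. The function names extract_id and copy_latest are made up for this illustration; they are not part of the original question or answer.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the identifier that sits between the leading
# 8-digit date and the next ".<8 digits>." component of a file name.
# Returns 1 and prints nothing if the name does not have that shape.
extract_id() {
    local name=${1##*/}                       # strip any directory part
    local re='^[0-9]{8}\.(.+)\.[0-9]{8}\.'
    [[ $name =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

# Hypothetical helper: for every identifier found among the files in
# directory $1 whose names start with date $2, copy the most recently
# modified file (by mtime, an assumption) into directory $3.
copy_latest() {
    local src=$1 date=$2 dest=$3 file id
    local -A latest=()
    for file in "$src/$date".*; do
        [[ -f $file ]] || continue            # glob matched nothing
        id=$(extract_id "$file") || continue  # name does not fit the pattern
        # -nt is true when the left file has a newer modification time
        if [[ -z ${latest[$id]:-} || $file -nt ${latest[$id]} ]]; then
            latest[$id]=$file
        fi
    done
    for file in "${latest[@]}"; do
        cp -- "$file" "$dest/"
    done
}
```

With the variables from the question, it would be called as copy_latest "$SOURCE_DIR" "$BUSINESS_DT" "$TARGET_DIR". If "latest" should instead be decided by the second date and time embedded in the file name, the -nt comparison would need to be replaced by a string comparison on that part of the name.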
