Copy the latest updated file based on a substring in the file name in bash

Question

I have to archive some files from a folder, based on a date that appears in the file name, but there can be multiple files with the same name (substring). I have to copy only the latest one to a separate folder.

For example:
20180730.abc.xyz2.jkl.20180729.164918.csv.gz

In this name, 20180730 and 20180729 are dates, and I have to search by the first one (20180730). This part is done.

The search part that I wrote is:

for FILE in "$SOURCE_DIR/$BUSINESS_DT"*
do
    # Here I have to search if this FILENAME exists and, if yes, copy the latest file
    cp "${FILE}" "$TARGET_DIR/"
done

Now I have to search whether the same SOURCE_DIR contains a file with a name similar to 20180730.abc.xyz2.jkl, and if it exists, I have to copy it. So basically, I have to extract the portion abc.xyz2.jkl. I can't use cut with fields, as that portion could be either abc.xyz2.jkl or abc.xyz. The portion is variable and can also contain numbers. The last two numbers are also variable and can change. Some examples are:

20180730.abc.xyz2.jkl.20170729.890789.csv.gz
20180730.abc.xyz2.20180729.121212.csv.gz
20180730.ab.xy.20180729.11111.csv.gz

Can anybody please help me do that? I tried find and cut but didn't get the required results.

Many thanks.

Answer 1

Python might be a better choice for implementing something like this, but here is a bash example. You can use a sed capture group to extract the portion of the file name that you want. Then use an associative array to store the name of the newest file containing that substring. Once that's done, you can go back and do the copy operations. Here is an example which extracts the string between the two 8-digit numbers and their periods. This sed expression may not work for your complete data set, but it works for the 3 examples you gave. Also, this won't handle cases where one unique identifier is a substring of another unique identifier.

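declare -A LATEST
for FILE in "$SOURCE_DIR/$BUSINESS_DT"*
do
    # Extract the unique identifier between the two 8-digit dates,
    # working on the base name so the directory part is not included
    HASH=$(basename "${FILE}" | sed 's/^[0-9]\{8\}\.\(.*\)\.[0-9]\{8\}.*$/\1/')

    # If this is the first time we see this identifier,
    # look up the most recently modified file whose name contains it
    if [ -z "${LATEST[${HASH}]}" ]
    then
        # Double quotes so that ${HASH} is expanded inside the pattern
        LATEST[${HASH}]=$(find "$SOURCE_DIR" -type f -name "*${HASH}*" -printf '%T@ %p\n' | sort -n | tail -1 | cut -f2- -d" ")
    fi
done

# Copy the stored file names (the array values, not the keys)
for FILE in "${LATEST[@]}"
do
    cp "${FILE}" "$TARGET_DIR/"
done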
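
The caveat about one identifier being contained in another (abc.xyz inside abc.xyz.jkl, for instance) comes from searching with a wildcard pattern. One possible way around it, shown here only as a rough sketch, is to group files by the exact identifier and compare modification times directly with -nt. The helper names extract_id and copy_latest are made up for this illustration, the sketch assumes bash 4 or later, and it assumes every file name looks like date.identifier.date.digits.extension, as in the three examples from the question. "Latest" is taken to mean most recently modified, as in the code above.

```shell
#!/usr/bin/env bash
# Sketch only: extract_id and copy_latest are hypothetical helper names.

# Print the identifier between the leading date and the second date,
# e.g. 20180730.abc.xyz2.jkl.20180729.164918.csv.gz -> abc.xyz2.jkl
# Returns non-zero if the name does not fit the assumed pattern.
extract_id() {
    local name=${1##*/}                       # strip any directory part
    local re='^[0-9]{8}\.(.+)\.[0-9]{8}\.[0-9]+\.'
    [[ $name =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

# Copy, for each exact identifier, the most recently modified file.
# Arguments: source directory, business date, target directory.
copy_latest() {
    local src=$1 date=$2 dst=$3 file id
    local -A latest=()
    for file in "$src/$date".*; do
        [[ -f $file ]] || continue            # also skips an unmatched glob
        id=$(extract_id "$file") || continue  # skip names that do not fit
        # Keep this file if it is the first one seen for the identifier,
        # or if it is newer than the one stored so far
        if [[ -z ${latest[$id]:-} || $file -nt ${latest[$id]} ]]; then
            latest[$id]=$file
        fi
    done
    for file in "${latest[@]}"; do
        cp -- "$file" "$dst/"
    done
}
```

It could then be called as copy_latest "$SOURCE_DIR" "$BUSINESS_DT" "$TARGET_DIR". Because the array key is the whole identifier, abc.xyz and abc.xyz.jkl end up in separate groups. If "latest" should instead come from the second date and the number in the file name, the -nt test would have to be replaced by a comparison of those fields.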
