

Combining multiple groups of text files with bash

I have a folder of text files that are named something like this:

0filename1
1filename1
2filename1
....
0filename2
1filename2
2filename2
....

et cetera. What I want to do is take all the files that end in filename1 and combine them into a single file named filename1, and do the same for filename2 and every other name. Normally I would do something like this:

cat [0123456789]*filename1 > filename1

and just repeat the command for every different file name I have. However, I want to automate this. The exact form of the file names changes regularly, so it's not as simple as writing a script that runs the above command for filename1, filename2, etc. The length of the file names does stay constant, though, so I suspect the right way to automate this is a script that takes every group of files sharing the same last n characters of their names and concatenates them into a file named after those n characters. I'm not sure how to do this, though. Any suggestions?
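The approach described in the question (group files by the last n characters of their names, then concatenate each group into a file named after those characters) can be sketched directly in Bash with substring expansion, ${f: -N}. This is a minimal sketch, assuming Bash 4 or later (for the associative array), that the files to combine are exactly the ones whose names start with a digit, and that N is known in advance; combine_by_suffix is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Minimal sketch (assumes Bash 4+ and a known suffix length N).
# combine_by_suffix N: for every file in the current directory whose name starts
# with a digit, use the last N characters of the name as its group, then
# concatenate each group into a file named after those N characters.
# combine_by_suffix is a hypothetical helper name.
combine_by_suffix() (
    # The ( ) body runs in a subshell, so the shopt change does not leak out.
    local n=$1 f suffix
    local -A suffixes=()                # set of distinct suffixes
    shopt -s nullglob                   # an unmatched glob expands to nothing

    # Pass 1: collect the distinct last-N-character suffixes.
    for f in [0-9]*; do
        (( ${#f} > n )) || continue     # need at least one prefix character
        suffixes["${f: -$n}"]=1         # the space before - is required
    done

    # Pass 2: the question's `cat [0123456789]*filename1 > filename1`,
    # run once per suffix (files are read in the glob's sorted order).
    for suffix in "${!suffixes[@]}"; do
        cat -- [0-9]*"$suffix" > "$suffix"
    done
)
```

For the names in the question, running combine_by_suffix 9 inside the folder would produce filename1, filename2 and so on. Like the manual cat command, it reads each group in the glob's lexical order, so in typical locales 10filename1 is read before 1filename1.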

Sounds pretty simple; you just need to filter the file names to get the 'base' strings.

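for base in $( printf '%s\n' [0-9]* | grep -o '.\{9\}$' | sort -u ); do
    cat [0123456789]*"$base" > "$base"
done

where 9 is N, the number of trailing characters shared by every file in a group (the length of filename1 here); change it to match your own names. Only names that start with a digit are used to build the list of bases, so other files in the folder are ignored. This version does not cope with whitespace in file names, because the output of $( ... ) is split on whitespace.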
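A glob such as [0123456789]*$base returns names in sorted (lexical) order, so 10filename1 is read before 1filename1. If the pieces must be joined in numeric order of their prefix, one option is to sort the matches with sort -n first. A minimal sketch, assuming Bash 4 or later (for mapfile) and file names without embedded newlines; cat_numeric is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# cat_numeric BASE: concatenate every [0-9]*BASE file into BASE, ordered by
# the numeric value of the leading digits (0, 1, 2, 10) rather than the
# glob's lexical order (0, 10, 1, 2).
# Hypothetical helper; assumes Bash 4+ and file names without newlines.
cat_numeric() (
    local base=$1
    local -a files group
    shopt -s nullglob                       # no match -> empty array
    files=( [0-9]*"$base" )
    (( ${#files[@]} )) || return 1          # nothing ends in BASE
    # sort -n orders the names by their leading number
    mapfile -t group < <(printf '%s\n' "${files[@]}" | sort -n)
    cat -- "${group[@]}" > "$base"
)
```

Inside the loop, the cat line could then be replaced by cat_numeric "$base".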

A slightly more complex solution that handles file names containing whitespace, multi-digit numeric prefixes, and varying name lengths:

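#!/usr/bin/env bash

# extglob enables +(...) patterns; nullglob makes an unmatched glob expand to nothing
shopt -s extglob nullglob

# Every name that starts with one or more digits and has at least one more character
files=(+([0-9])?*)
(( ${#files[@]} )) || exit 1   # no matching files: nothing to do

# Strip the leading digits from every name, keep each distinct suffix once
# (NUL-delimited, so whitespace in names is safe), and for each suffix
# concatenate the matching files in numeric order into a file of that name.
while IFS= read -rd '' filename; do
    printf '%s\0' +([0-9])"$filename" | sort -zn | xargs -0 cat > "$filename"
done < <(printf '%s\0' "${files[@]##+([0-9])}" | sort -zu)
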
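The key step in this script is the expansion ${files[@]##+([0-9])}, which strips the longest run of leading digits from every array element. A small sketch for checking which groups the script would create before letting it write anything; list_suffixes is a hypothetical helper name. extglob has to be enabled before the function is defined, because the +(...) pattern is parsed at definition time.

```shell
#!/usr/bin/env bash
# Must be on before the function below is parsed, since it contains +(...).
shopt -s extglob

# list_suffixes: print each distinct suffix (name minus its leading digits)
# once, one per line. Hypothetical helper; writes no files.
list_suffixes() (
    shopt -s nullglob
    local -a files
    files=(+([0-9])?*)                   # names starting with digits
    (( ${#files[@]} )) || return 0
    # ##+([0-9]) removes the longest leading run of digits from each element;
    # sort -u keeps one copy of each suffix.
    printf '%s\n' "${files[@]##+([0-9])}" | sort -u
)
```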
#!/bin/bash

str="filename"
for i in {1..2}
do
    cat {?,??}"${str}${i}" > "${str}${i}"
done

The script uses Bash brace expansion ({1..2} and {?,??}) together with the single-character wildcard ? to expand the available file names. A single ? matches 0filename1 to 9filename1, and ?? matches 10filename1 to 99filename1. The base name "filename" and the range {1..2} are hard-coded, so adjust them to your own files.
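Because brace expansion happens before variables are expanded, {1..2} cannot simply be replaced by something like {1..$n}; a C-style for loop can take the range from variables instead. Also, without nullglob, a pattern that matches nothing (for example ??filename2 when there is no two-digit file) is passed to cat literally, and cat reports an error for it. A minimal sketch combining both changes, assuming the same name layout as the example; combine_numbered and its three arguments are hypothetical.

```shell
#!/usr/bin/env bash
# combine_numbered STR FIRST LAST: for i in FIRST..LAST, concatenate
# ?STR$i and ??STR$i (one- and two-digit prefixes) into STR$i.
# Hypothetical helper; the ( ) body keeps the shopt change local.
combine_numbered() (
    local str=$1 first=$2 last=$3 i
    local -a parts
    shopt -s nullglob                       # unmatched patterns vanish
    for (( i = first; i <= last; i++ )); do
        parts=( {?,??}"${str}${i}" )        # all ?-matches, then all ??-matches
        (( ${#parts[@]} )) || continue      # no input files for this number
        cat -- "${parts[@]}" > "${str}${i}"
    done
)
```

With the example files, combine_numbered filename 1 2 reproduces the behaviour of the script above without the error messages for missing two-digit files.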

Example:

$ cat 0filename1 
011
$ cat 1filename1 
111
$ cat 2filename1 
211
$ cat 10filename1 
1011
$ cat 0filename2
022
$ cat 1filename2
122
$ cat 2filename2
222
$ cat 10filename2
1022

After running the script above, the combined files contain:

$ cat filename1
011
111
211
1011

$ cat filename2
022
122
222
1022
