
Concatenate a specific number of files

I have a bunch of files named uv_set_XXXXXXXX, where the 8 Xs stand for the date in the usual year, month, day format. Imagine I have 325 files of this type. I would like to concatenate them in groups of 50 files, so that in the end I have 7 files (6 built from 50 files each and 1 built from the remaining 25).

I have been thinking of using cat, but I can't see an option to select a given number of files from a list. I could do this with Python, but I'm wondering whether some Unix command-line utility does it more directly.

Thanks.

With GNU parallel you can use the following command:

parallel -n50 "cat {} > out{#}" ::: uv_set_*

This will merge the first 50 files into out1, the next 50 files into out2, and so on.
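If GNU parallel is not installed, the same batching can be sketched in plain POSIX shell. This is only an illustration: the function name concat_batches and its argument order are made up for this example, and the out prefix is borrowed from the command above.

```shell
# concat_batches SIZE PREFIX FILE...
# Hypothetical helper: concatenates FILEs in groups of SIZE into
# PREFIX1, PREFIX2, ... The body runs in a subshell ( ... ), so its
# variables do not leak into the calling shell.
concat_batches() (
    size=$1
    prefix=$2
    shift 2
    n=0
    while [ "$#" -gt 0 ]; do
        n=$((n + 1))
        : > "$prefix$n"                  # create or truncate this batch file
        i=0
        while [ "$i" -lt "$size" ] && [ "$#" -gt 0 ]; do
            cat -- "$1" >> "$prefix$n"   # append the next input file
            shift
            i=$((i + 1))
        done
    done
)
```

For the files in the question it would be called as: concat_batches 50 out uv_set_*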

I would just break down and do this in Awk. 我只是分解并在Awk中执行此操作。

awk 'FNR==1 && (i++ % 50 == 0) {  # first line of every 50th file
    if (p) close(p)               # close the previous output file
    p = "dest_" (++j) }
    { print > p }' uv_set_????????

This creates files dest_1 through dest_7, the first 6 containing 50 files each and the last containing the remainder.

Closing the previous file is necessary because the system only allows Awk to have a limited number of open file handles (though the limit is typically higher than 7, so it's probably not important in your example).
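The same Awk idea can be wrapped so that the batch size and output prefix are parameters. This is a sketch, not a tested tool: the function name split_cat and the Awk variable names are invented for this example. One assumption to be aware of is that an empty input file has no first line, so it never matches FNR == 1 and is not counted toward a batch.

```shell
# split_cat SIZE PREFIX FILE...
# Hypothetical wrapper: writes the lines of every SIZE input files to
# PREFIX1, PREFIX2, ... using a single awk process.
split_cat() (
    size=$1
    prefix=$2
    shift 2
    awk -v size="$size" -v prefix="$prefix" '
        FNR == 1 && (nfiles++ % size == 0) {  # first line of every size-th file
            if (out != "") close(out)         # release the previous handle
            out = prefix (++k)                # next output name
        }
        { print > out }
    ' "$@"
)
```

For the files in the question it would be called as: split_cat 50 dest_ uv_set_????????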


Thinking out loud dept, just to prevent anyone else from wasting time on repeating this dead end.

You could use xargs -L 50 cat to concatenate 50 files at a time, but there is no simple way to pass in a new redirection for standard output for each invocation. You could try to hack your way around that with something like

# XXX Do not use: incomplete
printf '%s\n' uv_set_???????? |
xargs -L 50 sh -c 'cat "$@" > ... something' _

but I can't come up with an elegant way to have a different something each time.
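One possible way around this, offered as a sketch rather than a recommendation, is to give up on sequential numbering and name each output after the first file in its batch, which the inner shell sees as $1. The function name concat_by_first and the batch_from_ prefix are invented for this example, and it assumes file names without whitespace or quote characters, since xargs would otherwise split or reinterpret them.

```shell
# concat_by_first SIZE FILE...
# Hypothetical helper: each xargs invocation receives up to SIZE file
# names and writes their concatenation to batch_from_<first file name>.
concat_by_first() (
    size=$1
    shift
    printf '%s\n' "$@" |
    xargs -L "$size" sh -c 'cat "$@" > "batch_from_$1"' _
)
```

For the files in the question it would be called as: concat_by_first 50 uv_set_????????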
