
Using Gawk and Printf in a Bash script

I am trying to split a file into smaller files with gawk and name the smaller files in the order they appear in the original file.

for i in *.txt 
do
gawk -v RS="START_of_LINE_to_SEPARATE" 'NF{ print RS$0 > "new_file_"++n".txt"}' $i
done

The output gives me: new_file_1.txt new_file_2.txt etc.

I would like the output to be: new_file_0001.txt new_file_0002.txt etc.

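You can format the number in the shell with printf -v and pass it to gawk as a variable. Note that the counter here advances once per input file, so every record from the same file goes to the same output file: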

for i in *.txt; do 
    printf -v num "%04d" $((++n))
    gawk -v num="$num" -v RS="START_of_LINE_to_SEPARATE" 'NF{
       print RS$0 > "new_file_" num ".txt"}' "$i"
done

Ignoring the issue of the outer loop and focusing on the awk part of the question, you can use sprintf to build the filename:

gawk -v RS="START_of_LINE_to_SEPARATE" 'NF{ file = sprintf("new_file_%04d.txt", ++n) 
                                            print RS$0 > file }' "$i"

The format specifier %04d prints the number as a decimal integer, padded with leading zeros to a minimum width of 4.
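To see what %04d does in isolation, here is a small sketch using the shell's printf, which accepts the same conversion as awk's sprintf. The variable names are only illustrative.

```shell
# %04d: decimal integer, minimum width 4, padded on the left with zeros.
short=$(printf '%04d' 7)
exact=$(printf '%04d' 1234)
wide=$(printf '%04d' 12345)   # more than 4 digits: printed in full, never truncated

printf '%s\n' "$short" "$exact" "$wide"
```

The width is a minimum, so numbering past 9999 still works; the names simply stop lining up in a sorted listing.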

If you want to go through all the .txt files and keep incrementing the counter across them, drop the loop and pass every file to awk in one invocation by changing "$i" to *.txt. Inside the loop, each gawk run starts n from zero again, so later input files would overwrite the output of earlier ones.
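A minimal end-to-end sketch of the single-invocation form follows. The scratch directory, the sample files a.txt and b.txt, the fallback to the system awk, and the close(file) call are all assumptions added for illustration; they are not part of the original answer.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Prefer gawk, as in the answer; fall back to the system awk if it is missing.
# (A multi-character RS is an extension; gawk and mawk both support it.)
awk_bin=$(command -v gawk || command -v awk)

# Two hypothetical input files holding three marked records in total.
printf '%s\n' 'START_of_LINE_to_SEPARATE first' 'alpha' \
              'START_of_LINE_to_SEPARATE second' 'beta' > a.txt
printf '%s\n' 'START_of_LINE_to_SEPARATE third' 'gamma' > b.txt

# One invocation over all inputs, so n keeps counting across files.
"$awk_bin" -v RS="START_of_LINE_to_SEPARATE" 'NF {
    file = sprintf("new_file_%04d.txt", ++n)
    print RS $0 > file
    close(file)   # added here: release each output file once it is written
}' *.txt

count=$(ls new_file_*.txt | wc -l | tr -d ' ')
first_line=$(head -n 1 new_file_0001.txt)
echo "wrote $count files; first one starts with: $first_line"
```

Because the output names also match *.txt, rerunning the command in the same directory would feed the generated files back in as input; writing them to a separate directory or using a different extension avoids that.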
