How to read a file each n characters instead of each line using awk?

This is the content of file.txt:

hello bro
my nam§
is Jhon Does

The file could also contain non-printable characters (for example \x00 or \x02), and, as you can see, the lines are not all the same length.

I want to read it 5 characters at a time, counting line breaks as ordinary characters. I thought of something like this using awk:

awk -v RS='' '{
  s=s $0;
}END{
  n=length(s);

  for(x=1; x<=n; x=x+5){   # <= so a final 1-character chunk is not skipped
    # Here I will put some calcs and stuff

    i++;
    print "line " i ": #" substr(s,x,5) "#"
  }
}' file.txt

The output is the following:

line 1: #hello#
line 2: # bro
#
line 3: #my na#
line 4: #m§
is#
line 5: # Jhon#
line 6: # Does#

It works, but the input file will be very large, so performance is important.

In short, I'm looking for something like this:

awk -v RS='.{5}' '{ # Here I will put some calcs and stuff }'

But it doesn't work.

Another alternative that works OK:

xxd -ps mifile.txt | tr -d '\n' | fold -w 10 | awk '{print "23" $0 "230a"}' | xxd -ps -r

Do you have any ideas or alternatives? Thank you.

You can use Perl with binmode, assuming your characters are single bytes.

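use strict;
use warnings;

# Open the file, failing loudly if it cannot be opened.
open my $fh, '<', 'test' or die "Cannot open test: $!";
# Set binary mode so bytes are read as-is.
binmode $fh;
# A reference to an integer makes <$fh> return fixed-size records of 5 bytes.
$/ = \5;

while (<$fh>) {
    # Do whatever calculations you want here.
    print "$_#";
}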
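For comparison, here is a minimal Python sketch of the same byte-oriented idea: read fixed-size chunks of 5 bytes until the stream is exhausted. The helper name `read_byte_chunks` and the in-memory sample data are assumptions for illustration; for a real file you would pass the result of `open(path, "rb")` instead.

```python
import io

def read_byte_chunks(stream, n=5):
    """Return an iterator over successive n-byte chunks; the last may be shorter."""
    # iter(callable, sentinel) keeps calling stream.read(n) until it returns b"".
    return iter(lambda: stream.read(n), b"")

# Hypothetical sample data standing in for the file; it includes a NUL byte.
sample = io.BytesIO(b"hello bro\nmy\x00nam\n")
chunks = list(read_byte_chunks(sample))
for chunk in chunks:
    print(chunk)
```

Because only one chunk is held at a time, memory use should stay flat even for very large inputs.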

For extended character sets you can use UTF-8 and read 5 characters at a time instead of 5 bytes.

use strict;
use warnings;

# Open the file with a UTF-8 decoding layer.
open my $fh, '<:utf8', 'test' or die "Cannot open test: $!";
# Set STDOUT to UTF-8 as well.
binmode(STDOUT, ":utf8");

# read() returns the number of characters read, and 0 at end of file.
while ((read($fh, my $data, 5)) != 0) {
    # Do whatever you want with $data here.
    print "$data#";
}
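The difference between the two Perl versions can be seen in a small Python sketch that chunks the same data once as bytes and once as decoded characters. The sample string is an assumption based on the question's file; "§" (written here as `\u00a7`) takes two bytes in UTF-8, so the chunk boundaries differ.

```python
import io

text = "my nam\u00a7\nis"        # 10 characters; \u00a7 is the section sign
raw = text.encode("utf-8")       # 11 bytes, since the section sign needs two bytes

# Byte-oriented: every chunk is 5 bytes, which can split a multi-byte character.
byte_stream = io.BytesIO(raw)
byte_chunks = list(iter(lambda: byte_stream.read(5), b""))

# Character-oriented: decode as UTF-8 first, then take 5 characters at a time.
char_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
char_chunks = list(iter(lambda: char_stream.read(5), ""))

print(byte_chunks)
print(char_chunks)
```

Which of the two is appropriate depends on whether "5 characters" in the question means 5 bytes or 5 decoded code points.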

If you are okay with Python, you could try this:

# Read the file 5 characters at a time (text mode; newlines count as characters).
with open('filename', 'r') as f:
    w = f.read(5)
    while w != '':
        print(w)
        w = f.read(5)

So you asked how to read a file each n characters instead of each line using awk.

Solution:

If you have a modern gawk implementation, use FPAT:

Normally, when using FS, gawk defines the fields as the parts of the record that occur in between each field separator. In other words, FS defines what a field is not, instead of what a field is. However, there are times when you really want to define the fields by what they are, and not by what they are not.

Code:

gawk 'BEGIN { RS = ""; FPAT = ".{1,5}" }   # each field is a run of 1 to 5 characters
      { for (i = 1; i <= NF; i++)
            printf("$%d = <%s>\n", i, $i)
      }' file

Check the demo.
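The "define fields by what they are" idea can be illustrated outside awk as well. The following Python sketch is only an analogy, not gawk itself: it matches runs of up to 5 characters (newlines included, via `re.DOTALL`) in a sample record taken from the question's file and prints them the way the gawk loop would.

```python
import re

# Sample record assumed from the question; \u00a7 is the section sign.
record = "hello bro\nmy nam\u00a7\nis Jhon Does"

# Each "field" is a run of 1 to 5 characters; DOTALL lets "." match newlines too.
fields = re.findall(r".{1,5}", record, flags=re.DOTALL)

for i, field in enumerate(fields, start=1):
    print(f"${i} = <{field}>")
```

A final field shorter than 5 characters is still returned, because the pattern accepts between 1 and 5 characters.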

I'm not sure I understand what you want, but this produces the same output as the script in your question that you say works, so hopefully this is it:

$ awk -v RS='.{5}' 'RT!=""{ print "line", NR ": #" RT "#" }' file
line 1: #hello#
line 2: # bro
#
line 3: #my na#
line 4: #m§
is#
line 5: # Jhon#
line 6: # Does#

The above uses GNU awk for multi-char RS and RT.
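One thing to be aware of, as far as I understand gawk's behavior: RT holds the text that matched RS, so any final piece shorter than 5 characters arrives as a record with an empty RT and is skipped by the RT!="" test. The Python sketch below mimics that logic on an assumed copy of the question's data (including its trailing newline); it is an illustration of the matching logic, not a replacement for the gawk command.

```python
import re

# Assumed contents of file.txt, trailing newline included (31 characters).
data = "hello bro\nmy nam\u00a7\nis Jhon Does\n"

# Like RS='.{5}': only complete 5-character matches count as chunks.
full_chunks = [m.group(0) for m in re.finditer(r".{5}", data, flags=re.DOTALL)]

# Whatever follows the last full match is the part the RT test skips.
leftover = data[len(full_chunks) * 5:]

for i, chunk in enumerate(full_chunks, start=1):
    print(f"line {i}: #{chunk}#")
```

Here the leftover is just the final newline, which is why the output matches the question's script; with other input lengths, up to 4 trailing characters would go unprinted unless they are handled separately.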
