
awk skipping records. getline command

This is a task related to data compression using the Fibonacci binary representation.

What I have is this text file:

result.txt

a 20
b 18
c 18
d 15
e 7

This file is the result of scanning a text file and counting the occurrences of each character in it using awk.

Now I need to give each character the length of its Fibonacci binary representation. Since I'm new to Ubuntu and the terminal, I first wrote a Java program that takes a number and prints the lengths of all Fibonacci codewords up to that number, and it works. That is exactly what I'm trying to do here with awk. The problem is that it doesn't work... The number of Fibonacci codewords of each length also follows the Fibonacci sequence. These are the rules:

  • f(1)=1 - there is 1 codeword of length 1.
  • f(2)=1 - there is 1 codeword of length 2.
  • f(3)=2 - there are 2 codewords of length 3.
  • f(4)=3 - there are 3 codewords of length 4.

And so on... (I'm adding one more bit to each codeword, so the first two lengths will be 2 and 3.)
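To make the rule concrete, here is a minimal sketch that prints how many codewords exist for each length, assuming the recurrence f(k) = f(k-1) + f(k-2) implied by the list above. The function name `codeword_counts` and the variable names are only illustrative, and the extra bit mentioned above is not applied here.

```sh
# codeword_counts N: print "length count" for lengths 1..N,
# using f(1)=1, f(2)=1, f(k)=f(k-1)+f(k-2).
codeword_counts() {
    awk -v n="$1" 'BEGIN {
        prev = 0; cur = 1                # f(0)=0, f(1)=1
        for (len = 1; len <= n; len++) {
            print len, cur               # cur codewords have this length
            next_count = prev + cur      # number of codewords one bit longer
            prev = cur
            cur = next_count
        }
    }'
}

codeword_counts 6
```

With only a BEGIN block, awk reads no input, so this runs without a data file.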

This is the code I've written; its name is scr5:

{
a=1;
b=1;
len=2

print  $1 , $2, len;
getline;
print   $1 ,$2, len+1;
getline;
len=4;

for(i=1; i< num; i++){
    c= a+b;
    g=c;
    while (c >= 1){
        print $1 ,$2, len ;
        if (getline<=0){
            print "EOF"
            exit;
        }
        c--;
        i++;
    }
    a=b;
    b=c;
    len++;
}}

Now I run this in the terminal:

n=5
awk -v num=$n -f scr5 a

And there are two problems:
1. It skips the third letter, c.
2. On the fourth letter, d, it prints the length of the first letter, 2, instead of length 3.

I guess there is a problem with the getline command.

Thank you very much!

Search Google for getline and awk and you'll mostly find reasons to avoid getline completely! Often it's a sign you're not really doing things the "awk" way. Find an awk tutorial and work through the basics, and I'm sure you'll quickly see why your attempt using getline is not taking you in the right direction.
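As a small illustration of why getline inside the main rule tends to surprise people (this is a sketch of my own, not the asker's script, and `getline_demo` is just a made-up name): a plain getline replaces $0 with the next record, so that record is consumed and the rule never starts on it.

```sh
# getline_demo: show which records the main rule starts on
# when the rule body itself calls getline.
getline_demo() {
    awk '{
        print "rule start:", $0      # record awk handed to the rule
        getline                      # consumes the NEXT record into $0
        print "after getline:", $0
    }'
}

printf 'a\nb\nc\nd\n' | getline_demo
```

The rule starts only on a and c; b and d are swallowed by getline, so any per-record state the script keeps gets applied to the wrong lines.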

In the script below, the BEGIN block runs once at the start, before any input is read, and the next block then runs automatically once for each line of input, without any need for getline.

Good luck!

$ cat fib.awk
BEGIN { prior_count = 0; count = 1; len = 1; remaining = count; }

{ 
    if (remaining == 0) {
        temp = count;
        count += prior_count;
        prior_count  = temp;
        remaining = count;
        ++len;
    }

    print $1, $2, len;
    --remaining;
}

$ cat fib.txt
a 20
b 18
c 18
d 15
e 7
f 0
g 0
h 0
i 0
j 0
k 0
l 0
m 0

$ awk -f fib.awk fib.txt
a 20 1
b 18 2
c 18 3
d 15 3
e 7 4
f 0 4
g 0 4
h 0 5
i 0 5
j 0 5
k 0 5
l 0 5
m 0 6
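The question mentions adding one more bit to each codeword, so that the first two lengths are 2 and 3. A possible way to get that from fib.awk, sketched here under the assumption that the extra bit is simply a constant added to every length, is to pass an offset with -v. The `fib_lengths` wrapper and the `offset` variable are my own additions, not part of the original script.

```sh
# fib_lengths [OFFSET]: same logic as fib.awk, with OFFSET (default 0)
# added to every printed length.
fib_lengths() {
    awk -v offset="${1:-0}" '
    BEGIN { prior_count = 0; count = 1; len = 1; remaining = count }
    {
        if (remaining == 0) {        # all codewords of this length are used up
            temp = count
            count += prior_count     # next Fibonacci count
            prior_count = temp
            remaining = count
            ++len
        }
        print $1, $2, len + offset   # assumed: extra bit is a constant offset
        --remaining
    }'
}

printf 'a 20\nb 18\nc 18\nd 15\ne 7\n' | fib_lengths 1
```

With an offset of 1, the five sample records get lengths 2, 3, 4, 4 and 5.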

The fib.awk solution above, in compressed form:

mawk 'BEGIN{ ___= __=  _^=____=+_  }  !_ { __+=(\
            ____=___+_*(_=___+=____))^!_ } $++NF = (_--<_)+__' fib.txt

a 20 1
b 18 2
c 18 3
d 15 3
e 7 4
f 0 4
g 0 4
h 0 5
i 0 5
j 0 5
k 0 5
l 0 5
m 0 6
