Parsing a text file in Bash using arrays, sed, and awk

Question

I have what looks to me like a complex text file containing around 300 entries, and I have no idea how to parse it to get the output I want. Each of my network users has an entry in the file, and each entry starts with the user name:

USER:martha
USER:Othello
USER:darwin

Underneath each user entry there is a host of information I need, but one user can have a single entry while another has several. Here is an example of three such entries:

USER:martha
    POSITION: 170.198.82.13 [VLT(304394),PT(FULL)]
            CLIENT: jcrm19.1.p2ps -258-
            ACCESSPOINT: 170.198.82.13/net
            APPLICATION: 91

USER:othello 
    POSITION: 170.198.80.212 [VLT(307571),PT(FULL)]
            CLIENT: jcrm15.1.p2ps -258-
            ACCESSPOINT: 170.198.80.212/net
            APPLICATION: 256

            CLIENT: jcrm15.1.p2ps -258-
            ACCESSPOINT: 170.198.80.212/net
            APPLICATION: 256

    POSITION: 170.198.80.209 [VLT(306561),PT(FULL)]
            CLIENT: jcrm14.1.p2ps -258-
            ACCESSPOINT: 170.198.80.209/net
            APPLICATION: 256

            CLIENT: pwrm14.1.p2ps -258-
            ACCESSPOINT: 170.198.80.209/net
            APPLICATION: 256

            CLIENT: pwrm14.1.p2ps -258-
            ACCESSPOINT: 170.198.80.209/net
            APPLICATION: 256


USER:darwin
    POSITION: 170.198.19.102 [VLT(297987),PT(FULL)]
            CLIENT: jcrm16.1.p2ps -258-
            ACCESSPOINT: 170.198.19.102/net
            APPLICATION: 91

The final output should look as follows:

USER        Position           Client     Application 

Martha      170.198.82.13       jcrm19      91
Othello     170.198.80.212      jcrm15      256
Othello     170.198.80.209      jcrm14      256
Darwin      170.198.19.102      jcrm16      91

I have some experience with arrays, and I could grep out some of the information, assign it to variables, and print them. But I don't know how to read the information under each "USER" into arrays, since the entries differ in length and content.

So how do I read USER:martha and then jump to USER:othello? Also, under USER:othello there are two "POSITION" entries that I need to grab. I don't know how to put the content I'm looking for into array variables or regular variables. I have never had to parse a file whose data differs in length and content for each user. I'm not sure how many lines I have to read before I start assigning values for the next user. Can you provide some hints, or perhaps a piece of code I can start with?

Thanks

Answer 1

Using awk with column:

awk -F '[: ]+' 'BEGIN{print "USER", "Position", "Client", "Application"} 
  $1=="USER"{u=$2} $2=="POSITION"{p=$3}$2=="CLIENT"{c=$3}
  $2=="APPLICATION"&&p{print u, p, c, $3; p=""}' file | column -t

USER     Position        Client         Application
martha   170.198.82.13   jcrm19.1.p2ps  91
othello  170.198.80.212  jcrm15.1.p2ps  256
othello  170.198.80.209  jcrm14.1.p2ps  256
darwin   170.198.19.102  jcrm16.1.p2ps  91
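
The requested output shows the client as jcrm19 rather than jcrm19.1.p2ps. A minimal sketch of the same approach, spelled out over several lines and cutting the client name at its first dot, might look like the following. The function name summarize_users and the demo data are only illustrative, and the sketch assumes every CLIENT group ends with an APPLICATION line, as in the sample.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk program; $1 is the input file.
summarize_users() {
  awk -F '[: ]+' '
    BEGIN { print "USER", "Position", "Client", "Application" }
    $1 == "USER"     { user = $2 }   # start of a user block
    $2 == "POSITION" { pos = $3 }    # line is indented, so $1 is empty
    $2 == "CLIENT"   { client = $3; sub(/\..*/, "", client) }  # jcrm19.1.p2ps -> jcrm19
    # Print once per POSITION, at its first APPLICATION line, then clear pos
    # so the remaining CLIENT/APPLICATION groups under it are skipped.
    $2 == "APPLICATION" && pos != "" { print user, pos, client, $3; pos = "" }
  ' "$1"
}

# Demo on one record copied from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
USER:martha
    POSITION: 170.198.82.13 [VLT(304394),PT(FULL)]
            CLIENT: jcrm19.1.p2ps -258-
            ACCESSPOINT: 170.198.82.13/net
            APPLICATION: 91
EOF
summarize_users "$sample"
rm -f "$sample"
```

Piping the result through column -t, as above, aligns the columns.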

Answer 2

I don't have my Mac to hand, so this is untested...

awk -F: '/^USER:/{u=$2} /POSITION:/{p=$2} /CLIENT:/{c=$2} /APPLICATION:/{print u,p,c,$2}' yourfile
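
With : as the only separator, $2 keeps its leading space and everything after the value (the bracketed part of the POSITION line, the -258- after the client), and a row is printed for every APPLICATION line rather than one per POSITION. A sketch of one way to tighten the same idea, relying on awk's default whitespace splitting, is shown here. The name first_per_position is hypothetical, and the sketch assumes the tags appear exactly as in the sample.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one row per POSITION, using its first CLIENT/APPLICATION.
first_per_position() {
  awk '
    /^USER:/ {
      user = $0
      sub(/^USER:[ \t]*/, "", user)   # drop the tag
      sub(/[ \t]+$/, "", user)        # drop trailing blanks
    }
    $1 == "POSITION:"    { pos = $2; printed = 0 }   # $2 is the bare address
    $1 == "CLIENT:"      { client = $2 }
    $1 == "APPLICATION:" && !printed { print user, pos, client, $2; printed = 1 }
  ' "$1"
}

# Demo: one position with a repeated CLIENT group, as under othello.
sample=$(mktemp)
cat > "$sample" <<'EOF'
USER:othello 
    POSITION: 170.198.80.212 [VLT(307571),PT(FULL)]
            CLIENT: jcrm15.1.p2ps -258-
            ACCESSPOINT: 170.198.80.212/net
            APPLICATION: 256

            CLIENT: jcrm15.1.p2ps -258-
            ACCESSPOINT: 170.198.80.212/net
            APPLICATION: 256
EOF
first_per_position "$sample"
rm -f "$sample"
```

The printed flag is reset on each POSITION line, which is what limits the output to one row per position.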

Answer 3

awk -v RS="" -F'[:\n ]*' '/^USER/{u=$2}
 /POSI/{p=/^USER/?$4:$3
 for(i=1;i<=NF;i++)
     if($i=="CLIENT"){sub(/\..*/,"",$(i+1))
                      print u,p,$(i+1),$NF;break}}' file

The output, without a header:

martha 170.198.82.13 jcrm19 91
othello 170.198.80.212 jcrm15 256
othello 170.198.80.209 jcrm14 256
darwin 170.198.19.102 jcrm16 91

You could add a header and pipe the output to column -t for better formatting.
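
For the part of the question that asks about Bash arrays, a rough pure-Bash sketch is given here for comparison. It is not taken from any of the answers. The helper names load_entries and print_entries and the four array names are hypothetical, and the sketch assumes values are separated by spaces (not tabs) and that only the first CLIENT and APPLICATION under each POSITION are wanted.

```shell
#!/usr/bin/env bash
# Parallel arrays: element i of each array describes the i-th POSITION found.
users=() positions=() clients=() apps=()

load_entries() {
  local line key value user="" idx=-1 want_client=0 want_app=0
  users=() positions=() clients=() apps=()
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line#"${line%%[![:space:]]*}"}      # strip leading whitespace
    [[ $line == *:* ]] || continue             # skip blank and untagged lines
    key=${line%%:*}                            # text before the first colon
    value=${line#*:}                           # text after the first colon
    value=${value#"${value%%[![:space:]]*}"}   # strip blanks after the colon
    value=${value%% *}                         # keep only the first word
    case $key in
      USER) user=$value ;;
      POSITION)                                # a new row starts here
        idx=$(( idx + 1 ))
        users[idx]=$user positions[idx]=$value clients[idx]="" apps[idx]=""
        want_client=1 want_app=1 ;;
      CLIENT)
        if (( want_client )); then clients[idx]=${value%%.*}; want_client=0; fi ;;
      APPLICATION)
        if (( want_app )); then apps[idx]=$value; want_app=0; fi ;;
    esac
  done < "$1"
}

print_entries() {
  local i
  printf '%-10s %-16s %-10s %s\n' USER Position Client Application
  for i in "${!users[@]}"; do
    printf '%-10s %-16s %-10s %s\n' \
      "${users[i]}" "${positions[i]}" "${clients[i]}" "${apps[i]}"
  done
}

# Demo on one record copied from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
USER:darwin
    POSITION: 170.198.19.102 [VLT(297987),PT(FULL)]
            CLIENT: jcrm16.1.p2ps -258-
            ACCESSPOINT: 170.198.19.102/net
            APPLICATION: 91
EOF
load_entries "$sample"
print_entries
rm -f "$sample"
```

There is no need to know in advance how many lines belong to a user: each line is classified by its tag, and a USER or POSITION line simply changes the state that later lines are stored under. For a file of this size the awk versions are likely shorter and faster, but the arrays remain available for further processing in the script.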

3 answers

Answer 1: score 2, 2014-05-13 20:27:18
Answer 2: score 0, accepted, 2014-05-13 20:30:58
Answer 3: score 0, 2014-05-13 20:43:16
