How to use sed or awk regex to parse this data in a Linux shell

I have this data in my file:

 65 ---
 66 FieldType: Text
 67 FieldName: STATE
 68 FieldNameAlt: STATE
 69 FieldFlags: 4194304
 70 FieldJustification: Left
 71 FieldMaxLength: 2
 72 ---
 73 FieldType: Text
 74 FieldName: ZIP
 75 FieldNameAlt: ZIP
 76 FieldFlags: 0
 77 FieldJustification: Left
 78 ---
 79 FieldType: Signature
 80 FieldName: EMPLOYEE SIGNATURE
 81 FieldNameAlt: EMPLOYEE SIGNATURE
 82 FieldFlags: 0
 83 FieldJustification: Left
 84 ---
 85 FieldType: Text
 86 FieldName: Name_Last
 87 FieldNameAlt: LAST
 88 FieldFlags: 0
 89 FieldValue: Billa
 90 FieldJustification: Left
 91 ---

How can I turn that into an array and store each field as a key/value pair, like

array['fieldtype']
array['fieldName']

for all the objects?

I think the only separator is "---", but I don't know how to do it.

Here's one way with GNU awk. It splits the input into records, one per "---" separator, which can then be processed one at a time.

parse.awk

BEGIN {
  RS = " +[0-9]+ +---\n"             # a numbered "---" line ends each record
  FS = "\n"                          # each line of a record is one field
}

{
  delete array                       # drop keys left over from the previous record
  for(i=1; i<=NF; i++) {             # for each line
    sf = split($i, a, ":")
    if(sf > 1) {                     # only accept successfully split lines
      sub("^ +[0-9]+ +", "", a[1])   # trim line number from key
      sub("^ +", "",  a[2])          # trim value
      array[a[1]] = a[2]             # save into array hash
    }
  }
}

{
  print "Record: " NR
  for(k in array) {
    print k " -> " array[k]
  }
  print ""
}

Save the above into parse.awk and run it like this:

awk -f parse.awk infile

Where infile contains the data you want to parse. Record 1 is empty because the input starts with a separator. Output (the order of keys within a record may vary):

Record: 1

Record: 2
FieldFlags -> 4194304
FieldNameAlt -> STATE
FieldJustification -> Left
FieldType -> Text
FieldMaxLength -> 2
FieldName -> STATE

Record: 3
FieldFlags -> 0
FieldNameAlt -> ZIP
FieldJustification -> Left
FieldType -> Text
FieldName -> ZIP

Record: 4
FieldFlags -> 0
FieldNameAlt -> EMPLOYEE SIGNATURE
FieldJustification -> Left
FieldType -> Signature
FieldName -> EMPLOYEE SIGNATURE

Record: 5
FieldFlags -> 0
FieldNameAlt -> LAST
FieldJustification -> Left
FieldType -> Text
FieldValue -> Billa
FieldName -> Name_Last

You can use something like this:

sed -n '/FieldType/,/FieldName/{N};s/FieldType: \([^\n]*\)\nFieldName: \([^\n]*\)/a["\2"]=\1/gp' input >> tmp.sh

and then run:

source tmp.sh

Alternatively, use eval instead of the redirection and source. Either way, the space in the EMPLOYEE SIGNATURE field will cause problems.

Using Perl makes more sense though.
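The space problem can be sidestepped by filling a bash associative array directly instead of generating code and sourcing it. The following is only a sketch under some assumptions: bash 4 or newer, a file without the line-number prefixes shown in the question, and the same FieldName -> FieldType mapping the sed command produces. The variable names (data, ftype, a) are illustrative.

```shell
#!/usr/bin/env bash
# Sample input; assumes the line-number prefixes are not part of the real file.
data='---
FieldType: Text
FieldName: STATE
---
FieldType: Signature
FieldName: EMPLOYEE SIGNATURE
---'

declare -A a    # associative array: FieldName -> FieldType (needs bash 4+)
ftype=
while IFS= read -r line; do
    case $line in
        'FieldType: '*) ftype=${line#FieldType: } ;;            # remember the type
        'FieldName: '*) a["${line#FieldName: }"]=$ftype ;;      # key may contain spaces
    esac
done <<< "$data"

# Quoting the subscript keeps keys with spaces intact.
sig_type=${a["EMPLOYEE SIGNATURE"]}
state_type=${a["STATE"]}
printf '%s\n' "$sig_type"
```

Because nothing is passed through eval or source, spaces and other shell metacharacters in the values are stored literally.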

In any type of awk:

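#!/usr/bin/awk -f
# The awk path may differ on your system.
BEGIN {
    FS = ":[[:blank:]]*"     # split "key: value" lines into $1 and $2
    counter = 0
}
/:/ {
    array[counter,$1] = $2   # store the value under (record number, key)
}
/---/ {
    counter++                # a separator line starts the next record
}
END {
  # Deal with the array.
}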

This creates an array in which each record, numbered by counter, holds its fields as array[x,key] = value.
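As one possible way to "deal with the array", the END block can look values up by record number and key. This is a sketch, not part of the original answer: it assumes input without the line-number prefixes, uses ': *' as the field separator for portability, and the sample data and the result variable are illustrative.

```shell
# Sample input; assumes the line-number prefixes are not part of the real file.
data='---
FieldType: Text
FieldName: STATE
FieldMaxLength: 2
---
FieldType: Signature
FieldName: EMPLOYEE SIGNATURE
---'

result=$(printf '%s\n' "$data" | awk -F': *' '
BEGIN { counter = 0 }
/:/   { array[counter, $1] = $2 }   # store value under (record, key)
/---/ { counter++ }                 # separator: move to the next record
END {
    # The leading and trailing separators also bump counter,
    # so the real records are numbered 1 .. counter-1.
    for (i = 1; i < counter; i++)
        if ((i, "FieldName") in array)
            print i ": " array[i, "FieldName"] " (" array[i, "FieldType"] ")"
}')
printf '%s\n' "$result"
```

Looping over the record numbers rather than using for (k in array) keeps the output in input order, since awk does not define the traversal order of for-in.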
