
XML parsing with simple shell scripting

Can someone please help me get XML data into a shell script?

Here is my requirement:

I need to print each CHILD value, along with the CHILD's attribute value and its parent, if the CHILD value is greater than 100.

Here is my data:

<mydata>
    <parent detail="school1">
        <CHILD attribute="0">0</CHILD>
        <CHILD attribute="1">1932</CHILD>
        <CHILD attribute="2">0</CHILD>
        <CHILD attribute="3">500</CHILD>
        <CHILD attribute="4">0</CHILD>
        <CHILD attribute="5">0</CHILD>
        <CHILD attribute="6">7819</CHILD>
        <CHILD attribute="7">0</CHILD>
        <CHILD attribute="8">299</CHILD>
        <CHILD attribute="9">0</CHILD>
    </parent>
    <parent detail="school2">
        <CHILD attribute="0">1</CHILD>
        <CHILD attribute="1">7000</CHILD>
        <CHILD attribute="2">0</CHILD>
        <CHILD attribute="3">0</CHILD>
        <CHILD attribute="4">600</CHILD>
        <CHILD attribute="5">0</CHILD>
        <CHILD attribute="6">11674</CHILD>
        <CHILD attribute="7">0</CHILD>
        <CHILD attribute="8">489</CHILD>
        <CHILD attribute="9">0</CHILD>
    </parent>
</mydata>

My external file, childvalue_limits.txt, contains values like this:

attribute0=100
attribute1=60
attribute3=80
attribute4=90
attribute5=100
attribute6=90
attribute7=50
attribute8=80
attribute9=70

I need to pass this file as an argument to the script and use these values dynamically in the condition.

Current code:

sed 's|><|>\n<|g' $WORKING_PATH/xml_detail.log | awk -F'"|<|>' '/parent detail/{p=$3} /CHILD attribute/{att=$3;val=$5;if(val>100)print  "child value on " p, "attribute "att,"is at value: "val ,"\n"}' 

Current output:

child value on school2 attribute 1 is at value 1000
child value on school2 attribute 4 is at value 600
.....
.....

The required output should look like this:

child value on school2 attribute 1 is at value 1000 and threshold is 60
child value on school2 attribute 4 is at value 600 and threshold is 90
.....
.....

Please note: the threshold is a dynamic value passed to the if condition through a separate file called childvalue_limits.txt.

You cannot (correctly) parse XML using regular expressions. XML is a context-free language, which is more expressive than anything a grammar based on regular expressions can describe. See the Chomsky hierarchy for details. That is also the reason why you run into trouble with newlines when using regular expressions.
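To make the newline problem concrete, here is a minimal sketch. The `extract` helper is a hypothetical name introduced only for this illustration; it reuses the field separator from the question's awk command and assumes the CHILD attribute is the third field and the value the fifth, which only holds when each element sits on its own line.

```shell
#!/bin/bash
# Hypothetical helper: line-oriented extraction with the same separator
# as the question's awk command.
extract() {
  awk -F'"|<|>' '/CHILD attribute/ { print $3, $5 }'
}

# One element per line: fields 3 and 5 are the attribute and the value.
printf '<parent detail="s1">\n<CHILD attribute="1">1932</CHILD>\n</parent>\n' | extract

# The same document on a single line: the field positions shift, so the
# command no longer prints the attribute and the value.
printf '<parent detail="s1"><CHILD attribute="1">1932</CHILD></parent>\n' | extract
```

Both inputs are the same XML document, yet only the first formatting yields the attribute and value. An XML parser does not depend on where the line breaks fall.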

Hence, it is better (and easier and more stable) to use a proper XML parser. As I am most familiar with BaseX (full disclosure: I am also associated with the project), I will use it.

When using the zip version, you can simply run the file bin/basex. The following XPath 3.0 expression should give you the correct output, simply concatenating the different values:

for $c in /mydata/parent/CHILD[. > 100] return $c/parent::parent/@detail || " " || $c/@attribute || " " || $c/data() || "&#10;"

Assuming your XML file is named mydata.xml, you can execute this XPath simply by issuing the following command (i.e. this can be done in your shell script):

basex -i mydata.xml -q 'for $c in /mydata/parent/CHILD[. > 100] return $c/parent::parent/@detail || " " || $c/@attribute || " " || $c/data() || "&#10;"'

EDITED AGAIN

OK, I have changed the code to read a file of input limits. It looks complicated but it is not: you can remove all the lines that have the word "DEBUG" in them if you want to. The # is the start of a comment.

#!/bin/bash

awk -F'"|<|>' '
   FNR==NR           {
                       split($0,f,"=");  # Split line on "=" sign into array f[]
                       gsub(/[[:alpha:]]/,"",f[1]); # Strip the letters, leaving only the attribute number
                       limits[f[1]]=f[2]; # Save for comparison later
                       print "DEBUG: limits[",f[1],"]=",f[2];
                       next
                     }
   /parent detail/   {
                       p=$3
                       print "DEBUG: parent detail=",p;
                     }
   /CHILD attribute/ {
                       att=$3;val=$5;
                       print "DEBUG: att=",att,",val=",val; 
                       if(val>limits[att])print p,att,val,limits[att]
                     }
   ' limits.txt xml

You will see at the end of the script that it reads in BOTH your files: limits.txt and xml. In the script, the block in curly braces that follows FNR==NR applies only while reading and parsing limits.txt.
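The FNR==NR idiom can be seen in isolation in the following sketch. The function name `show_pass` and the two demo files are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Hypothetical demo files, created in a temporary directory.
tmp=$(mktemp -d)
printf 'a=1\nb=2\n' > "$tmp/first.txt"
printf 'x\ny\nz\n'  > "$tmp/second.txt"

show_pass() {
  # NR counts lines across all input files; FNR restarts at 1 for each file.
  # FNR==NR is therefore true only while awk is still reading the first file.
  awk 'FNR==NR { print "first file:  " $0; next }
               { print "second file: " $0 }' "$1" "$2"
}

show_pass "$tmp/first.txt" "$tmp/second.txt"
```

The `next` statement is what keeps lines of the first file from also reaching the later blocks. One caveat: if the first file is empty, FNR==NR is also true for the second file, so the idiom assumes a non-empty first file.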

If you want to see the output without DEBUG messages, just run

./script | grep -v DEBUG
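As a possible variation, the sketch below takes the limits file as a script argument and prints the sentence format requested in the question. The function name `check_limits` and the script name in the usage comment are hypothetical. It assumes one XML element per line, as in the sample data, and it skips any attribute that has no entry in the limits file; that last behaviour is a choice made for this sketch, not something the question specifies.

```shell
#!/bin/bash
# Usage (hypothetical script name):
#   ./check_limits.sh childvalue_limits.txt xml_detail.log
check_limits() {
  awk -F'"|<|>' '
    FNR==NR {                        # first file: the limits
      split($0, f, "=")              # f[1]="attribute1", f[2]="60"
      gsub(/[^0-9]/, "", f[1])       # keep only the attribute number
      if (f[1] != "") limits[f[1]] = f[2]
      next
    }
    /parent detail/   { p = $3 }     # remember the current parent
    /CHILD attribute/ {
      att = $3; val = $5
      # Assumption: attributes without a configured limit are skipped.
      if ((att in limits) && val + 0 > limits[att] + 0)
        print "child value on " p " attribute " att \
              " is at value " val " and threshold is " limits[att]
    }
  ' "$1" "$2"
}

# Run only when both file names were supplied on the command line.
if [ "$#" -eq 2 ]; then
  check_limits "$1" "$2"
fi
```

Adding 0 to both operands forces a numeric comparison, so a value such as 600 is compared with 90 as a number rather than as a string.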

EDITED

Your code works fine for me with your revised data. Here is my output:

node2 1 1932
node2 6 7819
node1 1 1924
node1 6 11674

I assume you mean you want to avoid XML parsers and just use standard tools like awk and sed to achieve this, so I'll go with awk:

awk -F'"|<|>' '/parent detail/{p=$3} /CHILD attribute/{att=$3;val=$5;if(val>100)print p,att,val}' xml

Output:

school1 1 1932
school1 3 500
school1 6 7819
school1 8 299
school2 1 7000
school2 4 600
school2 6 11674
school2 8 489

It sets the field separator to any of ", < or >. Then, when it sees a line containing "parent detail", it saves the value in p. When it sees a line containing "CHILD attribute", it extracts the attribute and the value. If the value is over 100, it prints the parent, attribute and value.

It assumes your XML is in a file called xml.
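To see why the attribute ends up in $3 and the value in $5, the following sketch prints every field awk produces for one sample line with that separator. The helper name `show_fields` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list the fields awk sees for a single input line
# when the separator is any of ", < or >.
show_fields() {
  printf '%s\n' "$1" |
    awk -F'"|<|>' '{ for (i = 1; i <= NF; i++) print "field " i ": [" $i "]" }'
}

show_fields '        <CHILD attribute="1">1932</CHILD>'
```

For this line, field 1 is the leading whitespace, field 2 is `CHILD attribute=`, field 3 is `1`, field 4 is empty (nothing lies between the closing quote and `>`), and field 5 is `1932`.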

