
XML parsing with simple shell scripting

Can someone please help me get XML data into a shell script?

Here is my requirement.

I need to print the CHILD value, along with the CHILD's attribute value and the parent's detail, if the CHILD value is greater than 100.

Here is my data:

<mydata>
    <parent detail="school1">
        <CHILD attribute="0">0</CHILD>
        <CHILD attribute="1">1932</CHILD>
        <CHILD attribute="2">0</CHILD>
        <CHILD attribute="3">500</CHILD>
        <CHILD attribute="4">0</CHILD>
        <CHILD attribute="5">0</CHILD>
        <CHILD attribute="6">7819</CHILD>
        <CHILD attribute="7">0</CHILD>
        <CHILD attribute="8">299</CHILD>
        <CHILD attribute="9">0</CHILD>
    </parent>
    <parent detail="school2">
        <CHILD attribute="0">1</CHILD>
        <CHILD attribute="1">7000</CHILD>
        <CHILD attribute="2">0</CHILD>
        <CHILD attribute="3">0</CHILD>
        <CHILD attribute="4">600</CHILD>
        <CHILD attribute="5">0</CHILD>
        <CHILD attribute="6">11674</CHILD>
        <CHILD attribute="7">0</CHILD>
        <CHILD attribute="8">489</CHILD>
        <CHILD attribute="9">0</CHILD>
    </parent>
</mydata>

The values in my external file, childvalue_limits.txt, look like this:

attribute0=100
attribute1=60
attribute3=80
attribute4=90
attribute5=100
attribute6=90
attribute7=50
attribute8=80
attribute9=70

I need to pass this file as an argument to the script and use these values dynamically in the condition.

Current code:

sed 's|><|>\n<|g' $WORKING_PATH/xml_detail.log | awk -F'"|<|>' '/parent detail/{p=$3} /CHILD attribute/{att=$3;val=$5;if(val>100)print  "child value on " p, "attribute "att,"is at value: "val ,"\n"}' 

Current output:

child value on school2 attribute 1 is at value 1000
child value on school2 attribute 4 is at value 600
.....
.....

The required output should look like this:

child value on school2 attribute 1 is at value 1000 and threshold is 60
child value on school2 attribute 4 is at value 600 and threshold is 90
.....
.....

Please note: the threshold is a dynamic value passed to the if condition through a separate file called childvalue_limits.txt.

You cannot (correctly) parse XML using regular expressions. XML is a context-free language, and context-free languages are more expressive than anything a regular expression can describe. See the Chomsky hierarchy for details. That is also the reason why you run into trouble with newlines when using regular expressions.

Hence, it is better (and easier and more stable) to use a proper XML parser. As I am most familiar with BaseX (full disclosure: I am also associated with the project), I will use it.

When using the zip version, you can simply run the file bin/basex. The following XPath 3.0 expression should give you the correct output, simply concatenating the different values:

for $c in /mydata/parent/CHILD[. > 100] return $c/parent::parent/@detail || " " || $c/@attribute || " " || $c/data() || "&#10;"

Assuming your XML file is named mydata.xml, you can execute this XPath expression by issuing the following command (i.e. this can be done in your shell script):

basex -i mydata.xml -q 'for $c in /mydata/parent/CHILD[. > 100] return $c/parent::parent/@detail || " " || $c/@attribute || " " || $c/data() || "&#10;"'

EDITED AGAIN

OK, I have changed the code to read a file of input limits. It looks complicated but it is not. You can remove all the lines that contain the word "DEBUG" if you want to. The # marks the start of a comment.

#!/bin/bash

awk -F'"|<|>' '
   FNR==NR           {
                       split($0,f,"=");  # Split line on "=" sign into array f[]
                       gsub(/[[:alpha:]]/,"",f[1]); # Remove letters, leaving only the attribute number
                       limits[f[1]]=f[2]; # Save for comparison later
                       print "DEBUG: limits[",f[1],"]=",f[2];
                       next
                     }
   /parent detail/   {
                       p=$3
                       print "DEBUG: parent detail=",p;
                     }
   /CHILD attribute/ {
                       att=$3;val=$5;
                       print "DEBUG: att=",att,",val=",val; 
                       if(val>limits[att])print p,att,val,limits[att]
                     }
   ' limits.txt xml

You will see at the end of the script that it reads BOTH of your files, limits.txt and xml. FNR==NR is true only while awk is reading the first file named on the command line, so the block in curly braces after it applies only to limits.txt; the next statement at the end of that block then skips the remaining rules for those lines.
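To make the FNR==NR idiom concrete, here is a minimal sketch that only tags each line with the file it came from. The function name tag_lines and the two demo files are made up for this illustration and are not part of the script above.

```shell
#!/bin/bash
# tag_lines FIRST SECOND  (hypothetical helper, for illustration only)
# NR counts records across all input files; FNR restarts at 1 for each file.
# The two are equal only while awk is still reading the first file.
tag_lines() {
    awk 'FNR==NR { print "first:", $0; next }   # lines of the first file
                 { print "second:", $0 }        # everything after that
        ' "$1" "$2"
}

# Demo with two throwaway files (names are assumptions).
demo_dir=$(mktemp -d)
printf 'attribute0=100\nattribute1=60\n' > "$demo_dir/limits.txt"
printf '<mydata>\n</mydata>\n'           > "$demo_dir/xml"
tag_lines "$demo_dir/limits.txt" "$demo_dir/xml"
rm -rf "$demo_dir"
```

One caveat with this idiom: if the first file is empty, FNR==NR is also true for the second file, so the limits file should contain at least one line.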

If you want to see the output without DEBUG messages, just run

./script | grep -v DEBUG
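If you want the limits file passed as an argument and the output in the sentence form shown in the question, the same idea could be wrapped roughly as follows. This is only a sketch: the function name report_over_limit is made up, the DEBUG lines are dropped, and skipping attributes that have no entry in the limits file (such as attribute 2 in the question) is an assumption about the desired behaviour.

```shell
#!/bin/bash
# report_over_limit LIMITS_FILE XML_FILE  (hypothetical name)
# Like the script above, this assumes one XML element per line.
report_over_limit() {
    awk -F'"|<|>' '
        FNR==NR {                        # first file: the limits
            split($0, f, "=")            # "attribute1=60" -> f[1], f[2]
            sub(/^attribute/, "", f[1])  # "attribute1" -> "1"
            limits[f[1]] = f[2]
            next
        }
        /parent detail/   { p = $3 }
        /CHILD attribute/ {
            att = $3; val = $5
            # Assumption: attributes without a configured limit are ignored.
            # Adding 0 forces a numeric comparison.
            if ((att in limits) && val + 0 > limits[att] + 0)
                print "child value on " p " attribute " att \
                      " is at value " val " and threshold is " limits[att]
        }
    ' "$1" "$2"
}

# Run only when both file names are given, e.g.:
#   ./script childvalue_limits.txt xml
if [ "$#" -eq 2 ]; then
    report_over_limit "$1" "$2"
fi
```

With the data from the question, school2 attribute 1 (value 7000, limit 60) and attribute 4 (value 600, limit 90) would both be reported, each with its own threshold.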

EDITED

Your code works fine for me with your revised data. Here is my output:

node2 1 1932
node2 6 7819
node1 1 1924
node1 6 11674

I assume you mean you want to avoid XML parsers and just use standard tools like awk and sed to achieve this, so I'll go with awk:

awk -F'"|<|>' '/parent detail/{p=$3} /CHILD attribute/{att=$3;val=$5;if(val>100)print p,att,val}' xml

Output:

school1 1 1932
school1 3 500
school1 6 7819
school1 8 299
school2 1 7000
school2 4 600
school2 6 11674
school2 8 489

It sets the field separator to any of ", < or >. Then, when it sees a line containing "parent detail", it saves the value in p. When it sees a line containing "CHILD attribute", it extracts the attribute and the value. If the value is over 100, it prints the parent, attribute and value.

It assumes your XML is in a file called xml.
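To see why the attribute ends up in $3 and the value in $5, it can help to print every field awk produces for a single line. The helper name show_fields below is made up for this illustration.

```shell
#!/bin/bash
# show_fields LINE  (hypothetical helper)
# Splits one line with the same separator as above and prints each field
# together with its index, so empty fields are visible too.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'"|<|>' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields '<CHILD attribute="1">1932</CHILD>'
```

For that sample line, $3 holds 1 and $5 holds 1932; $4 is empty because the closing quote and the > sit right next to each other. On an indented line the leading spaces simply land in $1, so the positions of the attribute and value do not change.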

