
Get attribute value from xml tags with shell script and convert to csv

The task:

I'm trying to read an attribute value from XML tags with a shell script, split each value into its parts and save the result in a .csv file.

This is what the XML looks like:

<host>
  <servers>
    <server name="Type1Name1-Port1" >...</server>
    <server name="Type2Name2-Port2" >...</server>
    <server name="Type3Name3-Port3" >...</server>
    ...
    <server name="TypexNamex-Portx" >...</server>
  </servers>
</host>

I'd like to get the values of the "name" attribute and split them up as follows:
Type;Name;Port

The output csv file I want should look like this:

Type1;Name1;Port1
Type2;Name2;Port2
Type3;Name3;Port3
...
Typex;Namex;Portx

The problem:

  • I can't install anything on the server
  • I can only use "ksh-awk" / "xmllint without --xpath" / "standard linux commands"

I can use any shell language I want, but I prefer bash and ksh.

My questions:

  • Do you think it is possible to solve my task?
  • What is the best approach for the sub-tasks? (reading, splitting, writing)

EDIT:

Example data of a server-name:

T-TTT_AAA-A-SSS-PPPP

Here T represents the Type, A the Application name, S the Server name and P the Port. The lengths of T, A and S vary; the length of P is constant.

Here is what I came up with, using only common tools, xmllint and sed:

echo 'cat //host/servers/server/@name' | xmllint --shell data.xml | sed -n 's: name="\([A-Z][a-z0-9]*\)\([A-Z][a-z0-9]*\)-\(.*\)":\1;\2;\3:p'

The sed expression is based on the examples in the question at the time of posting.

Breakdown:

  • echo 'cat //host/servers/server/@name' : we pass this command to xmllint. It selects the name attribute of every node matching <host><servers><server ...> ... </server></servers></host>
  • xmllint --shell data.xml : loads data.xml into xmllint's interactive shell, which reads the command from standard input and executes it.
  • sed -n 's: name="\([A-Z][a-z0-9]*\)\([A-Z][a-z0-9]*\)-\(.*\)":\1;\2;\3:p' : we process the output of xmllint to keep only the data we are interested in
    • For each attribute, xmllint produces a line like this: name="Type1Name1-Port1"
    • We define 3 capture groups: a capital letter followed by any number of lowercase letters or digits (for Type), another capital letter followed by any number of lowercase letters or digits (for Name), and everything between the - and the closing " character (for Port)
    • We tell sed to print only the lines that matched, with the three groups separated by semicolons

Output:

Type1;Name1;Port1
Type2;Name2;Port2
Type3;Name3;Port3
Typex;Namex;Portx
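
To try the sed step on its own, without xmllint, the attribute lines can be fed in by hand. This is only a sketch: split_names is a hypothetical helper name, and the separator lines in the sample input are an assumption about what the xmllint shell prints between results.

```shell
# split_names is a hypothetical helper wrapping the sed expression from the answer.
split_names() {
  sed -n 's: name="\([A-Z][a-z0-9]*\)\([A-Z][a-z0-9]*\)-\(.*\)":\1;\2;\3:p'
}

# Hand-written stand-in for the xmllint output; the separator lines are an assumption.
printf '%s\n' \
  ' -------' \
  ' name="Type1Name1-Port1"' \
  ' -------' \
  ' name="Type2Name2-Port2"' |
  split_names
```

Lines that do not match the pattern are dropped because of -n, so only the two converted names are printed.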

EDIT:

To fit the pattern you indicated in the comments, you only have to change the sed regex, for instance:

sed -n 's: name="\(.*\)_\(.*\)-\(.\{4\}\)":\1;\2;\3:p'

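This matches the format T-TTT_AAA-A-SSS-PPPP with a type and a server name of any length: everything before the underscore goes into the first field, the last four characters into the third, and whatever lies between into the second. Adjust the regex, or ask another question under the regex tag, if this is not exactly what you need.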
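
As a quick check, the revised expression can be run against the sample name from the question. A minimal sketch, assuming the line arrives in the name="..." form that xmllint prints; split_edit_format is a hypothetical helper name.

```shell
# split_edit_format is a hypothetical helper wrapping the revised sed expression.
split_edit_format() {
  sed -n 's: name="\(.*\)_\(.*\)-\(.\{4\}\)":\1;\2;\3:p'
}

# The sample name is taken from the question; the leading ' name=' mimics xmllint output.
printf '%s\n' ' name="T-TTT_AAA-A-SSS-PPPP"' | split_edit_format
# prints: T-TTT;AAA-A-SSS;PPPP
```

Note that the application name and the server name stay together in the second field. Separating them would need a rule for where one ends and the other begins, which the example format alone does not provide.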

Without xmllint you can parse input like

<host>
  <servers>
    <server name="Type1_Name1-Port1" >...</server>
    <server name="Type-2_Name2-Port2" >...</server>
    <server name="Type3_Name-3-Port3" >...</server>
  </servers>
</host>

with

sed -n '/<server name=/ s/[^"]*"\([^_]*\)_\([^"]*\)-\([^"]*\)".*/\1;\2;\3/p' inputfile
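
If xidel happens to be available, it can extract the attribute instead, with its output piped to sed:

xidel -e '//server/@name' f.xml |  sed ...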
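
The sed-only approach can be tried end to end, including the write to a .csv file, roughly like this. It is a sketch under assumptions: the file names servers.xml and servers.csv are hypothetical, and each <server> element is expected to sit on its own line, as in the sample.

```shell
# Hypothetical file names in a scratch directory; adjust to your environment.
workdir=$(mktemp -d)
input="$workdir/servers.xml"
output="$workdir/servers.csv"

# Sample input copied from the answer.
cat > "$input" <<'EOF'
<host>
  <servers>
    <server name="Type1_Name1-Port1" >...</server>
    <server name="Type-2_Name2-Port2" >...</server>
    <server name="Type3_Name-3-Port3" >...</server>
  </servers>
</host>
EOF

# Reading and splitting happen in sed; writing is a plain redirect.
sed -n '/<server name=/ s/[^"]*"\([^_]*\)_\([^"]*\)-\([^"]*\)".*/\1;\2;\3/p' "$input" > "$output"

cat "$output"
```

Because the second group is greedy, a name such as Type3_Name-3-Port3 is split at its last hyphen, giving Type3;Name-3;Port3.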
