
Correct awk expression to parse XML to CSV

I wrote an awk expression to parse my XML into CSV, but it doesn't work. Could you help me with it, please? I'm doing it this way because I can't use a parser like xmlstarlet on the server.

Here is my XML:

<?xml version="1.0"?>
<root>
  <record>
   <country>US</country>  
  <data>
            <id_client>50C</id_client>  
            <mail>1@mail.com</mail>
            <adress>10  </adress>
            <num_tel>001</num_tel>
            <name>toto</name>
            <birth>01/30/008</birth>        
  </data> 
  <data>
            <id_client>100K</id_client>  
            <mail>2@mail.com</mail>
            <adress>10  </adress>
            <num_tel>002</num_tel>
            <name>toto2</name>
            <birth>01/30/011</birth>                    
  </data> 
  </record>
 <record>
   <country>China</country>  
  <data>
            <id_client>99E</id_client>  
            <mail>3@mail.com</mail>
            <adress>10  </adress>
            <num_tel>003</num_tel>
            <name>toto3</name>
            <birth>01/30/0008</birth>       
  </data> 
  <data>
            <id_client>77B</id_client>  
            <mail>4@mail.com</mail>
            <adress>10  </adress>
            <num_tel>004</num_tel>
            <name>toto4</name>
            <birth>2001/05/01</birth>                   
  </data> 
  </record>
  </root>

The output I need:

 country;id_client;name
 US;50C;toto
 US;100K;toto2
 China;99E;toto3
 China;77B;toto4

And finally, the syntax I am trying to update:

/<country>/{sub(".*<country[^>]+><[^>]+>","",$0);sub("<.*","",$0);s=s";"$0}/<\/country>/{sub("^;","",s);print s;s=""}

If your data is always laid out one entry per line, as you show, with no wacky white space intervening:

$ cat tst.awk
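BEGIN {
    # Split each line on "<" and ">" so that $2 is the tag name and $3 its value.
    FS="[><]"; OFS=";"
    # The tags to output, in column order.
    n = split("country id_client name",tags,/ /)
    # Print the header row.
    for (i=1; i<=n; i++) {
        printf "%s%s", tags[i], (i<n?OFS:ORS)
    }
}
# Remember the most recent value seen for every tag.
{ tag2val[$2] = $3 }
# At the end of each <data> block, print one CSV row.
/<\/data>/ {
    for (i=1; i<=n; i++) {
        printf "%s%s", tag2val[tags[i]], (i<n?OFS:ORS)
    }
}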

$ awk -f tst.awk file
country;id_client;name
US;50C;toto
US;100K;toto2
China;99E;toto3
China;77B;toto4
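
As a quick check of why the script reads the tag name from $2 and the value from $3, here is a minimal sketch that feeds a single line in the question's layout through the same field separator. The variable name `out` is only there for illustration:

```shell
# One line in the same layout as the question's XML, including the
# leading and trailing white space.
line='   <country>US</country>  '

# With FS set to "[><]", every "<" or ">" ends a field, so the line splits into:
#   $1 = leading blanks, $2 = "country", $3 = "US", $4 = "/country", $5 = trailing blanks
out=$(printf '%s\n' "$line" |
      awk -F'[><]' '{ printf "NF=%d tag=[%s] value=[%s]", NF, $2, $3 }')

printf '%s\n' "$out"
# prints: NF=5 tag=[country] value=[US]
```

This is also why the approach depends on the one-entry-per-line layout: two elements on the same line would push the second tag and value into later fields that the script never looks at.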

If you need different or additional tags in the future, just add them to the list in the split() call.
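
For example, adding the mail column might look like the sketch below. It runs the tst.awk logic inline against a trimmed-down sample; the `xml` and `out` shell variables and the shortened sample are assumptions made only for this illustration, and the only change to the script is the extra word in the split() list:

```shell
# Trimmed-down sample in the same one-tag-per-line layout as the question.
xml='<root>
 <record>
  <country>US</country>
  <data>
   <id_client>50C</id_client>
   <mail>1@mail.com</mail>
   <name>toto</name>
  </data>
 </record>
</root>'

# Same logic as tst.awk, with "mail" appended to the tag list.
out=$(printf '%s\n' "$xml" | awk '
BEGIN {
    FS = "[><]"; OFS = ";"
    n = split("country id_client name mail", tags, / /)
    # Header row, built from the tag list.
    for (i = 1; i <= n; i++)
        printf "%s%s", tags[i], (i < n ? OFS : ORS)
}
# Remember the latest value seen for each tag.
{ tag2val[$2] = $3 }
# Emit one row at the end of each <data> block.
/<\/data>/ {
    for (i = 1; i <= n; i++)
        printf "%s%s", tag2val[tags[i]], (i < n ? OFS : ORS)
}')

printf '%s\n' "$out"
# prints:
# country;id_client;name;mail
# US;50C;toto;1@mail.com
```

The column order follows the order of the words in the list, not the order of the tags in the XML. Note that tag2val is never cleared, which is what lets country carry over to every <data> block in a record; the flip side is that a <data> block missing one of the listed tags would repeat the value from the previous block.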

