
Awk: parse XML to CSV

I have an XML file that I want to parse into CSV. Since I started working with awk, I would like to continue with it, but I know it is also possible with another language such as Perl. I also found xmlstarlet, but I don't have permission to install it on the server, so I am open to other solutions. My input XML is:

<?xml version="1.0"?>
<root>
  <record>
   <id_client>50C</id_client>  
  <data>
          <mail>1@mail.com</mail>
          <adress>10  </adress>
          <num_tel>001</num_tel>
          <key>C</key>
      <contact>
        <name>toto</name>
        <birth>01/30/009</birth>
        <city>London</city>
      </contact>
  </data> 
  <data>
          <mail>2@gmaiil.com</mail>
          <adress>20</adress>
          <num_tel>02200</num_tel>
          <key>D1</key>
      <contact>
        <name>tata</name>
        <birth>02/08/2004</birth>
        <city>Bruges</city>
      </contact>
  </data> 
</record>
   <record>
   <id_client>70D</id_client>  
  <data>
          <mail>3@gmail.com</mail>
          <adress>7Bcd</adress>
          <num_tel>5555</num_tel>
          <key>D2</key>
      <contact>
        <name>titi</name>
        <birth>05/07/2014</birth>
        <city>Paris</city>
      </contact>
  </data>
  <data>
          <mail>4@gmail.com</mail>
          <adress>888</adress>
          <num_tel>881.0</num_tel>
          <key>D3</key>
      <contact>
        <name>awk</name>
        <birth>05/08/1999</birth>
        <city>Lisbone</city>
      </contact>
  </data>
</record>
</root>

I would like to write this CSV, with headers, to another file:

id_client;mail;num_tel;key 
50C;1@mail.com;001;C
50C;2@gmail.com;02200;D1
70D;3@gmail.com;5555;D2 
70D;4@gmail.com;881.0;D3

You're going to run into lots of problems parsing XML line-by-line: XML is not a line-oriented data format.

Use an XML-specific tool. Here's how simple it can be:

xmlstarlet sel -t \
  -m / -o "id_client;mail;num_tel;key" -n -b \
  -m /root/record/data -v ../id_client -o ";" -v mail -o ";" -v num_tel -o ";" -v key -n \
file.xml
id_client;mail;num_tel;key
50C;1@mail.com;001;C
50C;2@gmaiil.com;02200;D1
70D;3@gmail.com;5555;D2
70D;4@gmail.com;881.0;D3

This answer illustrates a text-based procedure for extracting the information from the specific .xml formatting shown in the question. The same .xml can be formatted differently (e.g. without line feeds), which would make the process described here unsuitable.

If possible, use an XML-specific tool such as xmllint.

Text-based one-liner:

cat input.xml | grep -e \<mail\> -e \<adress\> -e \<num_tel\> -e \<key\> | sed 's/<[^>]*>//g' | sed 's/^\s*//g; s/\s*$//g' | paste -d ";" - - - -

Explanation:

  1. Read the input file (cat input.xml)
  2. Keep only the lines containing the wanted tags (with grep)
  3. Remove the XML tags, leaving only the tag contents (with sed)
  4. Trim spaces (with sed again; two expressions in a single sed command: one for the leading spaces and one for the trailing spaces)
  5. Paste every 4 lines together as columns (with paste)
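
The same five steps can also be followed in one place in Python. This is only a sketch under the same assumption as the one-liner (one element per line, four matching tags per data block); the function name `lines_to_csv` and the sample lines are made up for illustration.

```python
import re

WANTED = ("<mail>", "<adress>", "<num_tel>", "<key>")  # same tags as the grep step

def lines_to_csv(lines, width=4):
    """Mimic grep | sed | sed | paste on an iterable of text lines."""
    # grep: keep only lines containing one of the wanted tags
    kept = [ln for ln in lines if any(tag in ln for tag in WANTED)]
    # sed: drop every <...> tag, then trim surrounding whitespace
    values = [re.sub(r"<[^>]*>", "", ln).strip() for ln in kept]
    # paste -d ";" - - - -: join every `width` values into one row
    return [";".join(values[i:i + width]) for i in range(0, len(values), width)]

sample = """\
   <id_client>50C</id_client>
  <data>
          <mail>1@mail.com</mail>
          <adress>10  </adress>
          <num_tel>001</num_tel>
          <key>C</key>
""".splitlines()

for row in lines_to_csv(sample):
    print(row)  # prints: 1@mail.com;10;001;C
```

Like the one-liner, this emits mail;adress;num_tel;key and does not carry id_client along, so it would need adapting to match the exact layout requested in the question.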

With Python, which has an XML parser in its standard library and a decent chance of being preinstalled on the server to which you have to deploy:

#!/usr/bin/env python3

import xml.etree.ElementTree as ET
import sys

tree = ET.parse(sys.argv[1])
root = tree.getroot()

print("id_client;mail;num_tel;key")

# Rudimentary error handling: If a field is not there,
# print (nil) in its stead. An empty element yields an empty string.
def xml_read(node, key):
    p = node.find(key)
    if p is None:
        return "(nil)"
    return p.text or ""

for r in root.iter("record"):
    for d in r.iter("data"):
        print(xml_read(r, "id_client") + ";" + xml_read(d, "mail") + ";" + xml_read(d, "num_tel") + ";" + xml_read(d, "key"))
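
Since the question asks for the CSV to end up in another file, the same traversal could be combined with the standard `csv` module. A minimal sketch, assuming the layout from the question; the function name `records_to_csv` and the file names in the comment are placeholders, not part of the original answer.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["mail", "num_tel", "key"]  # children of <data> to export

def records_to_csv(xml_source, out):
    """Write one ';'-separated row per <data> element to the file object `out`."""
    root = ET.parse(xml_source).getroot()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(["id_client"] + FIELDS)
    for record in root.iter("record"):
        client = record.findtext("id_client", default="(nil)")
        for data in record.iter("data"):
            writer.writerow([client] + [data.findtext(f, default="(nil)") for f in FIELDS])

# Small in-memory demo. With real files, pass a path and an open output file:
#   with open("out.csv", "w", newline="") as fh:
#       records_to_csv("file.xml", fh)
demo = io.StringIO(
    "<root><record><id_client>50C</id_client>"
    "<data><mail>1@mail.com</mail><num_tel>001</num_tel><key>C</key></data>"
    "</record></root>"
)
buf = io.StringIO()
records_to_csv(demo, buf)
print(buf.getvalue(), end="")
```

Using `csv.writer` rather than string concatenation means a value that happens to contain a `;` gets quoted instead of silently shifting the columns.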

Alternatively, if you have access to an XSLT processor (although I dare not hope for this), you could use the following stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/root">id_client;mail;num_tel;key
<xsl:for-each select="record">
  <xsl:for-each select="data"><xsl:value-of select="../id_client"/>;<xsl:value-of select="mail"/>;<xsl:value-of select="num_tel"/>;<xsl:value-of select="key"/><xsl:text>&#xa;</xsl:text></xsl:for-each>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet> 

Use

xsltproc filename.xsl filename.xml

or

xalan -xsl filename.xsl -in filename.xml

where filename.xsl is the file that contains the above XSLT. If you have a different XSLT processor, it will work just as well; consult its manpage to see how it wants to be invoked.

You could try this:

awk 'BEGIN{ RS="record"; FS="[<>]" } { print $10 "," $14 "," $18 }' file

That is not the most portable way to do it, though. Better would be:

awk -F'[<>]' '$2 == "mail" || $2 == "adress" { printf "%s, ", $3 }; $2 == "num_tel" { print $3 }' a

That way you can add other lines without a problem, as long as you don't change the keys (the tag names).
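
To see why the second and third fields are the interesting ones, it may help to look at how a line is split on `<` and `>`. The sketch below reproduces that splitting in Python and, as an assumption that goes beyond the awk command above, also remembers the last `id_client` seen so that each row matches the layout requested in the question. The function name `rows_from_lines` is hypothetical, and the approach still assumes one element per line.

```python
import re

def rows_from_lines(lines):
    """Yield id_client;mail;num_tel;key rows from one-element-per-line XML."""
    client, row = "", {}
    for line in lines:
        fields = re.split(r"[<>]", line)           # awk: -F'[<>]'
        if len(fields) < 3:
            continue
        tag, value = fields[1], fields[2].strip()  # awk: $2 and $3
        if tag == "id_client":
            client = value
        elif tag in ("mail", "num_tel"):
            row[tag] = value
        elif tag == "key":                         # last wanted tag of a <data> block
            yield ";".join([client, row.get("mail", ""), row.get("num_tel", ""), value])
            row = {}

# How a single line is split: index 1 is the tag name, index 2 its content.
print(re.split(r"[<>]", "  <mail>1@mail.com</mail>"))
# prints: ['  ', 'mail', '1@mail.com', '/mail', '']

sample = ["<id_client>50C</id_client>", "<data>", "<mail>1@mail.com</mail>",
          "<adress>10  </adress>", "<num_tel>001</num_tel>", "<key>C</key>", "</data>"]
print(list(rows_from_lines(sample)))
# prints: ['50C;1@mail.com;001;C']
```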

#!/usr/bin/perl
use XML::DT;

my %handler=(
  -default  => sub{ $c},                # $c - element contents
  -type     => { data => "MAP" },       # data's sons become (tag => $c)

  id_client => sub{ father(id=>$c);},
  data      => sub{ print father("id"),";$c->{mail};$c->{num_tel};$c->{key}\n"},
);
dt(shift, %handler);
