Awk parse xml to csv
I have an XML file that I want to parse into CSV. I started working with awk and would like to continue with it, but I know it is also possible with other languages such as Perl. I also found xmlstarlet, but I don't have permission to install it on the server, so I am open to other solutions. My input XML is:
<?xml version="1.0"?>
<root>
<record>
<id_client>50C</id_client>
<data>
<mail>1@mail.com</mail>
<adress>10 </adress>
<num_tel>001</num_tel>
<key>C</key>
<contact>
<name>toto</name>
<birth>01/30/009</birth>
<city>London</city>
</contact>
</data>
<data>
<mail>2@gmaiil.com</mail>
<adress>20</adress>
<num_tel>02200</num_tel>
<key>D1</key>
<contact>
<name>tata</name>
<birth>02/08/2004</birth>
<city>Bruges</city>
</contact>
</data>
</record>
<record>
<id_client>70D</id_client>
<data>
<mail>3@gmail.com</mail>
<adress>7Bcd</adress>
<num_tel>5555</num_tel>
<key>D2</key>
<contact>
<name>titi</name>
<birth>05/07/2014</birth>
<city>Paris</city>
</contact>
</data>
<data>
<mail>4@gmail.com</mail>
<adress>888</adress>
<num_tel>881.0</num_tel>
<key>D3</key>
<contact>
<name>awk</name>
<birth>05/08/1999</birth>
<city>Lisbone</city>
</contact>
</data>
</record>
</root>
I would like to write this CSV, with headers, to another file:
id_client;mail;num_tel;key
50C;1@mail.com;001;C
50C;2@gmaiil.com;02200;D1
70D;3@gmail.com;5555;D2
70D;4@gmail.com;881.0;D3
You're going to run into lots of problems parsing XML line-by-line: XML is not a line-oriented data format.
Use an XML-specific tool. Here's how simple it can be:
xmlstarlet sel -t \
-m / -o "id_client;mail;num_tel;key" -n -b \
-m /root/record/data -v ../id_client -o ";" -v mail -o ";" -v num_tel -o ";" -v key -n \
file.xml
id_client;mail;num_tel;key
50C;1@mail.com;001;C
50C;2@gmaiil.com;02200;D1
70D;3@gmail.com;5555;D2
70D;4@gmail.com;881.0;D3
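To illustrate why line-by-line parsing is fragile, here is a small sketch using Python's standard-library xml.etree.ElementTree (not xmlstarlet). The sample document and the helper name rows are made up for illustration. The same record is given once pretty-printed and once on a single line; a real XML parser returns the same rows for both, whereas a line-based tool would see only one line in the second case.

```python
import xml.etree.ElementTree as ET

# Two hypothetical serializations of the same document.
PRETTY = """<root>
  <record>
    <id_client>50C</id_client>
    <data>
      <mail>1@mail.com</mail>
      <num_tel>001</num_tel>
      <key>C</key>
    </data>
  </record>
</root>"""
ONE_LINE = "".join(line.strip() for line in PRETTY.splitlines())

def rows(xml_text):
    """Return one 'id_client;mail;num_tel;key' string per <data> element."""
    root = ET.fromstring(xml_text)
    out = []
    for record in root.iter("record"):
        client = record.findtext("id_client", default="")
        for data in record.iter("data"):
            fields = [client] + [
                data.findtext(tag, default="") for tag in ("mail", "num_tel", "key")
            ]
            out.append(";".join(fields))
    return out

if __name__ == "__main__":
    print(rows(PRETTY))
    print(rows(ONE_LINE))
```

Both calls produce the same list, because the parser works on the element tree rather than on the physical line layout.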
This answer illustrates a text-based procedure for extracting the info from the specific .xml formatting shown in the question. The same .xml can be formatted differently (e.g. with no line feeds), which would make the process described here unsuitable.
If possible, use an XML-specific tool such as xmllint.
Text-based one-liner:
cat input.xml | grep -e \<mail\> -e \<adress\> -e \<num_tel\> -e \<key\> | sed 's/<[^>]*>//g' | sed 's/^\s*//g; s/\s*$//g' | paste -d ";" - - - -
Explanation:
read the input file (cat input.xml)
get the appropriate tag lines (using grep)
remove the XML tags, leaving only the tag contents (using sed)
trim whitespace (using sed again; two expressions in a single sed command: one for the leading spaces and one for the trailing spaces)
paste every 4 lines together as columns (using paste)
With Python, which has an XML parser in its standard library and a decent chance of being preinstalled on the server to which you have to deploy:
#!/usr/bin/python
import xml.etree.ElementTree as ET
import sys

tree = ET.parse(sys.argv[1])
root = tree.getroot()

print("id_client;mail;num_tel;key")

# Rudimentary error handling: If a field is not there,
# print (nil) in its stead.
def xml_read(node, key):
    p = node.find(key)
    if p is None:
        return "(nil)"
    return p.text

for r in root.iter("record"):
    for d in r.iter("data"):
        print(xml_read(r, "id_client") + ";" + xml_read(d, "mail") + ";" + xml_read(d, "num_tel") + ";" + xml_read(d, "key"))
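Since the question asks for the CSV to go into another file, the script above could also be sketched with the standard csv module, which handles the ";" delimiter and any quoting. This is only a sketch: the names FIELDS, xml_to_rows and write_csv are invented here, and the command-line convention (input file, then output file) is an assumption.

```python
import csv
import sys
import xml.etree.ElementTree as ET

FIELDS = ("mail", "num_tel", "key")  # child tags of <data> to export

def xml_to_rows(source):
    """Yield [id_client, mail, num_tel, key] for every <data> element."""
    root = ET.parse(source).getroot()
    for record in root.iter("record"):
        client = record.findtext("id_client", default="(nil)")
        for data in record.iter("data"):
            yield [client] + [data.findtext(tag, default="(nil)") for tag in FIELDS]

def write_csv(source, target):
    """Write the header and all rows to target, using ';' as delimiter."""
    # newline="" lets the csv module control line endings itself.
    with open(target, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=";", lineterminator="\n")
        writer.writerow(["id_client"] + list(FIELDS))
        writer.writerows(xml_to_rows(source))

# Assumed invocation: script.py input.xml output.csv
if __name__ == "__main__" and len(sys.argv) == 3:
    write_csv(sys.argv[1], sys.argv[2])
```

As in the script above, a missing field is written as (nil) rather than raising an error.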
Alternatively, if you have access to an XSLT processor (although I dare not hope for this), you could use the following stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/root">id_client;mail;num_tel;key
<xsl:for-each select="record">
<xsl:for-each select="data"><xsl:value-of select="../id_client"/>;<xsl:value-of select="mail"/>;<xsl:value-of select="num_tel"/>;<xsl:value-of select="key"/><xsl:text>
</xsl:text></xsl:for-each>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Use
xsltproc filename.xsl filename.xml
or
xalan -xsl filename.xsl -in filename.xml
where filename.xsl is the file that contains the above XSLT. If you have a different XSLT processor, it will work just as well; consult its manpage to see how it wants to be invoked.
You could try this:
awk 'BEGIN{ RS="record"; FS="[<>]" } { print $10 "," $14 "," $18 }' file
This is not the most portable way to do it: it depends on the exact position of each field, and a multi-character RS is not supported by every awk. Better would be:
awk -F'[<>]' '$2 == "mail" || $2 == "adress" { printf "%s, ", $3 }; $2 == "num_tel" { print $3 }' file
That way you can add other lines without a problem, as long as you don't change the tag names.
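The same idea (match on the tag name rather than on a field position) can be sketched in Python, which also makes it easy to remember the current id_client and repeat it on every row. It rests on the same assumption as the awk command: each element sits on its own line, as in the question's sample. The function name lines_to_rows and the regular expression are illustrative only.

```python
import re

# Matches a line holding exactly one simple element, e.g. "<mail>1@mail.com</mail>".
ELEMENT = re.compile(r"^\s*<(\w+)>(.*?)</\1>\s*$")

def lines_to_rows(lines):
    """Return 'id_client;mail;num_tel;key' rows from line-per-element XML."""
    rows = []
    client = ""
    current = {}
    for line in lines:
        match = ELEMENT.match(line)
        if match:
            tag, text = match.group(1), match.group(2).strip()
            if tag == "id_client":
                client = text  # remembered for all following <data> blocks
            else:
                current[tag] = text
        elif line.strip() == "</data>":
            # End of one <data> block: emit a row and start a fresh one.
            wanted = [current.get(tag, "") for tag in ("mail", "num_tel", "key")]
            rows.append(";".join([client] + wanted))
            current = {}
    return rows

if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as handle:
            print("id_client;mail;num_tel;key")
            print("\n".join(lines_to_rows(handle)))
```

Like any line-based approach, this breaks as soon as the file is formatted differently, so an XML parser remains the safer choice.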
With Perl and the XML::DT module:
#!/usr/bin/perl
use XML::DT;

my %handler = (
    -default  => sub { $c },                 # $c - element contents
    -type     => { data => "MAP" },          # children of data become a hash (tag => $c)
    id_client => sub { father(id => $c); },  # store the id on the parent element
    data      => sub { print father("id"), ";$c->{mail};$c->{num_tel};$c->{key}\n" },
);
dt(shift, %handler);