Creating a bash script to parse an XML file to CSV
I'm trying to create a bash script to parse an XML file and save it to a CSV file.
For example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<List>
<Job id="1" name="John/>
<Job id="2" name="Zack"/>
<Job id="3" name="Bob"/>
</List>
I would like the script to save the information into a CSV file like this:
John | 1
Zack | 2
Bob | 3
The name and id should be in separate cells.
Is there any way I can do this?
You've posted a query similar to your previous one. I'd again suggest using an XML parser. You could say:
xmlstarlet sel -t -m //List/Job -v @name -o "|" -v @id -n file.xml
It would return
John|1
Zack|2
Bob|3
for your sample data.
If you want it to appear as in your example, pipe the output to sed:
sed "s/|/\\t| /"
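Since the question asks for the name and id to land in separate cells, a comma delimiter may be more useful than a pipe. The following is only a minimal sketch of that conversion: `pipe_to_csv` is a hypothetical helper name, the `name,id` header row is an assumption, and it assumes the values contain no commas or double quotes.

```shell
#!/bin/bash
# Hypothetical helper: reads "name|id" lines on stdin and writes
# comma-separated lines, preceded by an assumed header row.
pipe_to_csv() {
  printf 'name,id\n'
  # A single-character field separator such as '|' is taken literally by awk.
  awk -F'|' -v OFS=',' '{ print $1, $2 }'
}

# Demo with the pipe-delimited lines shown above.
printf 'John|1\nZack|2\nBob|3\n' | pipe_to_csv
```

In practice the function would sit at the end of the xmlstarlet pipeline, with its output redirected to a .csv file.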
Try something like this:
#!/bin/bash
while read -r line; do
[[ $line =~ "name=\""(.*)"\"" ]] && name="${BASH_REMATCH[1]}" && [[ $line =~ "Job id=\""([^\"]+) ]] && echo "$name | ${BASH_REMATCH[1]}"
done < file
The line with John is malformed (the closing quote after the name is missing). With it fixed, example output:
John | 1
Zack | 2
Bob | 3
Using sed:
sed -nr 's/.*id=\"([0-9]*)\"[^\"]*\"(\w*).*/\2 | \1/p' file
Additionally, based on BroSlow's script, I merged the two matches into a single regex:
#!/bin/bash
while read -r line; do
[[ $line =~ id=\"([0-9]+).*name=\"([^\"|/]*) ]] && echo "${BASH_REMATCH[2]} | ${BASH_REMATCH[1]}"
done < file
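The regex approaches above rely on id appearing before name on each line. As a hedged variation, the sketch below matches each attribute separately, so the order does not matter, and skips any line where either attribute is missing or unterminated (such as the malformed John line). The function name `jobs_to_csv` and the comma delimiter are assumptions, and like the scripts above it only handles one element per line.

```shell
#!/bin/bash
# Hypothetical helper: reads XML lines on stdin, prints "name,id" per Job line.
jobs_to_csv() {
  local line id name
  # The leading space keeps ' id=' from matching inside another attribute name.
  local id_re=' id="([^"]*)"'
  local name_re=' name="([^"]*)"'
  while IFS= read -r line; do
    # Skip lines without a complete id="..." attribute.
    [[ $line =~ $id_re ]] || continue
    id=${BASH_REMATCH[1]}
    # Skip lines without a complete name="..." attribute.
    [[ $line =~ $name_re ]] || continue
    name=${BASH_REMATCH[1]}
    printf '%s,%s\n' "$name" "$id"
  done
}
```

It could be called as jobs_to_csv < file > jobs.csv. For anything beyond flat, one-element-per-line input, a real XML parser remains the safer choice.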
Extending the xmlstarlet approach:
Given this XML file as input:
<DATA>
<RECORD>
<NAME>John</NAME>
<SURNAME>Smith</SURNAME>
<CONTACTS>
"Smith" LTD,
London, Mtg Str, 12,
UK
</CONTACTS>
</RECORD>
</DATA>
And this script:
xmlstarlet sel -e utf-8 -t \
-o "NAME, SURNAME, CONTACTS" -n \
-m //DATA/RECORD \
-o "\"" \
-v $"str:replace(normalize-space(NAME), '\"', '\"\"')" -o "\",\"" \
-v $"str:replace(normalize-space(SURNAME), '\"', '\"\"')" -o "\",\"" \
-v $"str:replace(normalize-space(CONTACTS), '\"', '\"\"')" \
-o "\"" \
-n file.xml
You'll have the following output:
NAME, SURNAME, CONTACTS
"John","Smith","""Smith"" LTD, London, Mtg Str, 12, UK"
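The quoting rule the xmlstarlet script applies (wrap every field in double quotes and double any embedded quote) can also be expressed in plain bash. This is only a sketch of that rule, not part of the answer above; `csv_quote` and `csv_row` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: wrap one field in double quotes, doubling embedded quotes.
csv_quote() {
  local field=$1 q='"'
  field=${field//$q/$q$q}
  printf '"%s"' "$field"
}

# Hypothetical helper: print all arguments as one comma-separated CSV row.
csv_row() {
  local out='' field
  for field in "$@"; do
    # ${out:+,} adds a comma only when something has already been written.
    out+="${out:+,}$(csv_quote "$field")"
  done
  printf '%s\n' "$out"
}

csv_row 'John' 'Smith' '"Smith" LTD, London, Mtg Str, 12, UK'
```

Quoting every field is a deliberate simplification: it keeps commas and quotes inside a value from breaking the row without having to inspect each field first.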