Converting two fields of a table in an XML file into CSV using xmllint in bash?

I've got an XML file (converted from HTML) containing fields like this:

<tr>
  <td data-title="Date">2018-01-01</td>
  <td data-title="Version"><a href="https://some-link">25.1</a></td>
</tr>
<tr>
  <td data-title="Date">2018-03-01</td>
  <td data-title="Version"><a href="https://some-link">24.1</a></td>
</tr>

I've been using 'xmllint' to extract single values:

textarea=$(echo "$xml" | xmllint --xpath 'string(//*[@id="content"])' - 2>/dev/null)

and multiple values:

list=$(echo "$xml" | xmllint --xpath 'string(/html/body/div/ul)' - 2>/dev/null)

but now I want to extract two fields from each record, in CSV format or something similar.

The closest I've got is this, in the xmllint shell:

xpath tr/*[@data-title="Date" or @data-title="Version"]/text()
Object is a Node Set :
Set contains 20 nodes:
1  TEXT
    content=Apr 9, 2018 6:13 PM UTC
2  TEXT
    content=Mar 21, 2018 10:41 PM UTC
3  TEXT
    content=Mar 19, 2018 9:22 PM UTC

Can you show me a way to achieve this with a better xpath?

Here is one way to do it with xmllint:

xmllint --html --xpath '//tr/td[@data-title="Date"] | //tr/td[@data-title="Version"]' test.html | sed -re 's%(</td>)%\1\n%g'

Output:

<td data-title="Date">2018-01-01</td>
<td data-title="Version"><a href="https://some-link">25.1</a></td>
<td data-title="Date">2018-03-01</td>
<td data-title="Version"><a href="https://some-link">24.1</a></td>
  • Add the --html option to tell xmllint that the input is HTML.
  • Prefix the xpath with // so that it matches tr elements at any depth in the document. Your xpath has no slash at the start, so it is evaluated relative to the current node. In the xmllint shell, the current node depends on how you used the cd command.
  • Finally, use the | operator to combine two or more xpaths into a single node set.
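
To go from that one-tag-per-line output to actual CSV, one possible post-processing step (a sketch, not part of the original answer) is to strip the markup with sed and pair the lines up with paste. The variable names cells and csv are illustrative, and the sample input is hard-coded from the output shown above. It assumes every row yields exactly one Date cell followed by one Version cell, and that the values contain no commas.

```bash
#!/usr/bin/env bash
# Sample input: the four lines printed by the xmllint | sed pipeline above
# (hard-coded here so the sketch runs without xmllint or test.html).
cells='<td data-title="Date">2018-01-01</td>
<td data-title="Version"><a href="https://some-link">25.1</a></td>
<td data-title="Date">2018-03-01</td>
<td data-title="Version"><a href="https://some-link">24.1</a></td>'

# 1. Remove every tag, leaving only the text content of each cell.
# 2. Drop lines that are empty after stripping (some xmllint versions
#    already print one node per line, so the sed step can add blanks).
# 3. Join each consecutive pair of lines with a comma: Date,Version.
csv=$(printf '%s\n' "$cells" \
  | sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d' \
  | paste -d, - -)

printf '%s\n' "$csv"
```

In practice the same sed and paste stages could be appended directly to the xmllint command instead of reading from a variable. Values containing commas or quotes would need proper CSV quoting, which this sketch does not attempt.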
