
How to change the order of exported variables in Xidel?

I am using Xidel to scrape information from a web page, and I am stuck on exporting the information in a different order than it appears on the page.

Example:

<tr>
<td></td>
<td></td>
<td></td>
<td><a><font><b>{ location:=. }</b></font>{ title:=. }</a></td>
<td>{ dates:=. }</td>
<td></td>
</tr>

This code exports the title first, and then the subtitle. Is there any way in Xidel to change the order?

This may be as easy as:

xidel -q page.html -e subtitle:=//h2,title:=//h1

Something like the following (with several "-e" params) would also work, but like the previous command it first groups all subtitles and then all titles on the page, which is probably not what you want:

xidel -q page.html -e "<div><h2>{subtitle:=.}</h2></div>+" -e "<div><h1>{title:=.}</h1></div>+" 

AFAIK, in your case there's no ordering feature in Xidel. But what you CAN do is write a script in which you save the values as environment variables with xidel --output-format cmd (if on Windows) and then echo/process those variables in the order you want.
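The script approach described above could look roughly like this in a POSIX shell. This is only a sketch: it assumes that xidel's --output-format bash (the Unix counterpart of the cmd format) prints plain shell assignments such as title='...', and the xidel_stub function with its sample values is a hypothetical stand-in for the real xidel call, so the snippet runs without xidel installed.

```shell
#!/usr/bin/env bash
# Sketch: capture the variables first, then print them in any order you like.
# ASSUMPTION: xidel --output-format bash prints assignments like title='...'.
# xidel_stub is a hypothetical stand-in (with made-up values) for:
#   xidel -q page.html -e 'title:=//h1' -e 'subtitle:=//h2' --output-format bash
xidel_stub() {
  printf '%s\n' "title='Main heading'" "subtitle='Secondary heading'"
}

# Load the printed assignments into the current shell.
eval "$(xidel_stub)"

# Emit the values in the desired order: subtitle first, then title.
ordered_output="$(printf '%s\n%s' "$subtitle" "$title")"
printf '%s\n' "$ordered_output"
```

With the real command substituted for the stub, the order of the final output is decided by the script rather than by the order of elements on the page. Only eval output from pages you trust, since the extracted text ends up in shell assignments.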

Dirkk has given a great tip (to not group). With that, your line could look something like this:

xidel -q page.html --xquery "for $i in //div return (concat('sub:=',$i/h2), concat('title:=',$i/h1))"

I have never used this tool, but after a quick look at the documentation, and seeing that it supports XQuery, I guess the following should work:

xidel -q page.html --xquery "for $div in //div return ($div/h2, $div/h1)" --output-format xml 

This assumes you have several such div elements in your page and want each subtitle to come before its own title, i.e. not all subtitles first. Also, as you have not given a more specific example document, it simply selects all divs and iterates over them. In real-world HTML you will probably want to match on more characteristic features (like id attributes). Note that in a POSIX shell such as bash, a double-quoted query lets the shell expand $div before xidel sees it, so wrap the query in single quotes there.
