
Xidel extract data inside the tag — raw output

Pleased to be a member of StackOverflow; I've been a long-time lurker here.

I need to parse the text between two tags, and so far I've found a wonderful tool called Xidel.

I need to parse the text in between:

 <div class="description"> Text. <tag>Also tags.</tag> More text. </div>

However, that text can include HTML tags, and I want them printed out as raw markup. Using a command like:

xidel --xquery '//div[@class="description"]' file.html

Gets me:

Text. Also tags. More text.

And I need it to be exactly as it is, so:

Text. <tag>Also tags.</tag> More text.

How can I achieve this?

Regards, R

This can be done in a couple of ways with Xidel, which is why I love it so much.

HTML-templating:

xidel -s file.html -e "<div class='description'>{inner-html()}</div>"

XPath:

xidel -s file.html -e "//div[@class='description']/inner-html()"

CSS:

xidel -s file.html -e "inner-html(css('div.description'))"

BTW, the commands above are quoted for the Windows command prompt. On Linux, swap the double quotes for single quotes and vice versa.
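For reference, here is a rough sketch of the same three commands with Linux-style quoting (single quotes outside, double quotes inside). The contents of file.html are an assumption, built from the snippet in the question, and the xidel calls are guarded so the script does nothing where Xidel is not installed.

```shell
# Sample input: an assumed minimal page wrapping the div from the question.
cat > file.html <<'EOF'
<html><body>
<div class="description"> Text. <tag>Also tags.</tag> More text. </div>
</body></html>
EOF

# The three extractions from the answer, requoted for a POSIX shell.
# Single quotes stop the shell from interpreting anything in the expression.
if command -v xidel >/dev/null 2>&1; then
  # HTML templating
  xidel -s file.html -e '<div class="description">{inner-html()}</div>'
  # XPath
  xidel -s file.html -e '//div[@class="description"]/inner-html()'
  # CSS selector
  xidel -s file.html -e 'inner-html(css("div.description"))'
fi
```

Each of the three calls should print the content of the div with the inner tag left in place, which is what the question asks for.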

You can display the tags by adding the --output-format=xml option:

xidel --xquery '//div[@class="description"]' --output-format=xml file.html 


 