简体   繁体   中英

How to parse rss-feeds / xml in a shell script

I'd like to parse rss feeds and download podcasts on my ReadyNas which is running 24/7 anyway.

So I'm thinking about having a shell script checking periodically the feeds and spawning wget to download the files.

What is the best way to do the parsing?

Thanks!

Sometimes a simple one liner with shell standard commands can be enough for this:

 wget -q -O- "http://www.rss-specifications.com/rss-podcast.xml" | grep -o '<enclosure url="[^"]*' | grep -o '[^"]*$' | xargs wget -c

Sure this does not work in every case, but it's often good enough.

Do you have access to awk? Maybe you could use XMLGawk

I've wrote the following simple script for downloading XML from Amazon S3, so it would be useful for parsing different kind of XML files:

#!/bin/bash
#
# Download all files from the Amazon feed
#
# Usage:
#  ./dl_amazon_feed_files.sh http://example.s3.amazonaws.com/
# Note: Don't forget about slash at the end
#

wget -qO- "$1" | grep -o '<Key>[^<]*' | grep -o "[^>]*$" | xargs -I% -L1 wget -c "$1%"

This is similar approach to @leo answer .

I read about XMLStartlet here and there

But is there a port to ReadyNas NV+ available?

您可以使用libxml2中的 xsltproc并编写一个简单的xsl样式表来解析rs并输出链接列表。

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM