How to parse RSS feeds / XML in a shell script

I'd like to parse RSS feeds and download podcasts on my ReadyNAS, which is running 24/7 anyway.

So I'm thinking of having a shell script check the feeds periodically and spawn wget to download the files.

What is the best way to do the parsing?

Thanks!
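Because the script will run periodically (for example from cron, with a crontab line such as `0 * * * * /path/to/podcasts.sh`, where the path is a placeholder), it helps to remember which enclosures were already fetched. Here is a minimal bash sketch of that bookkeeping; the function name `new_urls` and the plain-text state file with one URL per line are assumptions for illustration, not something from the answers.

```shell
#!/bin/bash
# Hypothetical helper: read candidate URLs (one per line) on stdin and print
# only those not already listed in the state file given as $1, appending each
# printed URL to that file so later runs skip it.
new_urls() {
  local state=$1 url
  touch "$state"                       # create the state file on the first run
  while IFS= read -r url; do
    [ -n "$url" ] || continue          # ignore blank lines
    # -q quiet, -x match the whole line, -F treat the URL as a fixed string
    if ! grep -qxF -- "$url" "$state"; then
      printf '%s\n' "$url" >> "$state"
      printf '%s\n' "$url"
    fi
  done
}
```

It can sit between any URL extractor and wget, e.g. `... | new_urls "$HOME/.podcasts-seen" | xargs -r wget -c` (with GNU xargs, `-r` skips running wget when there is nothing new). A URL is recorded as soon as it is printed, so a download that fails will not be retried on the next run unless you delete its line from the state file.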

Sometimes a simple one-liner with standard shell commands is enough:

 wget -q -O- "http://www.rss-specifications.com/rss-podcast.xml" | grep -o '<enclosure url="[^"]*' | grep -o '[^"]*$' | xargs wget -c

Sure, this doesn't work in every case, but it's often good enough.
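Two common cases the one-liner misses are an `<enclosure>` whose `url` attribute is not the first attribute, and URLs whose query string contains `&amp;` (the XML escape for `&`). Below is a slightly more tolerant sketch using only grep and sed. It is still not a real XML parser: it assumes each enclosure tag sits on a single line and that the `url` value is double-quoted, and the function name `enclosure_urls` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the url attribute of every <enclosure> tag
# found in the RSS document on stdin, one URL per line.
# Assumes double-quoted attribute values and one tag per line; not a real XML parser.
enclosure_urls() {
  # grep: grab each <enclosure ...> up to its url="..." attribute, wherever it appears
  # sed:  keep only the attribute value, then undo the &amp; escape
  grep -o '<enclosure[^>]* url="[^"]*"' |
    sed -e 's/.* url="\([^"]*\)"$/\1/' -e 's/&amp;/\&/g'
}
```

Dropped into the original pipeline it would read: `wget -q -O- "http://www.rss-specifications.com/rss-podcast.xml" | enclosure_urls | xargs wget -c`.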

Do you have access to awk? Maybe you could use XMLGawk.
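XMLGawk is an extension to gawk, so it has to be installed on the NAS. If only a standard awk is available, a rough alternative is to split the input on `<` so that every record starts with a tag name. This sketch uses only POSIX awk features and is not the XMLGawk API; it assumes RSS 2.0 `<enclosure>` tags with a double-quoted `url` value, and the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper using only POSIX awk features (no XMLGawk):
# print the url attribute of every <enclosure> element read from stdin.
enclosure_urls_awk() {
  awk '
    BEGIN { RS = "<" }                    # each record now starts with a tag name
    /^enclosure[ \t\n]/ {                 # an <enclosure ...> start tag
      # one whitespace char + url=" is 6 characters; the closing quote is 1 more
      if (match($0, /[ \t\n]url="[^"]*"/)) {
        url = substr($0, RSTART + 6, RLENGTH - 7)
        gsub(/&amp;/, "\\&", url)         # undo the XML escape for &
        print url
      }
    }'
}
```

Unlike the grep version, this also copes with a tag that is split across lines, because records are delimited by `<` rather than by newlines.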

I wrote the following simple script to download the files listed in the XML index of an Amazon S3 bucket; the same approach should work for parsing other kinds of XML files:

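#!/bin/bash
#
# Download all files listed in the XML index of an Amazon S3 bucket
#
# Usage:
#  ./dl_amazon_feed_files.sh http://example.s3.amazonaws.com/
#

if [ $# -ne 1 ]; then
  echo "Usage: $0 BUCKET_URL" >&2
  exit 1
fi

# Ensure exactly one trailing slash so keys can be appended to the URL
base="${1%/}/"

# Fetch the listing, keep the text of each <Key> element and download it.
# (-I already implies one input line per command, so -L1 is not needed.)
wget -qO- "$base" | grep -o '<Key>[^<]*' | grep -o '[^>]*$' | xargs -I% wget -c "$base%"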

This is a similar approach to @leo's answer.
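One thing the two `grep -o` steps leave alone is XML escaping: a key containing `&` or `<` has to appear in the XML listing as `&amp;` or `&lt;`, so the raw text would produce the wrong download URL. Here is a small sketch that extracts the keys and decodes the five predefined XML entities; numeric references such as `&#38;` are not handled, and the function name `s3_keys` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the text of every <Key> element in the S3
# listing read from stdin, with the predefined XML entities decoded.
s3_keys() {
  # &amp; is decoded last so that e.g. "&amp;lt;" becomes "&lt;", not "<"
  grep -o '<Key>[^<]*' |
    sed -e 's/^<Key>//' \
        -e 's/&lt;/</g' -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' -e "s/&apos;/'/g" \
        -e 's/&amp;/\&/g'
}
```

It would replace the two `grep -o` steps: `wget -qO- "$1" | s3_keys | xargs -I% wget -c "$1%"`. Keys with spaces or other characters that are not valid in a URL may still need percent-encoding before being appended.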

I've read about XMLStarlet here and there.

But is there a port available for the ReadyNAS NV+?
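If XMLStarlet can be installed, its `sel` command evaluates XPath directly and decodes entities for you. A sketch, assuming an RSS 2.0 feed (no namespace on `<item>`), a binary called `xmlstarlet` (some packages install it as `xml` instead), and a hypothetical wrapper name:

```shell
#!/bin/bash
# Hypothetical wrapper: print the url attribute of every <enclosure>
# inside an <item> of the RSS 2.0 file given as $1.
enclosure_urls_xmlstarlet() {
  # sel = select mode, -t = template, -m = for each node matching the XPath,
  # -v = print the value of an XPath expression, -n = print a newline
  xmlstarlet sel -t -m '//item/enclosure' -v '@url' -n "$1"
}
```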

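You can use xsltproc (from libxslt, the XSLT library built on libxml2) and write a simple XSL stylesheet that parses the RSS and outputs a list of links.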
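A minimal sketch of that approach: an XSLT 1.0 stylesheet that prints the `url` of every `<enclosure>` inside an `<item>`, one per line, run through xsltproc. It assumes an RSS 2.0 feed saved to a file; the function name and writing the stylesheet to a temporary file are only for illustration (normally you would keep the `.xsl` file next to the script).

```shell
#!/bin/bash
# Hypothetical wrapper: print the url attribute of every <enclosure>
# inside an <item> of the RSS 2.0 file given as $1, using xsltproc.
enclosure_urls_xslt() {
  local xsl status
  xsl=$(mktemp) || return 1
  # XSLT 1.0 stylesheet: plain-text output, one URL per line
  cat > "$xsl" <<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//item/enclosure">
      <xsl:value-of select="@url"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL
  xsltproc "$xsl" "$1"
  status=$?
  rm -f "$xsl"     # clean up the temporary stylesheet
  return "$status"
}
```

Because xsltproc uses a real XML parser, entities such as `&amp;` are decoded and line breaks inside tags don't matter.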
