How to remove HTML tags from an RSS feed and save the result as CSV using a shell script?

Question

Here's my problem: I am trying to parse an XML feed and extract two fields (title and link); this part is working fine. How can I remove all HTML tags and save the result in CSV format, e.g.:

title,link
title,link
title,link

#!/bin/sh
url="http://www.buzzfeed.com/usnews.xml"
curl --silent "$url" | grep -E '(title>|link>)' >> output

Answer 1

Use an XML parser to parse XML. I assume you want the title and link for the feed items, not for the feed itself.

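# $url is the feed URL set in the question's script
curl --silent "$url" |
xmlstarlet sel -t -m '/rss/channel/item' -v 'title' -n -v 'link' -n |
awk '{
    title=$0                  # first line of each pair is the title
    gsub(/"/, "&&", title)    # CSV escaping: double any embedded quotes
    getline                   # read the next line (the link) into $0
    printf "\"%s\",\"%s\"\n", title, $0
}'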

The xmlstarlet command parses the feed and, for each /rss/channel/item, outputs the title value and the link value on separate lines. Then awk picks up the stream and massages it into CSV.
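
To see what the awk stage does on its own, here is a minimal sketch that feeds it two hand-written title/link pairs instead of live xmlstarlet output. The function name pairs_to_csv and the sample titles and URLs are made up for illustration; only the awk logic comes from the answer.

```shell
#!/bin/sh
# pairs_to_csv: hypothetical helper wrapping the awk stage described above.
# Reads alternating title/link lines on stdin, writes one CSV row per pair.
pairs_to_csv() {
    awk '{
        title = $0
        gsub(/"/, "\"\"", title)   # CSV escaping: double embedded quotes
        getline                     # advance to the link line
        printf "\"%s\",\"%s\"\n", title, $0
    }'
}

# Made-up sample input in the order title, link, title, link
printf '%s\n' \
    'He said "hi", then left' \
    'http://example.com/a' \
    'Plain title' \
    'http://example.com/b' | pairs_to_csv
```

Because every field is wrapped in double quotes, the comma inside the first title stays part of that field, and the embedded quotes come out doubled, which is the usual CSV convention.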

Just for fun, a sed version of that awk:

sed -n 's/"/&&/g;s/^\|$/"/g;h;n;s/"/&&/g;s/^\|$/"/g;x;G;s/\n/,/;p'

or

sed -n '         #  do not automatically print
                 #  current line is the title
    s/"/&&/g     #  double up any double quotes (CSV quote escaping)
    s/^\|$/"/g   #  add leading and trailing double quotes
    h            #  store current pattern space (title) into hold space
    n            #  read the next line (the link) from input
    s/"/&&/g     #  double up any double quotes (CSV quote escaping)
    s/^\|$/"/g   #  add leading and trailing double quotes
    x            #  exchange pattern space (link) and hold space (title)
    G            #  append a newline to title and then append link
    s/\n/,/      #  replace the newline with a comma
    p            #  and print it
'
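
The \| alternation used in s/^\|$/"/g is not defined by POSIX for basic regular expressions, so the script above may depend on GNU sed. As a sketch of a variant, assuming the same alternating title/link input, each line can instead be wrapped in quotes with s/.*/"&"/. The function name pairs_to_csv_sed is hypothetical and this is a swapped-in substitution, not the answer's original command.

```shell
#!/bin/sh
# pairs_to_csv_sed: hypothetical helper; same title/link pairing as the sed
# script above, but quoting each line with s/.*/"&"/ instead of s/^\|$/"/g.
pairs_to_csv_sed() {
    sed -n '
        s/"/&&/g
        s/.*/"&"/
        h
        n
        s/"/&&/g
        s/.*/"&"/
        x
        G
        s/\n/,/
        p
    '
}

# Made-up sample input: one title line followed by one link line
printf '%s\n' 'A "quoted" title' 'http://example.com/a' | pairs_to_csv_sed
```

The hold-space steps (h, n, x, G) are unchanged: the quoted title is parked in the hold space while the link is read and quoted, then the two are swapped back and joined with a comma.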

Answer 1
2 2015-01-29 13:56:27
