I want to parse text from a compressed XML feed using awk
I am trying to parse <title> and <description> from the compressed XML feed at http://rss.slashdot.org/Slashdot/slashdot. I am trying the following:
curl --silent "http://rss.slashdot.org/Slashdot/slashdot" | awk '/\btitle\b(.*?)\bdescription\b/'
and grep -E etc., but I could not get the substrings I wanted. It always returns the entire XML, because the feed is compressed and all the data is on one line.
I was able to test my regex by running it in a text editor.
I appreciate your help. Thank you!
Using an XML parser would help. Here is a test with perl and XML::Twig. Adapt it to your needs.
Content of script.pl:
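#!/usr/bin/env perl
use warnings;
use strict;
use XML::Twig;

# Call extract_text for every <title> and <description> element.
# The input file is the first command-line argument.
my $twig = XML::Twig->new(
    twig_handlers => {
        'title'       => \&extract_text,
        'description' => \&extract_text,
    },
)->parsefile( shift );

# Print the tag name, a separator line, and the element's text.
sub extract_text {
    my ($t, $e) = @_;
    printf qq|%s\n=================\n|, $e->tag;
    printf qq|%s\n\n|, $e->text;
}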
Run it like:
curl --silent "http://rss.slashdot.org/Slashdot/slashdot" | perl script.pl -
That yields something like the following for each title and description pair:
title
=================
Proof-of-Concept Port of XBMC to SDL 2.0 and Wayland
description
=================
hypnosec wrote in with news that XBMC has ...
Here's an XSLT solution:
curl -s -o- http://rss.slashdot.org/Slashdot/slashdot | xsltproc slashdot.xsl -
where slashdot.xsl is:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:variable name="newline">
<xsl:text>
</xsl:text>
</xsl:variable>
<xsl:template match="/">
<xsl:apply-templates select='//item' />
</xsl:template>
<xsl:template match='//item'>
<xsl:value-of select='title' /><xsl:value-of select='$newline' />
<xsl:text>====</xsl:text><xsl:value-of select='$newline' />
<xsl:value-of select='description' /><xsl:value-of select='$newline' />
<xsl:value-of select='$newline' />
</xsl:template>
</xsl:stylesheet>
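As for why the awk attempt in the question prints everything: awk prints whole records that match a pattern, not the matched substring, and the whole feed is a single line, so it is a single record. In addition, POSIX extended regular expressions have no non-greedy `.*?`, and `\b` is not a word boundary in awk. If awk has to be used anyway, one rough workaround is to make `<` the record separator so that every tag starts its own record. The sketch below is only an illustration: the function name `extract_fields` and the sample feed are made up, and it assumes tags without attributes and no CDATA sections or nested markup inside the text. A real XML parser, as in the answers above, remains the safer choice.

```shell
#!/bin/sh
# Hypothetical helper: print "tag: text" for every <title> and <description>.
# Assumes plain tags (no attributes) and no literal "<" inside the text.
extract_fields() {
  awk '
    BEGIN { RS = "<" }                 # each record now starts right after a "<"
    {
      gt  = index($0, ">")             # position of the ">" that closes the tag
      tag = substr($0, 1, gt - 1)      # tag name, e.g. "title" or "/title"
      if (tag == "title" || tag == "description")
        print tag ": " substr($0, gt + 1)
    }
  '
}

# Made-up one-line sample standing in for the real feed.
sample='<rss><channel><title>Feed</title><item><title>First post</title><description>Hello world</description></item></channel></rss>'
printf '%s\n' "$sample" | extract_fields
```

For the sample this prints `title: Feed`, `title: First post`, and `description: Hello world`, one per line. Against the real feed it would be run as `curl --silent "http://rss.slashdot.org/Slashdot/slashdot" | extract_fields`, with the caveats above.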