How to parse the “<media:group>” using feedparser?

The RSS file is shown below. I want to get the content of the media:group section. I checked the feedparser documentation, but it does not seem to mention this. How can I do it? Any help is appreciated.

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:ymusic="http://music.yahoo.com/rss/1.0/ymusic/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel>
        <title>XYZ InfoX:  Special hello  </title>
        <link>http://www1.XYZInfoX.com/learninghello/home</link>
        <description>hello</description>
        <language>en</language>         <copyright />
        <pubDate>Wed, 17 Mar 2010 08:50:06 GMT</pubDate>
        <dc:creator />
        <dc:date>2010-03-17T08:50:06Z</dc:date>
        <dc:language>en</dc:language> <dc:rights />
        <image>
            <title>Voice of America</title>
            <link>http://www1.XYZInfoX.com/learninghello</link>
            <url>http://media.XYZInfoX.com/designimages/XYZRSSIcon.gif</url>
        </image>

        <item>
                <title>Who Were the Deadliest Gunmen of the Wild West?</title>
                <link>http://www1.XYZInfoX.com/learninghello/home/Deadliest-Gunmen-of-the-Wild-West-87826807.html</link>
                <description> The story of two of them: "Killin'" Jim Miller was an outlaw, "Texas" John Slaughter was a lawman | EXPLORATIONS  </description>
                <pubDate>Wed, 17 Mar 2010 00:38:48 GMT</pubDate>
                <guid isPermaLink="false">87826807</guid>
                <dc:creator></dc:creator>
                <dc:date>2010-03-17T00:38:48Z</dc:date>                                                                                                                                     
                <media:group>
                    <media:content url="http://media.XYZInfoX.com/images/archives_peace_comm_480_16mar_se.jpg" medium="image" isDefault="true" height="300" width="480" />
                    <media:content url="http://media.XYZInfoX.com/images/archives_peace_comm_230_16mar_se_edited-1.jpg" medium="image" isDefault="false" height="230" width="230" />
                    <media:content url="http://media.XYZInfoX.com/images/tex_trans_lawmans_230_16mar10_se.jpg" medium="image" isDefault="false" height="230" width="230" />
                    <media:content url="http://www.XYZInfoX.com/MediaAssets2/learninghello/dalet/se-exp-outlaws-part2-17mar2010.Mp3" type="audio/mpeg" medium="audio" isDefault="false" />
                </media:group>
     </item>

feedparser 4.1, as available from PyPI, has this bug.

The solution for me was to get the latest feedparser.py (4.2 pre) from the repository.

svn checkout http://feedparser.googlecode.com/svn/trunk/ feedparser-readonly
cd feedparser-readonly
python setup.py install

Now you can access all MRSS items:

>>> import feedparser  # the new version!
>>> d = feedparser.parse(MY_XML_URL)
>>> # media_content holds one dict of attributes per <media:content> element
>>> for content in d.entries[0].media_content: print(content['url'])

That should do the job for you.
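
For anyone who cannot upgrade feedparser, one possible fallback is to read the media:group children directly with the standard library. This is a different technique from the feedparser approach above, shown only as a minimal sketch: SAMPLE_RSS is a trimmed, hypothetical stand-in for the feed in the question, and media_group_urls is an illustrative helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns:media declaration in the question's feed.
MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}

# Hypothetical, trimmed stand-in for the feed in the question.
SAMPLE_RSS = """<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
  <item>
    <title>Example item</title>
    <media:group>
      <media:content url="http://example.com/a.jpg" medium="image" />
      <media:content url="http://example.com/b.mp3" medium="audio" />
    </media:group>
  </item>
</channel>
</rss>"""


def media_group_urls(rss_text):
    """Return one list per <item> with the url of each media:content in its media:group."""
    root = ET.fromstring(rss_text)
    result = []
    for item in root.iter("item"):
        # The "media" prefix is resolved through the MEDIA_NS mapping.
        contents = item.findall("media:group/media:content", MEDIA_NS)
        result.append([c.get("url") for c in contents])
    return result


urls = media_group_urls(SAMPLE_RSS)
for url in urls[0]:
    print(url)
```

Other attributes such as medium or type can be read the same way with c.get(...); an attribute that is absent comes back as None.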

You can parse the feed using

feed = feedparser.parse(your_feeds_url)

and then access the XML elements using either Python attribute access or dictionary-style access on feed and its sub-elements. Attribute access cannot use a literal element name such as media:content, because a colon is not valid in a Python identifier; feedparser exposes that element as media_content, so look it up under that name, for example with dictionary-style access.

The rest should become clear after studying the examples at http://www.feedparser.org
