Tag contents not returned in BeautifulSoup

Question

I have the following XML I'm trying to extract data from:

<item>
<dc:creator><![CDATA[Chris M]]></dc:creator>
<pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
</item>

I'm trying to get the name Chris M (and the other authors) with this:

from bs4 import BeautifulSoup

# response holds the feed text fetched earlier
soup = BeautifulSoup(response, "lxml")
items = soup.findAll("item")
for i in items:
    author = i.find('dc:creator')
    print(author)

This outputs:

<dc:creator></dc:creator>

How can I get the name out of the tag?

Answer 1

This worked for me using Python 3 (https://repl.it/languages/python3).

Specify xml as the parser (BeautifulSoup's xml parser requires the lxml package):

import bs4 as bs
content="""
<collection>
    <item><dc:creator><![CDATA[Chris M]]></dc:creator></item>
    <item><dc:creator><![CDATA[Harris A]]></dc:creator></item>
</collection>
"""

soup = bs.BeautifulSoup(content, 'xml')

items = soup.find_all("item")
for i in items:
    author = i.find('creator')
    print(author.string)

Output:

Chris M
Harris A
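If BeautifulSoup isn't a hard requirement, the standard library's xml.etree.ElementTree parser also returns CDATA content as ordinary element text. This is a minimal sketch, not part of the original answer. It assumes the feed declares the Dublin Core namespace (http://purl.org/dc/elements/1.1/, the usual binding for dc: in RSS feeds), because ElementTree rejects an undeclared prefix; the sample adds that declaration, and get_authors is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace; real RSS feeds declare it on the root element,
# e.g. <rss xmlns:dc="http://purl.org/dc/elements/1.1/">.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Assumed sample: the same items as above, plus the namespace declaration
# ElementTree needs in order to resolve the dc: prefix.
content = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/"><channel>
    <item><dc:creator><![CDATA[Chris M]]></dc:creator></item>
    <item><dc:creator><![CDATA[Harris A]]></dc:creator></item>
</channel></rss>"""


def get_authors(xml_text):
    """Hypothetical helper: return the dc:creator text of every <item>.

    The XML parser hands CDATA sections over as plain text, so no
    special handling is needed.
    """
    root = ET.fromstring(xml_text)
    return [item.findtext("dc:creator", namespaces=NS) for item in root.iter("item")]


for author in get_authors(content):
    print(author)
```

The namespaces mapping lets findtext accept the familiar dc:creator spelling instead of the fully qualified {http://purl.org/dc/elements/1.1/}creator form.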

Answer 2

BeautifulSoup represents CDATA sections as CData, a subclass of NavigableString, so you can check each string in the document for instances of it:

>>> from bs4 import BeautifulSoup, CData
>>> text = """<item>
... <dc:creator><![CDATA[Chris M]]></dc:creator>
... <pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
... </item>"""
>>> # Name the parser explicitly: lxml's HTML parser drops the CDATA section (as in the question)
>>> soup = BeautifulSoup(text, "html.parser")
>>> for item in soup.find_all(string=True):
...     if isinstance(item, CData):
...         print(item)
...
Chris M

2 answers

Answer 1: score 0, 2017-06-07 23:55:21

Answer 2: score 0, 2017-06-08 00:01:08
