
Tag contents not returning in BeautifulSoup

I have the following XML, and I'm trying to extract a string from it:

<item>
<dc:creator><![CDATA[Chris M]]></dc:creator>
<pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
</item>

I'm trying to get the name Chris M, and the other authors, with this:

soup = BeautifulSoup(response, "lxml")
items = soup.findAll("item")
for i in items:
    author = i.find('dc:creator')
    print(author)

This outputs:

<dc:creator></dc:creator>

How can I get the name contents from the tag?

This worked for me using Python 3 (https://repl.it/languages/python3).

Specify the parser as "xml" instead of "lxml", and look the tag up as creator, without the dc: prefix:

import bs4 as bs
content="""
<collection>
    <item><dc:creator><![CDATA[Chris M]]></dc:creator></item>
    <item><dc:creator><![CDATA[Harris A]]></dc:creator></item>
</collection>
"""

# The 'xml' parser requires lxml to be installed.
soup = bs.BeautifulSoup(content, 'xml')

items = soup.find_all("item")
for i in items:
    author = i.find('creator')
    print(author.string)

Output:

Chris M
Harris A
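
As a cross-check that does not use BeautifulSoup at all, here is a minimal sketch with the standard library's xml.etree.ElementTree, which also returns CDATA content as ordinary element text. This is a swapped-in technique, not the method from the answer above. The sample feed, the xmlns:dc declaration (the usual Dublin Core namespace URI) and the helper name extract_creators are assumptions for illustration. Unlike the lenient parser above, ElementTree rejects an undeclared dc: prefix, so the snippet from the question would need its enclosing element with the namespace declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed. The xmlns:dc declaration is an assumption:
# the question's snippet omits the enclosing element that would carry it.
FEED = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <dc:creator><![CDATA[Chris M]]></dc:creator>
      <pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
    </item>
    <item>
      <dc:creator><![CDATA[Harris A]]></dc:creator>
    </item>
  </channel>
</rss>"""

# Maps the prefix used in find() to the namespace URI declared in the feed.
NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}


def extract_creators(xml_text):
    """Return the text of every dc:creator found inside an <item>."""
    root = ET.fromstring(xml_text)
    creators = []
    for item in root.iter("item"):
        creator = item.find("dc:creator", NAMESPACES)
        # Skip items without a creator or with an empty one.
        if creator is not None and creator.text:
            creators.append(creator.text.strip())
    return creators


print(extract_creators(FEED))
```

With the sample feed above this should print ['Chris M', 'Harris A'].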

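BeautifulSoup represents a CDATA section as a CData object, a subclass of NavigableString, so you can walk the strings in the document and check for instances of it. This only works when the parser keeps the CDATA section; as the question shows, "lxml" drops it.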

>>> from bs4 import BeautifulSoup, CData

>>> text = """<item>
<dc:creator><![CDATA[Chris M]]></dc:creator>
<pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
</item>"""
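>>> # Name the parser explicitly; with no argument bs4 picks whichever is installed.
>>> soup = BeautifulSoup(text, "html.parser")
>>> for item in soup.find_all(string=True):
...     if isinstance(item, CData):
...         print(item)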


Chris M
