How to remove the CDATA tags but keep the actual data in Python using LXML or BeautifulSoup

Question

I have some XML that I am parsing with BeautifulSoup. I pull the CDATA out with the following code, but I only want the data and not the CDATA tags.

    myXML = open(r"c:\myfile.xml", "r")  # raw string keeps the backslash literal
    soup = BeautifulSoup(myXML)
    data = soup.find(text=re.compile("CDATA"))

    print(data)

    <![CDATA[TEST DATA]]>

What I would like to see is the following output:

TEST DATA

I don't care whether the solution uses LXML or BeautifulSoup. I just want the best or easiest way to get the job done. Thanks!

Here is a solution:

    from lxml import etree

    parser = etree.XMLParser(strip_cdata=False)
    root = etree.parse(self.param1, parser)
    data = root.findall('./config/script')
    for item in data:  # iterate through list to find text contained in elements containing CDATA
        print(item.text)

Answer 1

Based on the lxml docs:

>>> from lxml import etree
>>> parser = etree.XMLParser(strip_cdata=False)
>>> root = etree.XML('<root><data><![CDATA[test]]></data></root>', parser)
>>> data = root.findall('data')
>>> for item in data:  # iterate through list to find text contained in elements containing CDATA
...     print(item.text)  # just the text of <![CDATA[test]]>
...
test

This might be the best way to get the job done, depending on how amenable your XML structure is to this approach.
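The same idea can be checked outside the interactive session. The following is a minimal sketch, not part of the original answer: `element_texts` is a hypothetical helper name, and the fallback to the standard library's `xml.etree.ElementTree` is an assumption added so the snippet also runs where lxml is not installed. It uses the default parser settings rather than `strip_cdata=False`, since `.text` returns the bare character data either way.

```python
# Sketch only: element_texts is a hypothetical helper, not part of lxml.
try:
    from lxml import etree  # preferred, as in the answer above
except ImportError:
    # Assumption: fall back to the standard library if lxml is missing.
    import xml.etree.ElementTree as etree


def element_texts(xml_source, path):
    """Return the text of every element matching path, without CDATA markers."""
    root = etree.fromstring(xml_source)
    # .text holds the character data only; the <![CDATA[ ... ]]> wrapper
    # is markup and is not part of the returned string.
    return [element.text for element in root.findall(path)]


sample = "<root><data><![CDATA[TEST DATA]]></data><data>plain</data></root>"
print(element_texts(sample, "data"))  # ['TEST DATA', 'plain']
```

An element whose content was written as CDATA and one written as plain text come back the same way, so the calling code does not need to know which form the document used.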

Answer 2

Based on BeautifulSoup:

>>> from bs4 import BeautifulSoup
>>> xml_str = '<xml>  <MsgType><![CDATA[text]]></MsgType>  </xml>'
>>> soup = BeautifulSoup(xml_str, "xml")
>>> soup.MsgType.get_text()
u'text'
>>> soup.MsgType.string
u'text'
>>> soup.MsgType.text
u'text'

As a result, each of these returns just the text inside MsgType, without the CDATA markers.
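If all you have is the already-extracted string, such as the `<![CDATA[TEST DATA]]>` printed in the question, a plain regular expression is one possible fallback. This is not taken from either answer; `CDATA_RE` and `strip_cdata_markers` are hypothetical names, and the sketch assumes the markers appear literally in the string.

```python
import re

# Matches one CDATA section and captures its contents.
# DOTALL lets the contents span several lines.
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def strip_cdata_markers(text):
    """Replace each CDATA section in text with its contents."""
    return CDATA_RE.sub(lambda match: match.group(1), text)


print(strip_cdata_markers("<![CDATA[TEST DATA]]>"))  # TEST DATA
```

Letting the XML parser do the work, as in the answers above, is usually the more robust choice; the regular expression only helps when the raw string is all that is left.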

Answer 1
2, accepted, 2014-01-30 02:18:08

Answer 2
0, 2014-02-14 06:22:42
