What is the best way to parse a file that is not pure XML in Python?

Question

I am trying to parse a file in Python that is not pure XML. Because it is not purely XML, the XML parser fails to parse it.

Please suggest a solution. I don't want to read the file with I/O functions.

<groups>
   <url>
      description = helloz
      <whatis>
         <what_is_that>
            active = yes
            <inside_what>
               <default>
                  <0>
                     tagid = 0

                  </0>
               </default>
            </inside_what>
            <second_list>
               <0>
                  name = do
               </0>
            </second_list>
         </what_is_that>

Answer 1

You can try something like this with BeautifulSoup. When you create a BeautifulSoup object, it inserts the missing closing tags itself, and you can then extract whatever you need.

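from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4 lxml

# Read the whole file into a string ('file_name' is a placeholder path).
with open('file_name', 'r') as f:
    contents = f.read()

# The lenient 'lxml' HTML parser closes any tags left open in the input.
soup = BeautifulSoup(contents, 'lxml')
print(soup.find('inside_what'))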

Output:

<inside_what>
<default>

                     tagid = 0

                  0&gt;
               </default>
</inside_what>
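
Note that in the output above the numeric tag <0> did not survive as an element (it shows up as the text "0&gt;"), since a tag name may not start with a digit. If the file really only contains opening tags, closing tags, and "key = value" lines, as in the question's sample, a small line-based parser may be an alternative. The following is only a sketch under that assumption: parse_pseudo_xml and the regular expressions are hypothetical names introduced here, not part of any library, and repeated sibling tags with the same name would overwrite each other.

```python
import re

# Hypothetical helper (not from any library). Assumes every non-blank line is
# an opening tag, a closing tag, or a "key = value" pair, as in the sample.
OPEN_TAG = re.compile(r"^<([^/<>\s][^<>\s]*)>$")
CLOSE_TAG = re.compile(r"^</([^<>\s]+)>$")
KEY_VALUE = re.compile(r"^([^=]+?)\s*=\s*(.*)$")


def parse_pseudo_xml(text):
    """Turn the tag / "key = value" layout into nested dicts."""
    root = {}
    stack = [root]  # stack[-1] is the dict currently being filled
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        closed = CLOSE_TAG.match(line)
        opened = OPEN_TAG.match(line)
        pair = KEY_VALUE.match(line)
        if closed:
            # Step back to the parent; never pop the root itself.
            if len(stack) > 1:
                stack.pop()
        elif opened:
            child = {}
            stack[-1][opened.group(1)] = child
            stack.append(child)
        elif pair:
            stack[-1][pair.group(1).strip()] = pair.group(2).strip()
        # Any other line is silently ignored in this sketch.
    return root


sample = """
<groups>
   <url>
      description = helloz
      <whatis>
         <what_is_that>
            active = yes
            <inside_what>
               <default>
                  <0>
                     tagid = 0
                  </0>
               </default>
            </inside_what>
         </what_is_that>
"""

data = parse_pseudo_xml(sample)
print(data["groups"]["url"]["whatis"]["what_is_that"]["inside_what"])
# prints {'default': {'0': {'tagid': '0'}}}
```

Because the function takes a string, it does not matter where the text comes from, and tags left unclosed at the end of the input (as in the sample) are tolerated: the dicts built so far are simply returned.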

Answer 1
0 2016-12-29 13:37:12
