
Issues when writing an XML file using xml.dom.minidom in Python

I have an XML file, and a Python script adds a new node to it. I use the xml.dom.minidom module to process the file. After processing, the XML file looks like this:

<?xml version="1.0" ?><Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PostBuildEvent>
  <Command>xcopy &quot;SourceLoc&quot; &quot;DestLoc&quot;</Command>
</PostBuildEvent>
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
<Import Project="project.targets"/></Project>

What I actually need is shown below. The differences are a newline after the first line and before the last line, and each &quot; written as a literal " character.

<?xml version="1.0" ?>
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PostBuildEvent>
  <Command>xcopy "SourceLoc" "DestLoc"</Command>
</PostBuildEvent>
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
<Import Project="project.targets"/>
</Project>

The Python code I use is given below:

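import xml.dom.minidom

xmltree = xml.dom.minidom.parse(xmlFile)  # xmlFile: path to the input file
Project = xmltree.documentElement
# Create the new <Import> node and attach it to the root <Project> element
newImport = xmltree.createElement("Import")
newImport.setAttribute("Project", "project.targets")
Project.appendChild(newImport)
with open(VcxProjFile, "w") as f:  # VcxProjFile: path to the output file
    xmltree.writexml(f)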

What should I change in my code to get the XML in the correct format?

Thanks,

From the minidom docs:

Node.toprettyxml([indent=""[, newl=""[, encoding=""]]])

Return a pretty-printed version of the document. indent specifies the indentation string and defaults to a tabulator; newl specifies the string emitted at the end of each line and defaults to \n.

That's all the customisation you get from minidom.
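As a rough illustration of those two parameters, the sketch below parses a compact stand-in for the project file (the SOURCE string is an assumption, not the asker's real file), appends the new Import node, and pretty-prints the result. Note that if the parsed file already contains whitespace between elements, toprettyxml() keeps those text nodes and the output tends to gain extra blank lines.

```python
import xml.dom.minidom

# Hypothetical compact input with no whitespace between elements.
SOURCE = (
    '<Project DefaultTargets="Build" ToolsVersion="4.0" '
    'xmlns="http://schemas.microsoft.com/developer/msbuild/2003">'
    '<ImportGroup Label="ExtensionTargets"/>'
    '</Project>'
)

doc = xml.dom.minidom.parseString(SOURCE)

# Add the new <Import> element under the root element.
new_import = doc.createElement("Import")
new_import.setAttribute("Project", "project.targets")
doc.documentElement.appendChild(new_import)

# indent is the per-level indentation string, newl the line terminator.
pretty = doc.toprettyxml(indent="  ", newl="\n")
print(pretty)
```

With this input, the declaration and the closing root tag each end up on their own line.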

I tried inserting a Text node as a sibling of the root element to get the newline, without success. I recommend using regular expressions from the re module and inserting the newlines manually.
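One way the regular-expression approach might look is sketched below. The helper name add_line_breaks and the sample input are made up for illustration; the idea is simply to post-process the string returned by toxml(), adding a newline after the XML declaration and before the closing tag of the root element.

```python
import re
import xml.dom.minidom


def add_line_breaks(xml_text, root_tag):
    """Insert a newline after the XML declaration and before the root's closing tag."""
    # Newline after '<?xml ... ?>' unless one is already there.
    xml_text = re.sub(r'^(<\?xml[^>]*\?>)(?!\n)', r'\1\n', xml_text, count=1)
    # Newline before the final '</root>' unless one is already there.
    closing = r'(?<!\n)(</%s>\s*)$' % re.escape(root_tag)
    return re.sub(closing, r'\n\1', xml_text)


# Hypothetical input that mimics the layout in the question.
doc = xml.dom.minidom.parseString(
    '<Project>\n<Import Project="project.targets"/></Project>'
)
fixed = add_line_breaks(doc.toxml(), doc.documentElement.tagName)
print(fixed)
```

The same string can then be written to the output file with an ordinary write() call instead of writexml().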

As for removing SGML entities, there's apparently an undocumented function for that in the Python standard library:

# Python 2 API (the HTMLParser module was renamed html.parser in Python 3)
import HTMLParser
h = HTMLParser.HTMLParser()
unicode_string = h.unescape(string_with_entities)
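On Python 3.4 and later, the same operation is available as the documented function html.unescape(). The sketch below applies it to the Command text from the question. Running it over a whole serialized document is an assumption on my part rather than something minidom recommends: it would also turn &lt; and &amp; back into raw characters, which can make the XML ill-formed, so it is safer to limit it to text you know contains only &quot;.

```python
import html

# Text content as minidom serialized it in the question.
escaped = 'xcopy &quot;SourceLoc&quot; &quot;DestLoc&quot;'

# Convert character references back to the characters they stand for.
unescaped = html.unescape(escaped)
print(unescaped)  # xcopy "SourceLoc" "DestLoc"
```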

Alternatively, you can do this manually, again using re, as all named entity names and their corresponding code points are in the htmlentitydefs module (html.entities in Python 3).
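A minimal sketch of that manual route, written for Python 3 and therefore using html.entities.name2codepoint, could look like this. The function name replace_named_entities is hypothetical, and unknown entity names are deliberately left untouched.

```python
import re
from html.entities import name2codepoint


def replace_named_entities(text):
    """Replace known named entities such as &quot; with their characters."""
    def _substitute(match):
        name = match.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        # Leave unknown names exactly as they were.
        return match.group(0)

    return re.sub(r'&([A-Za-z][A-Za-z0-9]*);', _substitute, text)


command = replace_named_entities('xcopy &quot;SourceLoc&quot; &quot;DestLoc&quot;')
print(command)
```

To convert only &quot; and keep &lt; and &amp; escaped, the lookup can be narrowed to that single name.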
