Issues when writing an XML file using Python's xml.dom.minidom

I have an XML file, and I use a Python script to add a new node to it. I process the file with the xml.dom.minidom module. After processing, the file looks like this:

<?xml version="1.0" ?><Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PostBuildEvent>
  <Command>xcopy &quot;SourceLoc&quot; &quot;DestLoc&quot;</Command>
</PostBuildEvent>
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
<Import Project="project.targets"/></Project>

What I actually need is shown below. The differences are a newline after the first line, a newline before the last line, and each &quot; written as a literal " character.

<?xml version="1.0" ?>
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PostBuildEvent>
  <Command>xcopy "SourceLoc" "DestLoc"</Command>
</PostBuildEvent>
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
<Import Project="project.targets"/>
</Project>

The Python code I used is given below:

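import xml.dom.minidom
vcxprojxmltree = xml.dom.minidom.parse(xmlFile)  # xmlFile: path to the project file
Project = vcxprojxmltree.documentElement
# Create the new <Import> node and append it to the root <Project> element
newImport = vcxprojxmltree.createElement("Import")
newImport.setAttribute("Project", "project.targets")
Project.appendChild(newImport)
with open(VcxProjFile, 'w') as f:
    vcxprojxmltree.writexml(f)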

What should I change in my code to get the XML in the format shown above?

Thanks,

From the minidom docs:

Node.toprettyxml([indent=""[, newl=""[, encoding=""]]])

Return a pretty-printed version of the document. indent specifies the indentation string and defaults to a tabulator; newl specifies the string emitted at the end of each line and defaults to \n.

That's all the customisation you get from minidom.
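
As a minimal sketch of the call quoted above, the following pretty-prints a small document. The input string is a made-up stand-in for the project file, not the asker's actual data. Note that when the parsed file already contains whitespace between elements, toprettyxml tends to add extra blank lines, so the result may not match the desired layout exactly.

```python
import xml.dom.minidom

# Hypothetical compact input with no whitespace between elements
doc = xml.dom.minidom.parseString(
    '<Project><Import Project="project.targets"/></Project>'
)

# indent is added per nesting level; newl ends each line,
# including the XML declaration on the first line
pretty = doc.toprettyxml(indent="  ", newl="\n")
print(pretty)
```

With this input the declaration and the closing root tag each end up on their own line.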

I tried inserting a Text node as a sibling of the root element to get the newline, without success. I recommend serialising the document to a string and inserting the newlines manually, using regular expressions from the re module.
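
One way the regular-expression approach might look is sketched below. The helper name add_outer_newlines and the sample input are illustrative assumptions, not part of minidom. It serialises with toxml() and then adds a newline after the XML declaration and before the closing root tag, but only where one is missing.

```python
import re
import xml.dom.minidom

def add_outer_newlines(xml_text, root_tag):
    """Hypothetical helper: add the two missing newlines to serialised XML."""
    # Newline after the XML declaration, unless one is already there
    xml_text = re.sub(r'^(<\?xml[^>]*\?>)(?!\n)', r'\1\n', xml_text, count=1)
    # Newline before the closing root tag at the end of the text
    closing = r'(?<!\n)(</%s>\s*)$' % re.escape(root_tag)
    xml_text = re.sub(closing, r'\n\1', xml_text, count=1)
    return xml_text

# Made-up stand-in for the project file
doc = xml.dom.minidom.parseString(
    '<Project><Import Project="project.targets"/></Project>'
)
fixed = add_outer_newlines(doc.toxml(), doc.documentElement.tagName)
print(fixed)
```

The fixed string can then be written to the file with an ordinary write() call instead of writexml.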

As for converting entities such as &quot; back to plain characters, there's apparently an undocumented function for that in the Python 2 standard library:

import HTMLParser
h = HTMLParser.HTMLParser()
unicode_string = h.unescape(string_with_entities)
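
In Python 3 the HTMLParser module was renamed, and the documented equivalent is html.unescape (available since Python 3.4). A minimal sketch, using the command string from the question as sample input:

```python
import html

# Text content as it appears in the serialised file
escaped = "xcopy &quot;SourceLoc&quot; &quot;DestLoc&quot;"

# html.unescape converts named and numeric character references
unescaped = html.unescape(escaped)
print(unescaped)
```

Be careful about running this over a whole serialised document: it would also turn &lt; and &amp; into raw characters, which can leave the XML ill-formed.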

Alternatively, you can do this manually, again using re, as all named entities and their corresponding code points are listed in the htmlentitydefs module (html.entities in Python 3).
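
A rough sketch of the manual route, written for Python 3, might look like this. The function name unescape_quotes_in_text is an assumption made for illustration. It replaces &quot; only in text between tags, leaving attribute values alone, because a raw " inside a double-quoted attribute would break the XML. It does not try to handle comments or CDATA sections.

```python
import re
from html.entities import name2codepoint  # htmlentitydefs in Python 2

def unescape_quotes_in_text(xml_text):
    """Hypothetical helper: turn &quot; into " outside of tags only."""
    quote = chr(name2codepoint["quot"])  # code point 34, the " character

    def fix(match):
        # match covers '>' + text content + '<', never the inside of a tag
        return match.group(0).replace("&quot;", quote)

    return re.sub(r">[^<]*<", fix, xml_text)

sample = '<Command a="x&quot;y">xcopy &quot;SourceLoc&quot;</Command>'
result = unescape_quotes_in_text(sample)
print(result)
```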
