
Parsing XML from string reports "junk" in my file, I don't know how to locate it

I'm trying to parse an XML string with ElementTree. The string is built by joining many dict values together. There is no root node, but it worked fine the first time.

This is what I did the first time, and it worked:

    import xml.etree.ElementTree as ET

    for value in data.values():
        myxml = ' '.join(value)
        tree = ET.fromstring(myxml)

But in the same situation, with just a different dictionary, it doesn't work. My code is simply:

    values = [x for x in dict_fasi.values()]
    myxml_fasi = ' '.join(values)
    tree2 = ET.fromstring(myxml_fasi)

I also tried the loop as before, and it didn't work either. The error says: xml.etree.ElementTree.ParseError: junk after document element: line 8, column 20

Line 8 should be:

</new_line> <new_line>

And the XML string is:

<new_line>
          <text font="NUMPTY+ImprintMTnum" bbox="297.284,540.828,300.188,553.310" colourspace="DeviceGray" ncolour="0" size="12.482">della quale non conosce che una parte;] </text>
          <text font="PYNIYO+ImprintMTnum-Italic" bbox="322.455,540.839,328.251,553.566" colourspace="DeviceGray" ncolour="0" size="12.727">prima</text>
          <text font="NUMPTY+ImprintMTnum" bbox="331.206,545.345,334.683,552.834" colourspace="DeviceGray" ncolour="0" size="7.489">1</text>
          <text font="NUMPTY+ImprintMTnum" bbox="177.602,528.028,180.850,540.510" colourspace="DeviceGray" ncolour="0" size="12.482">che nonconosce ancora appieno;</text>
          <text font="NUMPTY+ImprintMTnum" bbox="189.430,532.545,192.908,540.034" colourspace="DeviceGray" ncolour="0" size="7.489">2</text>
          <text font="NUMPTY+ImprintMTnum" bbox="203.879,528.028,208.975,540.510" colourspace="DeviceGray" ncolour="0" size="12.482">che</text>
        </new_line> <new_line>
          <text font="QKWQNQ+ImprintMTnum-Bold" bbox="315.109,462.272,319.863,472.957" colourspace="DeviceGray" ncolour="0" size="10.685">5</text>
          <text font="NUMPTY+ImprintMTnum" bbox="368.916,461.828,372.743,474.310" colourspace="DeviceGray" ncolour="0" size="12.482">avederci]</text>
          <text font="PYNIYO+ImprintMTnum-Italic" bbox="86.577,449.039,92.373,461.766" colourspace="DeviceGray" ncolour="0" size="12.727">sps.a</text>
          <text font="NUMPTY+ImprintMTnum" bbox="167.611,449.028,172.707,461.510" colourspace="DeviceGray" ncolour="0" size="12.482">dove io andava a</text>
          <text font="QKWQNQ+ImprintMTnum-Bold" bbox="68.031,421.672,72.786,432.357" colourspace="DeviceGray" ncolour="0" size="10.685">5</text>
          <text font="NUMPTY+ImprintMTnum" bbox="137.296,421.228,140.200,433.710" colourspace="DeviceGray" ncolour="0" size="12.482">tante libertà] </text>
          <text font="PYNIYO+ImprintMTnum-Italic" bbox="161.868,421.239,167.664,433.966" colourspace="DeviceGray" ncolour="0" size="12.727">prima</text>
          <text font="NUMPTY+ImprintMTnum" bbox="170.784,425.745,174.262,433.234" colourspace="DeviceGray" ncolour="0" size="7.489">1</text>
          <text font="NUMPTY+ImprintMTnum" bbox="174.297,421.228,183.920,433.710" colourspace="DeviceGray" ncolour="0" size="12.482">m</text>
          <text font="MUVAOR+Symbol" bbox="194.367,421.612,199.376,431.672" colourspace="DeviceGray" ncolour="0" size="10.060">&lt;&gt;</text>
          <text font="NUMPTY+ImprintMTnum" bbox="208.349,425.745,211.827,433.234" colourspace="DeviceGray" ncolour="0" size="7.489">2</text>
          <text font="NUMPTY+ImprintMTnum" bbox="244.601,421.228,250.976,433.710" colourspace="DeviceGray" ncolour="0" size="12.482">certe lib</text>
          <text font="MUVAOR+Symbol" bbox="250.901,421.612,255.910,431.672" colourspace="DeviceGray" ncolour="0" size="10.060">&lt;</text>
          <text font="NUMPTY+ImprintMTnum" bbox="269.331,421.228,274.426,433.710" colourspace="DeviceGray" ncolour="0" size="12.482">ertà</text>
          <text font="MUVAOR+Symbol" bbox="274.363,421.612,279.373,431.672" colourspace="DeviceGray" ncolour="0" size="10.060">&gt;</text>
        </new_line> <new_line>

The first XML string, the one that works, looks like this:

<new_line>
          <text font="QKWQNQ+ImprintMTnum-Bold" bbox="234.782,118.872,239.536,129.558" colourspace="DeviceGray" ncolour="0" size="10.685">80</text>
          <text font="NUMPTY+ImprintMTnum" bbox="360.280,118.428,363.184,130.911" colourspace="DeviceGray" ncolour="0" size="12.482">pazienza, e la prudenza.] </text>
          <text font="PYNIYO+ImprintMTnum-Italic" bbox="369.339,118.440,375.135,131.167" colourspace="DeviceGray" ncolour="0" size="12.727">da</text>
          <text font="NUMPTY+ImprintMTnum" bbox="113.588,105.629,118.684,118.111" colourspace="DeviceGray" ncolour="0" size="12.482">pa-zienza</text>
          <text font="MUVAOR+Symbol" bbox="120.415,105.707,124.422,117.543" colourspace="DeviceGray" ncolour="0" size="11.835">=</text>
        </new_line>
<new_line>
          <text font="NUMPTY+ImprintMTnum" bbox="194.095,105.629,196.999,118.111" colourspace="DeviceGray" ncolour="0" size="12.482">Cristoforo] </text>
          <text font="PYNIYO+ImprintMTnum-Italic" bbox="214.031,105.640,219.827,118.367" colourspace="DeviceGray" ncolour="0" size="12.727">sts.a</text>
          <text font="NUMPTY+ImprintMTnum" bbox="241.600,81.508,247.396,93.991" colourspace="DeviceGray" ncolour="0" size="12.482">Galdino 72</text>
          <text font="SZWUPJ+ImprintExpertMT" bbox="272.785,614.422,276.490,625.380" colourspace="DeviceGray" ncolour="0" size="10.958">  </text>
          <text font="NUMPTY+ImprintMTnum" bbox="53.923,592.408,58.102,602.646" colourspace="DeviceGray" ncolour="0" size="10.238">34c</text>
          <text font="QKWQNQ+ImprintMTnum-Bold" bbox="72.640,592.472,77.394,603.157" colourspace="DeviceGray" ncolour="0" size="10.685">80</text>
          <text font="NUMPTY+ImprintMTnum" bbox="187.701,592.028,190.605,604.510" colourspace="DeviceGray" ncolour="0" size="12.482">troverà … immaginare] </text>
          <text font="PYNIYO+ImprintMTnum-Italic" bbox="201.265,592.039,204.169,604.766" colourspace="DeviceGray" ncolour="0" size="12.727">da </text>
          <text font="NUMPTY+ImprintMTnum" bbox="305.701,592.028,310.796,604.510" colourspace="DeviceGray" ncolour="0" size="12.482">qualche rimedio inaspe</text>
          <text font="MUVAOR+Symbol" bbox="310.691,592.412,315.701,602.472" colourspace="DeviceGray" ncolour="0" size="10.060">&lt;</text>
          <text font="NUMPTY+ImprintMTnum" bbox="331.518,592.028,337.314,604.510" colourspace="DeviceGray" ncolour="0" size="12.482">ttato</text>
          <text font="MUVAOR+Symbol" bbox="337.154,592.412,342.163,602.472" colourspace="DeviceGray" ncolour="0" size="10.060">&gt;</text>
        </new_line>

Maybe it's a problem with how the new_line tags open and close, but I don't know how to solve it.

The term "junk" in the error message seems like a rather unfair value judgement, but what it means is that the parser expects to see a single top-level element, and when it gets to the end of that element (and any trailing comments or processing instructions) it expects to see the end of the file. If there's another element start tag, then it's not a well-formed XML document.
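To locate the spot the parser is complaining about, here is a minimal sketch assuming Python's standard xml.etree.ElementTree. The helper name `locate_parse_error` and the sample string `two_roots` are made up for illustration and are not part of the library. `ParseError` carries a `position` attribute with the line and column where parsing stopped, which can be used to pull out the offending line:

```python
import xml.etree.ElementTree as ET


def locate_parse_error(xml_text):
    """Return (line, column, source_line) for the first parse error, or None.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line numbers start at 1
        source_line = xml_text.splitlines()[line - 1]
        return line, column, source_line
    return None


# Illustrative input: two sibling top-level elements.
# The second <new_line> start tag is what the parser calls "junk".
two_roots = (
    "<new_line>\n"
    "  <text>a</text>\n"
    "</new_line> <new_line>\n"
    "  <text>b</text>\n"
    "</new_line>"
)

result = locate_parse_error(two_roots)
print(result)
```

In the question's case, the reported line is the one where the first new_line element closes and a second one opens, which matches the explanation above.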

You say you are aware there is no root node, but you seem to be unaware that this makes the document ill-formed. You say it worked the first time: well, it shouldn't have worked.
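One common way to get a well-formed document out of several sibling fragments is to wrap them in a single enclosing element before parsing. The sketch below assumes that approach; the `<root>` wrapper name and the `fragments` list are invented stand-ins for the joined dict values, not something taken from the question's data:

```python
import xml.etree.ElementTree as ET

# Stand-in for the joined dict values: several sibling <new_line> fragments.
fragments = [
    '<new_line><text size="12.482">prima</text></new_line>',
    '<new_line><text size="7.489">1</text></new_line>',
]

# Wrap everything in one made-up <root> element so there is a single
# top-level element, which is what the parser requires.
wrapped = "<root>" + " ".join(fragments) + "</root>"
root = ET.fromstring(wrapped)

# The original fragments are now children of the wrapper.
new_lines = root.findall("new_line")
for new_line in new_lines:
    print([t.text for t in new_line.findall("text")])
```

This only helps if every fragment is itself balanced. The excerpt pasted in the question ends with an opening new_line tag that is never closed, so if the real string ends the same way, a wrapper alone would still leave it ill-formed.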

