Two different XML namespaces with the same URL

I am trying to do some data cleaning using the xml.etree.ElementTree library in Python.

My XML input files look like this:

<?xml version="1.0" encoding="UTF-8"?>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/mods/v3" version="3.5" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-5.xsd">
  <mods:titleInfo>
    <mods:title>1971, Human Events</mods:title>
  </mods:titleInfo>
  <mods:name type="personal" authority="naf" valueURI="https://lccn.loc.gov/n88172648">
    <mods:namePart>Kellems, Vivien, 1896-1975</mods:namePart>
    <mods:role>
      <mods:roleTerm authority="marcrelator" authorityURI="http://id.loc.gov/vocabulary/relators" valueURI="http://id.loc.gov/vocabulary/relators/col" type="text">Collector</mods:roleTerm>
    </mods:role>
  </mods:name>
  <mods:typeOfResource>text</mods:typeOfResource>
  <mods:genre authority="aat" valueURI="300111999">publications (documents)</mods:genre>
  <mods:originInfo>
    <mods:dateIssued encoding="w3cdtf" keyDate="yes">1971</mods:dateIssued>
  </mods:originInfo>
  <mods:physicalDescription>
    <mods:digitalOrigin>reformatted digital</mods:digitalOrigin>
    <mods:internetMediaType>image/jp2</mods:internetMediaType>
  </mods:physicalDescription>
  <mods:note type="ownership">Archives &amp; Special Collections at the Thomas J. Dodd Research Center, University of Connecticut Library</mods:note>
  <mods:identifier type="local">1992-0033/SeriesIII:Activism/SubseriesA:PoliticalCampaigns/Box138:6</mods:identifier>
  <mods:identifier type="local">MSS 1992.0033</mods:identifier>
  <mods:identifier type="local">39153030468468</mods:identifier>
  <mods:accessCondition type="use and reproduction">In Copyright</mods:accessCondition>
  <mods:recordInfo>
    <mods:recordContentSource>University of Connecticut Library</mods:recordContentSource>
    <mods:recordCreationDate encoding="w3cdtf">2018-07-09-04:00</mods:recordCreationDate>
    <mods:languageOfCataloging>
      <mods:languageTerm authority="iso639-2b" type="code">eng</mods:languageTerm>
    </mods:languageOfCataloging>
  </mods:recordInfo>
  <mods:note type="source note">Vivien Kellems Papers</mods:note>
  <mods:note type="source identifier">MSS 1992.0033</mods:note>
  <identifier type="hdl">http://hdl.handle.net/11134/20002:860633493</identifier>
</mods:mods>

All I have to do is change the identifier tag at the end so that it has the same "mods" prefix as the rest of the tags, and add a specific hlink attribute to the accessCondition tag. I have successfully done both of those things. But after I write these modifications back to the file and try to parse it again with ElementTree, I get the following error:

xml.etree.ElementTree.ParseError: unbound prefix: line 25, column 2

I think this is a namespace issue, because the "xmlns:mods" declaration and the "xmlns" declaration have the same URL. I register the namespaces like so:

import xml.etree.ElementTree as ET

# Both the default ('') prefix and 'mods' are registered for the same URI.
ET.register_namespace('', "http://www.loc.gov/mods/v3")
ET.register_namespace('mods', "http://www.loc.gov/mods/v3")
ET.register_namespace('xlink', "http://www.w3.org/1999/xlink")
ET.register_namespace('xsi', "http://www.w3.org/2001/XMLSchema-instance")

One of the namespace declarations is also removed when I write back to the XML file. The declarations then look like this:

<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="3.5" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-5.xsd">

Namely, the "xmlns" declaration is gone and only the "xmlns:mods" declaration is shown. Again, I think this is because they have the same URL. Is there any way to fix this? Any help would be appreciated.

http://www.loc.gov/mods/v3 is the namespace. mods is nothing but an abbreviation (aka the "prefix"). You can have as many different abbreviations for the same namespace in your XML document as you want.

For example:

<something xmlns="http://www.loc.gov/mods/v3">
  <mods:something_else xmlns:mods="http://www.loc.gov/mods/v3" />
  <blah:another_thing xmlns:blah="http://www.loc.gov/mods/v3" />
  <last_thing />
</something>

and

<mods:something xmlns:mods="http://www.loc.gov/mods/v3" xmlns:blah="http://www.loc.gov/mods/v3">
  <something_else xmlns="http://www.loc.gov/mods/v3" />
  <mods:another_thing />
  <blah:last_thing />
</mods:something>

and any number of other combinations represent exactly the same document.
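
A quick way to check this claim with ElementTree is to parse both variants and compare the fully qualified element names. This is only a sketch: the variable names are made up for illustration, and the two strings are the example documents from above.

```python
import xml.etree.ElementTree as ET

doc_a = """<something xmlns="http://www.loc.gov/mods/v3">
  <mods:something_else xmlns:mods="http://www.loc.gov/mods/v3" />
  <blah:another_thing xmlns:blah="http://www.loc.gov/mods/v3" />
  <last_thing />
</something>"""

doc_b = """<mods:something xmlns:mods="http://www.loc.gov/mods/v3" xmlns:blah="http://www.loc.gov/mods/v3">
  <something_else xmlns="http://www.loc.gov/mods/v3" />
  <mods:another_thing />
  <blah:last_thing />
</mods:something>"""

# ElementTree stores each tag as "{namespace-uri}localname";
# the prefix used in the source text is not part of the tag.
tags_a = [el.tag for el in ET.fromstring(doc_a).iter()]
tags_b = [el.tag for el in ET.fromstring(doc_b).iter()]

for tag in tags_a:
    print(tag)
print(tags_a == tags_b)
```

Both lists contain the same four names, each starting with {http://www.loc.gov/mods/v3}, regardless of which prefix was written in the source.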

When those are parsed and then serialized again, all those namespace declarations could be retained exactly as they are, or they could be folded into a single one, the prefixes could be renamed to ns0, or the namespace could be turned into a default namespace. It does not matter; it depends entirely on how the XML library is implemented.
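
As a rough illustration of how one particular library behaves, the sketch below serializes the same tree with ElementTree twice: once before any prefix is registered, and once after registering "mods". The shortened input document and the variable names are assumptions made for the example. Note that register_namespace changes a process-wide mapping and keeps one prefix per URI.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

source = (
    '<something xmlns="http://www.loc.gov/mods/v3">'
    '<mods:something_else xmlns:mods="http://www.loc.gov/mods/v3" />'
    '<last_thing />'
    '</something>'
)
root = ET.fromstring(source)

# No prefix is registered for this URI yet, so ElementTree generates one.
auto_prefixed = ET.tostring(root, encoding="unicode")

# Register a preferred prefix and serialize the same tree again.
ET.register_namespace("mods", MODS_NS)
mods_prefixed = ET.tostring(root, encoding="unicode")

print(auto_prefixed)
print(mods_prefixed)

# Either output parses back to the same qualified element names.
same = [e.tag for e in ET.fromstring(auto_prefixed).iter()] == [
    e.tag for e in ET.fromstring(mods_prefixed).iter()
]
print(same)
```

In both outputs the two declarations from the input are folded into a single one on the root element; only the prefix text differs.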

As long as every element in the resulting document is in the http://www.loc.gov/mods/v3 namespace, it's the same document by any relevant metric:

<something xmlns="http://www.loc.gov/mods/v3">
  <something_else />
  <another_thing  />
  <last_thing />
</something>

In other words, there is nothing broken, so nothing needs to be fixed.
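
If the "unbound prefix" error still appears after writing the file, one thing worth checking (an assumption, since the modification code is not shown in the question) is whether the new attribute was added under a literal "prefix:name" key. ElementTree only emits a namespace declaration for names given in "{uri}name" form. The sketch below assumes the attribute is xlink:href and uses a placeholder value; the input is a cut-down version of the file from the question.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("mods", MODS_NS)
ET.register_namespace("xlink", XLINK_NS)

source = (
    '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" '
    'xmlns="http://www.loc.gov/mods/v3">'
    '<mods:accessCondition type="use and reproduction">In Copyright</mods:accessCondition>'
    '<identifier type="hdl">http://hdl.handle.net/11134/20002:860633493</identifier>'
    '</mods:mods>'
)
root = ET.fromstring(source)

access = root.find(f"{{{MODS_NS}}}accessCondition")

# Placeholder value (hypothetical). The key uses the namespace URI rather
# than the literal text "xlink:href", so xmlns:xlink gets declared on output.
access.set(f"{{{XLINK_NS}}}href", "https://example.org/rights")

output = ET.tostring(root, encoding="unicode")
print(output)

# The result parses again without an "unbound prefix" error.
reparsed = ET.fromstring(output)
```

The unprefixed identifier element is already in the MODS namespace, so with "mods" registered it is written with the mods prefix without any renaming.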
