
Adding explicit namespace attribute to particular XML tags

In a quick-and-dirty conversion from XML generated by one program (htlatex) to another (ArborText Editor), I need to replace all XML of the following form

<math xmlns="http://www.w3.org/1998/Math/MathML">
<mn>
....
</mn>
</math>

with

<m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
<m:mn xmlns:m="http://www.w3.org/1998/Math/MathML">
....
</m:mn>
</m:math>

Is there a cleaner, easier way to achieve this than searching for the tags <math, <mn>, etc. and replacing them? Or can this be done via XSLT?

Disclaimer: I am going to use regex for this! If you are afraid They might get you, stop reading now.


Because this is a very limited kind of problem, I believe using an XML parser to find the things we want to change and then using regex is OK here. We are not trying to parse anything with the regex, just replacing simple text patterns.

I used XML::Twig to find all the math nodes, grabbed their XML as a string, replaced the namespaces, and put the XML back, which makes XML::Twig parse the altered string. If the regex manipulation broke something to the extent that the XML is no longer well-formed, we will notice here because the parsing will fail.

Of course, this assumes there are no other namespaces inside the math elements.
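To see why that assumption matters, the same three substitutions used in the script below can be run on plain strings. This is only a sketch: the helper name add_m_prefix and the urn:example namespace are made up for illustration, and no XML parser is involved, so nothing here checks well-formedness.

```perl
use strict;
use warnings;

my $ns = 'http://www.w3.org/1998/Math/MathML';

# Hypothetical helper (not part of XML::Twig): applies the three
# substitutions to a string holding one serialized math element.
sub add_m_prefix {
    my ($xml) = @_;
    $xml =~ s{ xmlns="\Q$ns\E"}{};                   # drop the default namespace
    $xml =~ s{<([a-zA-Z]+)}{<m:$1 xmlns:m="$ns"}g;   # prefix every opening tag
    $xml =~ s{</}{</m:}g;                            # prefix every closing tag
    return $xml;
}

# Only unprefixed elements inside math: the result is well-formed.
my $plain = add_m_prefix(qq{<math xmlns="$ns"><mn>1</mn></math>});
print "$plain\n";

# An element that already carries a prefix (x: is an invented example):
# the pattern stops at the colon, so the output is no longer well-formed.
my $mixed = add_m_prefix(
    qq{<math xmlns="$ns"><x:foo xmlns:x="urn:example">1</x:foo></math>}
);
print "$mixed\n";
```

In the second case the opening tag comes out as <m:x xmlns:m="..."> with :foo glued onto the attribute value, and the closing tag becomes </m:x:foo>. That is the kind of breakage the re-parse in set_outer_xml is expected to catch.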

use strict;
use warnings;
use XML::Twig;

my $xml = <<XML;
<container>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mn>
<foo>asdf</foo>
<bar>fdsa</bar>
</mn>
</math>
</container>
XML

my $t = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => {
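        math => sub {
            # Serialize the whole <math> element, including its children.
            my $new_xml = $_->outer_xml;
            # Drop the default namespace declaration.
            $new_xml =~ s{ xmlns="http://www.w3.org/1998/Math/MathML"}{};
            # Prefix every opening tag and declare the m: namespace on it.
            $new_xml =~ s{<([a-zA-Z]+)}{<m:$1 xmlns:m="http://www.w3.org/1998/Math/MathML"}g;
            # Prefix every closing tag.
            $new_xml =~ s{</}{</m:}g;

            # Put the string back; XML::Twig parses it again here.
            $_->set_outer_xml($new_xml);
        },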
    }
);
$t->parse($xml);
$t->print;

The output contains the namespace declaration on each element, starting from math.

<container>
  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
    <m:mn xmlns:m="http://www.w3.org/1998/Math/MathML">
      <m:foo xmlns:m="http://www.w3.org/1998/Math/MathML">asdf</m:foo>
      <m:bar xmlns:m="http://www.w3.org/1998/Math/MathML">fdsa</m:bar>
    </m:mn>
  </m:math>
</container>

I verified that it works for more deeply nested structures and multiple math elements.
