Replace text in an XML file using XPath while preserving formatting

I would like to replace text in an XML file, but preserve any other formatting in the source file.

E.g. parsing it as DOM, replacing the node using XPath, and outputting it as a String might not do the trick, as it will reformat the entire file. (Pretty printing might be good for 99% of the cases, but the requirement is to preserve the existing formatting, even if it's not "pretty".)

Is there any Java / Scala library that can do a "find and replace" on a String without parsing it as a DOM tree, or that can at least preserve the original formatting?

EDIT:

I think that the maven replacer plugin does something like this. It seems to preserve the original whitespace formatting by using setPreserveSpace (I think; I still need to try it):

import java.io.StringWriter;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
...
    private String writeXml(Document doc) throws Exception {
        // Ask the Xerces serializer to keep whitespace as it appears in the document
        OutputFormat of = new OutputFormat(doc);
        of.setPreserveSpace(true);
        of.setEncoding(doc.getXmlEncoding());

        // Serialize the DOM into an in-memory String
        StringWriter sw = new StringWriter();
        XMLSerializer serializer = new XMLSerializer(sw, of);
        serializer.serialize(doc);
        return sw.toString();
    }

So the question changes to: is there a (straightforward) way to do this without extra dependencies?

EDIT2:

The requirement is to use an XPath query provided externally, i.e. as a String.
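To make the "no extra dependencies" variant concrete, here is a minimal sketch that uses only the JDK's built-in DOM, XPath and identity Transformer. The class name XPathReplace and the helper replaceText are made up for illustration. It assumes the default JDK parser and serializer, which keep whitespace-only text nodes inside the root element, but it is not a byte-for-byte round trip: attribute quoting, the XML declaration and whitespace outside the root element may still change.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathReplace {

    /** Hypothetical helper: sets the text content of every node matched by the XPath string. */
    static String replaceText(String xml, String xpath, String replacement) throws Exception {
        // Parse without any whitespace-stripping options, so whitespace text nodes survive.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // The XPath expression arrives as a plain String, as required in EDIT2.
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
        for (int i = 0; i < hits.getLength(); i++) {
            hits.item(i).setTextContent(replacement);
        }

        // Identity transform; indentation is left at its default (off), so nothing is re-indented.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Assumption for this sketch: the input has no XML declaration, so none is written.
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a>\n  <b>old</b>\n    <c/>\n</a>";
        System.out.println(replaceText(xml, "/a/b", "new"));
    }
}
```

Whether this is "preserving enough" depends on the input, so it is worth diffing the output against the original file before relying on it.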

You can try scala.xml.pull or Scales XML.

You can find working code for parsing files here.

Scales XML can use the StAX API, which is a streaming API. So there is never a full DOM, and the parts of the XML are usually passed through without much pre-processing.
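For readers who want to see the streaming idea without Scales XML, here is a rough sketch using the StAX event API that ships with the JDK (javax.xml.stream). The class StaxReplace and the helper replaceInElement are hypothetical names. Events are copied from reader to writer one at a time, and only character data inside the chosen element is rewritten. It selects by element name rather than by XPath, and the JDK writer may still add an XML declaration or write empty elements differently from the source.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public class StaxReplace {

    /** Hypothetical helper: replaces target with replacement in text inside elementName. */
    static String replaceInElement(String xml, String elementName,
                                   String target, String replacement) throws Exception {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // Deliver adjacent character data as one event so the target is not split in two.
        inputFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = inputFactory.createXMLEventReader(new StringReader(xml));

        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();

        int depth = 0; // greater than 0 while inside a matching element or its descendants
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                depth++;
            } else if (event.isEndElement()
                    && event.asEndElement().getName().getLocalPart().equals(elementName)) {
                depth--;
            } else if (depth > 0 && event.isCharacters()) {
                // Swap the text event for a rewritten one; all other events pass through as-is.
                String text = event.asCharacters().getData().replace(target, replacement);
                event = events.createCharacters(text);
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc>\n  <foo bar=\"red\"> <bar> red </bar> </foo>\n</doc>";
        System.out.println(replaceInElement(xml, "bar", "red", "blue"));
    }
}
```

Because whitespace between elements arrives as ordinary character events, it is written back unchanged, which is the same property the Scales XML approach relies on.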

Test it with your specially formatted XML file and see if it works out.

I would not recommend using simple text search and replace on XML. There is a good chance of a mismatch. You will then alter the document in an unpredictable way. The resulting bugs are usually hard to find.
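A tiny illustration of such a mismatch (the class and method names are made up for this sketch): the intent is to change only the text of the bar element, but a plain string replace also hits an attribute value and a comment.

```java
public class NaiveReplace {

    /** Hypothetical helper: blind textual replacement with no knowledge of XML structure. */
    static String naive(String xml) {
        return xml.replace("red", "blue");
    }

    public static void main(String[] args) {
        String xml = "<foo bar=\"red\"><bar>red</bar><!-- red --></foo>";
        // prints: <foo bar="blue"><bar>blue</bar><!-- blue --></foo>
        System.out.println(naive(xml));
    }
}
```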

I have made a short experiment with Scales XML and it looks quite promising:

    scala> import scales.utils._
    import scales.utils._
    scala> import ScalesUtils._
    import ScalesUtils._
    scala> import scales.xml._
    import scales.xml._
    scala> import ScalesXml._
    import ScalesXml._
    scala> import scales.xml.serializers.StreamSerializer
    import scales.xml.serializers.StreamSerializer
    scala> import java.io.StringReader
    import java.io.StringReader
    scala> import java.io.PrintWriter
    import java.io.PrintWriter

    scala> def xmlsrc=new StringReader("""
         | <a attr1="value1"> <b/>This
         | is some tex<xt/>
         |   <!-- A comment -->
         |   <c><d>
         |   </d>
         |   <removeme/>
         |   <changeme/>
         | </c>
         | </a>
         | """)
    xmlsrc: java.io.StringReader

    scala> def pull=pullXml(xmlsrc)
    pull: scales.xml.XmlPull with java.io.Closeable with scales.utils.IsClosed

    scala> writeTo(pull, new PrintWriter(System.out))
    <?xml version="1.0" encoding="UTF-8"?><a attr1="value1"> <b/>This
    is some tex<xt/>
      <!-- A comment -->
      <c><d>
      </d>
      <removeme/>
      <changeme/>
    </c>
    res0: Option[Throwable] = None

    scala> def filtered=pull flatMap {
         |   case Left(e : Elem) if e.name.local == "removeme" => Nil
         |   case Right(e : EndElem) if e.name.local == "removeme" => Nil
         |   case Left(e : Elem) if e.name.local == "changeme" => List(Left(Elem("x")), Left(Elem("y")), Right(EndElem("x")))
         |   case Right(e : EndElem) if e.name.local == "changeme" => List(Right(EndElem("x")))
         |   case otherwise => List(otherwise)
         | }
    filtered: Iterator[scales.xml.PullType]

    scala> writeTo(filtered, new PrintWriter(System.out))
    <?xml version="1.0" encoding="UTF-8"?><a attr1="value1"> <b/>This
    is some tex<xt/>
      <!-- A comment -->
      <c><d>
      </d>

      <x><y/></x>
    </c>
    res1: Option[Throwable] = None

The example first initializes the XML token stream. Then it prints the token stream unmodified. You can see that comments and formatting are preserved. Then it modifies the token stream with the monadic Scala API and prints the result. You can see that most formatting is preserved and only the formatting of the changed parts differs.

So it looks like Scales XML solves your problem in a straightforward way.

I was going to code up something quick to recall scala.xml and how much I dislike it; I haven't used it since I first learned some Scala.

You normally see text nodes of white space; this is mentioned in PiS (Programming in Scala), in the "catalog" example.

I did remember that it reverses attributes on load; I vaguely remembered having to fix pretty printing.

But the compiler doesn't reverse attributes on XML literals. So given that you want to supply an XPath dynamically, you could use the compiler toolbox to compile the source document as a literal and also compile the XPath string, with / operators converted to \\ .

That's just a little out-of-the-box fun, but maybe it has a sweet spot of applicability, perhaps if you must use only the standard Scala distro.

I'll update later when I get a chance to try it out.

import scala.xml._

object Test extends App {
  val src =
"""|<doc>
   |  <foo bar="red" baz="yellow"> <bar> red </bar> </foo>
   |  <baz><bar>red</bar></baz>
   |</doc>""".stripMargin

  val red = "(.*)red(.*)".r
  val sub = "blue"

val tmp =
<doc>
   <foo bar="red" baz="yellow"> <bar> red </bar> </foo>
   <baz><bar>red</bar></baz>
</doc>

  Console println tmp

  // replace "red" with "blue" in all bar text

  val root = XML loadString src
  Console println root
  val bars = root \\ "bar"
  val barbars =
    bars map (_ match {
      case <bar>{Text(red(prefix, suffix))}</bar> =>
           <bar>{Text(s"$prefix$sub$suffix")}</bar>
      case b => b
    })
  val m = (bars zip barbars).toMap
  val sb = serialize(root, m)
  Console println sb

  def serialize(x: Node, m: Map[Node, Node], sb: StringBuilder = new StringBuilder) = {
    def serialize0(x: Node): Unit = x match {
      case e0: Elem =>
        val e = if (m contains e0) m(e0) else e0
        sb append "<"
        e nameToString sb
        if (e.attributes ne null) e.attributes buildString sb
        if (e.child.isEmpty) sb append "/>"
        else {
          sb append ">"
          for (c <- e.child) serialize0(c)
          sb append "</"
          e nameToString sb
          sb append ">"
        }
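      case Text(t) => sb append t
      // Fall back to the node's own serialization for any other node type,
      // so the match does not fail on nodes that are neither Elem nor Text
      case other => sb append other.toString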
    }
    serialize0(x)
    sb
  }
}
