Replace an XML element's value? Sed regular expression?

I want to take an XML file and replace an element's value. For example, if my XML file looks like this:

<abc>
    <xyz>original</xyz>
</abc>

I want to replace the xyz element's original value, whatever it may be, with another string so that the resulting file looks like this:

<abc>
    <xyz>replacement</xyz>
</abc>

How would you do this? I know I could write a Java program to do this, but I assume that's overkill for replacing a single element's value and that this could easily be done with sed, using a regular expression substitution. However, I'm less than a novice with that command and I'm hoping some kind soul reading this will be able to spoon-feed me the correct regular expression for the job.

One idea is to do something like this:

sed 's|<xyz>.*</xyz>|<xyz>replacement</xyz>|' <original.xml >new.xml

Maybe it's better for me to just replace the entire line of the file with what I want it to be, since I will know the element name and the new value I want to use? But this assumes that the element in question is on a single line and that no other XML data is on the same line. I'd rather have a command which will replace element xyz's value with a new string that I specify, without having to worry about whether the element is all on one line or not, etc.

If sed is not the best tool for this job then please dial me in to a better approach.

If anyone can steer me in the right direction I'll really appreciate it; you'll probably save me hours of trial and error. Thanks in advance!

--James

sed is not going to be an easy tool to use for multi-line replacements. It's possible to implement them using its N command and some recursion, checking after reading in each line whether the close of the tag has been found... but it's not pretty and you'll never remember it.

Of course, actually parsing the XML and replacing tags is going to be the safest thing, but if you know you won't run into any problems, you could try this:

perl -p -0777 -e 's@<xyz>.*?</xyz>@<xyz>new-value</xyz>@sg' <xml-file>

Breaking this down:

  • -p tells it to loop through the input and print
  • -0777 tells it to use the end of file as the input separator, so that it gets the whole thing in one slurp
  • -e means here comes the stuff I want you to do

And the substitution itself:

  • use @ as a delimiter so you don't have to escape /
  • use *?, the non-greedy version, to match as little as possible, so we don't go all the way to the last occurrence of </xyz> in the file
  • use the s modifier to let . match newlines (to get the multiple-line tag values)
  • use the g modifier to match the pattern multiple times

Tada! This prints the result to stdout. Once you verify it does what you want, add the -i option to tell it to edit the file in place.
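For comparison, the same substitution can be sketched in Java, since the question mentions Java as the fallback. This is only an illustration of the regex approach described above, not a replacement for parsing the XML: the class name RegexElementReplace and the method replaceXyz are made up for the example, and the input is assumed to fit in memory as one string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring the Perl one-liner; the names are illustrative only.
final class RegexElementReplace {
    // DOTALL plays the role of Perl's s modifier: '.' also matches newlines.
    // The reluctant quantifier .*? stops at the nearest </xyz>.
    private static final Pattern XYZ = Pattern.compile("<xyz>.*?</xyz>", Pattern.DOTALL);

    static String replaceXyz(String xml, String newValue) {
        // quoteReplacement keeps any '$' or '\' in newValue literal.
        String replacement = "<xyz>" + Matcher.quoteReplacement(newValue) + "</xyz>";
        // replaceAll substitutes every match, like Perl's g modifier.
        return XYZ.matcher(xml).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // The whole document is held in one string, like Perl's -0777 slurp.
        String xml = "<abc>\n    <xyz>line one\nline two</xyz>\n</abc>\n";
        System.out.print(replaceXyz(xml, "replacement"));
    }
}
```

Like the Perl version, this does not escape XML special characters in the new value and knows nothing about attributes, comments or CDATA sections.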

OK, so I bit the bullet and took the time to write a Java program which does what I want. Below is the operative method, called by my main() method, which does the work, in case this will be helpful to someone else in the future:

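import java.io.File;
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Takes an input XML file, replaces the text value of the node specified by an XPath parameter, and writes a new
 * XML file with the updated data.
 * 
 * @param inputXmlFilePathName path of the XML file to read
 * @param outputXmlFilePathName path of the XML file to write
 * @param elementXpathExpression XPath expression selecting the node(s) to update
 * @param elementValue new text value for the selected node(s)
 * @param replaceAllFoundElements whether to update every match when the XPath selects more than one node
 */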
public static void replaceElementValue(final String inputXmlFilePathName,
                                       final String outputXmlFilePathName,
                                       final String elementXpathExpression,
                                       final String elementValue,
                                       final boolean replaceAllFoundElements)
{
    try
    {
        // get the template XML as a W3C Document Object Model which we can later write back as a file
        InputSource inputSource = new InputSource(new FileInputStream(inputXmlFilePathName));
        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        Document document = documentBuilderFactory.newDocumentBuilder().parse(inputSource);

        // create an XPath expression to access the element's node
        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();
        XPathExpression xpathExpression = xpath.compile(elementXpathExpression);

        // get the node(s) which correspond to the XPath expression and replace the value
        // (a NODESET evaluation normally yields an empty list rather than null when nothing matches)
        NodeList nodeList = (NodeList) xpathExpression.evaluate(document, XPathConstants.NODESET);
        if ((nodeList == null) || (nodeList.getLength() == 0))
        {
            throw new RuntimeException("Failed to find a node corresponding to the provided XPath.");
        }
        if ((nodeList.getLength() > 1) && !replaceAllFoundElements)
        {
            throw new RuntimeException("Found multiple nodes corresponding to the provided XPath and multiple replacements not specified.");
        }
        for (int i = 0; i < nodeList.getLength(); i++)
        {
            nodeList.item(i).setTextContent(elementValue);
        }

        // prepare the DOM document for writing
        Source source = new DOMSource(document);

        // prepare the output file
        File file = new File(outputXmlFilePathName);
        Result result = new StreamResult(file);

        // write the DOM document to the file
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(source, result);
    }
    catch (Exception ex)
    {
        throw new RuntimeException("Failed to replace the element value.", ex);
    }
}

I run the program like so:

$ java -cp xmlutility.jar com.abc.util.XmlUtility input.xml output.xml '//name/text()' JAMES

I hate to be a naysayer, but XML is anything but regular. A regular expression will probably be more trouble than it's worth. See here for more insight: Using C# Regular expression to replace XML element content

Your thought of a simple Java program might be nice after all. An XSLT transform may be easier if you know XSLT pretty well. If you know Perl... that's the way to go, IMHO.
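To give an idea of the XSLT route, here is a minimal sketch that runs an identity transform from Java and overrides only the xyz element. The class name XsltElementReplace, the method replaceXyz and the stylesheet parameter newValue are invented for the example, and it assumes xyz is in no namespace and that its entire content should be replaced.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; names and stylesheet are illustrative assumptions.
final class XsltElementReplace {
    // Identity transform plus one override: copy everything unchanged,
    // but give each xyz element the text passed in as the newValue parameter.
    private static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "  <xsl:param name='newValue'/>"
        + "  <xsl:template match='@*|node()'>"
        + "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "  </xsl:template>"
        + "  <xsl:template match='xyz'>"
        + "    <xsl:copy><xsl:apply-templates select='@*'/><xsl:value-of select='$newValue'/></xsl:copy>"
        + "  </xsl:template>"
        + "</xsl:stylesheet>";

    static String replaceXyz(String xml, String newValue) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setParameter("newValue", newValue);
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("XSLT transform failed", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(replaceXyz("<abc><xyz>original</xyz></abc>", "replacement"));
    }
}
```

Because the XSLT processor serializes the result, characters such as < or & in the new value are escaped automatically, which the regex approaches do not do.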

Having said that, if you choose to go with a regex, be aware that putting /g at the end of a sed substitution does not make it multi-line: the g flag replaces every match on a line instead of only the first. sed works on one line at a time, so an element that spans several lines needs extra work, such as sed's N command.

Also. 也。 the Regex you proposed is "greedy". 你提出的正则表达式是“贪婪的”。 It will grab the biggest group of characters it can because the ". " will match from the first occurrence of to the last . 它将抓取它可以捕获的最大字符组,因为“。 ”将从第一次出现到最后一次匹配。 You can make it "lazy" by changing the wildcard to ". ?". 您可以通过将通配符更改为“。 ?” 来使其“懒惰 ”。 Putting the question mark after the asterisk will tell it to match only one set of to . 在星号后面加上问号会告诉它只匹配一组。

I was trying to do the same thing and came across this [gu]awk script that achieves it.

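BEGIN { FS = "[<>]" }
{
    # with < and > as field separators, $2 is the tag name when the element sits on one line
    if ($2 == "xyz") {
        # anchor on the tags so a value that also occurs in the tag name is not replaced by mistake
        sub(/<xyz>[^<]*<\/xyz>/, "<xyz>replacement</xyz>")
    }
    print
}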
