How to search and replace this string with sed?

I'm desperately trying to search for the following:

<texit info> author=MySelf title=MyTitle </texit>

and replace it with nothing.

What I've tried so far is the following:

sed –I '1,5s/<texit//;s/info>//;s/author=MySelf//;s/title=MyTitle//' test.txt

But it doesn't work.

Don't edit XML with sed. The right tool would be something like XMLStarlet, with a line like the following:

xmlstarlet ed -u '//texit[@info]' -v 'author=NewAuthor title=NewTitle'

...if your goal were to update the text within the tag.

Regular expressions are not expressive enough to handle XML correctly, even formally: regular expressions are sufficient to parse regular languages, and XML is not one. For instance, your original would be just as valid written with newlines, as:

<texit
  info >author=MySelf title=MyTitle</texit>

...and writing a sed command to handle that case would not be fun. XML-native tools, on the other hand, can correctly handle all of XML's corner cases.
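To illustrate the newline problem, here is a minimal sketch. The `strip` pattern is a simplified, hypothetical one-line expression, not something from the question. Because sed applies its commands to one input line at a time, the pattern matches the single-line form but leaves the two-line form untouched:

```shell
# Hypothetical single-line pattern: delete everything from <texit info> to </texit>.
strip='s/<texit info>.*<\/texit>//'

# Element on one line: the whole element is deleted, leaving an empty line.
one_line=$(printf '%s\n' '<texit info>author=MySelf title=MyTitle</texit>' | sed "$strip")

# Same element split over two lines: neither line matches on its own,
# so both pass through unchanged.
two_lines=$(printf '%s\n' '<texit' '  info>author=MySelf title=MyTitle</texit>' | sed "$strip")

printf 'one line:  [%s]\n' "$one_line"
printf 'two lines: [%s]\n' "$two_lines"
```

Making sed cope with the second form means pulling extra lines into the pattern space by hand, which is where it stops being fun.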

That said, the sed expression you gave does indeed "work", inasmuch as it does exactly what it's written to do.

sed -e '1,5s/<texit//;s/info>//;s/author=MySelf//;s/title=MyTitle//' \
  <<<"<texit info>author=MySelf title=MyTitle foo bar</texit>"

returns the output

   foo bar</texit>

which is exactly what it should do: it removes the <texit string, the info> string, author=MySelf, and title=MyTitle, but leaves the closing </texit> and any excess text, just as you asked. If you expect or want it to do something different, you should explain what that is.
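One detail of "what it's written to do" that may be worth checking: in sed, an address such as `1,5` applies only to the command it is attached to, so in the expression above only the first `s` is limited to lines 1 through 5 and the remaining ones run on every line. A small sketch, using made-up sample lines and a hypothetical `sample` helper, compares that with grouping the commands in braces:

```shell
# Hypothetical sample input: six identical lines.
sample() {
  for n in 1 2 3 4 5 6; do
    printf '%s\n' 'X<texit info>Y'
  done
}

# Address on the first command only: on line 6, <texit stays but info> is still removed.
sample | sed '1,5s/<texit//;s/info>//'

# Address on a braced group: line 6 is left completely untouched.
sample | sed '1,5{s/<texit//;s/info>//;}'
```

If the intent was to restrict every substitution to the first five lines, the braced form is the one that does so.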

sed 's/<texit\s\+info>\s*author=MySelf\s\+title=MyTitle\s*<\/texit>//g' test.txt

You should generally not edit XML with a regex, but if you only want to strip these tags, the above will work. You don't need multiple s commands; a single pattern with correctly defined whitespace is enough.
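Note that `\s` and `\+` are GNU extensions, so the command above may not behave the same in other sed implementations. A sketch of an equivalent using only POSIX basic regular expression syntax follows; the `strip_texit` function name and the sample input are assumptions made for illustration:

```shell
# Same pattern as above, with [[:space:]] in place of \s and \{1,\} in place of \+.
strip_texit() {
  sed 's/<texit[[:space:]]\{1,\}info>[[:space:]]*author=MySelf[[:space:]]\{1,\}title=MyTitle[[:space:]]*<\/texit>//g'
}

# Made-up sample line: only the <texit ...>...</texit> element is removed.
printf '%s\n' 'before<texit info> author=MySelf title=MyTitle </texit>after' | strip_texit
```

Like the original, this only matches when the whole element sits on one line.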
