
UNIX: Using sed to eliminate and replace things in a string?

I have a string, let's say:

<lic><ic>This is a string</ic>, welcome to my blog.</lic>

I want to use sed to get rid of the <ic> and </ic> tags, as well as the literal tags <lic> and </lic>.

What is the fastest way to do this? I'm very new to sed. How would this be done in awk? I know awk is much better for column-like text, so I feel more inclined to learn how to use sed.

Any help is always appreciated, thanks in advance!

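To delete only the tags (with GNU sed, -r selects extended regular expressions and -i.old edits the file in place, keeping a backup with the .old suffix):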

sed -i.old -r 's;</?l?ic>;;g' infile
sed -e 's%</\{0,1\}l\{0,1\}ic>%%g'

The \{0,1\} is the standard sed (POSIX basic regular expression) way of writing the equivalent of ? in PCRE. The command uses % as the delimiter for the s command; the regex then looks for a < possibly followed by a slash, possibly followed by an l, followed by ic>, and replaces each match with nothing, globally across each line of input.

Some versions of sed let you select an alternative regular expression syntax, but this form works everywhere.
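As a quick check, the portable expression can be run against the sample string from the question. This is only a minimal sketch; the variable names `input` and `result` are made up for illustration and are not part of the original answer.

```shell
# Sample line from the question (variable names are illustrative).
input='<lic><ic>This is a string</ic>, welcome to my blog.</lic>'

# POSIX basic regular expression: \{0,1\} makes the preceding character optional,
# so the pattern covers <ic>, </ic>, <lic> and </lic>.
result=$(printf '%s\n' "$input" | sed -e 's%</\{0,1\}l\{0,1\}ic>%%g')

printf '%s\n' "$result"
```

Where extended regular expressions are available (GNU and BSD sed both accept -E), `sed -E 's%</?l?ic>%%g'` should give the same result with less escaping.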

sed doesn't need to be complicated. Here are two simple ways to do what you want.

This matches those exact patterns and removes them globally (note that the \| alternation is a GNU extension and is not part of POSIX basic regular expressions):

sed -e 's%\(<lic>\|</lic>\|<ic>\|</ic>\)%%g' file.txt

Remember that you can pass multiple expressions to sed if necessary:

sed -e "s%<lic>%%" -e "s%</lic>%%" -e "s%<ic>%%" -e "s%</ic>%%" file.txt
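To see how the chained expressions behave, here is a small sketch run on the sample string and on a second, made-up line where a tag repeats. The variable names and the second input are assumptions for illustration only.

```shell
# Sample line from the question (variable names are illustrative).
input='<lic><ic>This is a string</ic>, welcome to my blog.</lic>'

# Each -e expression is applied in order to every input line.
single=$(printf '%s\n' "$input" |
  sed -e 's%<lic>%%' -e 's%</lic>%%' -e 's%<ic>%%' -e 's%</ic>%%')

# Without the g flag, each expression removes only its first match on a line,
# so the second <ic>...</ic> pair in this made-up input survives.
repeated=$(printf '%s\n' '<ic>a</ic> <ic>b</ic>' |
  sed -e 's%<ic>%%' -e 's%</ic>%%')

printf '%s\n' "$single" "$repeated"
```

If a tag can occur more than once per line, appending g to each expression (for example `s%<ic>%%g`) removes every occurrence.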

Your tags share a structure: a left angle bracket, followed by any number of characters that are not a right angle bracket, and finally a right angle bracket. So let's write it that way:

sed 's/<[^>]*>//g'
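A short sketch of this generic pattern on the sample string, plus a second, made-up input showing that it strips any tag and not just the four from the question. The variable names and the second input are assumptions for illustration.

```shell
# Sample line from the question (variable names are illustrative).
input='<lic><ic>This is a string</ic>, welcome to my blog.</lic>'

# [^>]* matches any run of characters other than '>', so every <...> span is removed.
result=$(printf '%s\n' "$input" | sed 's/<[^>]*>//g')

# Caveat: the pattern does not know tag names, so unrelated tags are removed as well.
other=$(printf '%s\n' '<b>bold</b> text' | sed 's/<[^>]*>//g')

printf '%s\n' "$result" "$other"
```

This is the shortest of the approaches, but it is only appropriate when every angle-bracketed span on the line should go; to keep other tags, one of the more specific expressions is the safer choice.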
