Removing multiple hashtags with sed

Question

I'm writing a bash script on OS X. There's a lot of grep and sed going on, all working fine, with one exception: I can't figure out how to remove multiple hashtags.

This removes ALL hashtags, no problem:

sed 's/#[^ ]*//g'

I expected this to remove specific hashtags:

sed "s/#(tag1|tag2)//g"

But it doesn't remove anything.

I thought the # symbol might be a special character, so I tried without it:

sed "s/(tag1|tag2)//g"

It makes no difference: neither tag1 nor tag2 is removed.

But if I try:

sed "s/tag1//g"

Then tag1 is removed, leaving the #.

If I then try:

sed "s/#tag1//g"

Nothing happens! It doesn't remove tag1 or #tag1.

Could anyone point out where I'm going wrong please?

EDIT: This is the code:

results=($( \
echo "$ContentsOfHTMLFile" \
| sed -E "s/#(tag1|tag2|tag3)//g" \
| grep -iEo "<p.*>.*$VariableContainingSearchTerms\D.*</p>" \
| grep -iEo "<p.*>.*$VariableContainingSearchTerms.*</p>" \
| grep -Ev $VariableContainingSearchTermsToExclude \
| sed 's/<[^>]*>//g' \
| sed 's/http[^ ]*//g' \
| sed 's/^[[:space:]]*//' \
| sed 's/[[:space:]]*$//' \
))

So what I'm trying to do is:

1. Remove certain hashtags.
2. Search for <p></p> blocks that contain certain terms.
3. Only keep the blocks that contain certain other terms.
4. Strip all <> blocks.
5. Strip all URLs.
6. Strip all leading whitespace.
7. Strip all trailing whitespace.

Steps 2-7 all work as they should. It's just the hashtags I'm having a problem with. I've also tried removing the hashtags at other points in the sequence, but it makes no difference.

Answer 1

Try:

sed -E 's/#(tag1|tag2)//g'

From sed's help:

  -E, -r, --regexp-extended
                 use extended regular expressions in the script
                 (for portability use POSIX -E).
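To see the difference the flag makes, here is a minimal sketch. The sample input string is made up for illustration and is not from the question. Without -E, sed reads the pattern as a basic regular expression, where (, | and ) are ordinary characters, so the pattern only matches a literal "#(tag1|tag2)" and the line passes through unchanged.

```sh
# The sample input is an assumption made for this illustration.
input='foo #tag1 bar #tag2 baz #tag3'

# Basic regex mode (no -E): ( | ) are literal, so nothing matches.
bre_result=$(printf '%s\n' "$input" | sed 's/#(tag1|tag2)//g')

# Extended regex mode (-E): ( ) group and | alternates.
ere_result=$(printf '%s\n' "$input" | sed -E 's/#(tag1|tag2)//g')

printf '%s\n' "$bre_result"
printf '%s\n' "$ere_result"
```

Note that the spaces on either side of each removed tag stay in place, so the second result contains double spaces where #tag1 and #tag2 used to be.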

Answer 2

POSIX basic regular expressions, which sed uses by default, do not support | for alternation. You can use two s commands instead, like this:

sed -e 's/#tag1//g;s/#tag2//g;'

Or

sed -e 's/#tag1//g;' -e 's/#tag2//g;'

Btw, in basic regular expressions you also need \( and \) to group things.
Unescaped ( and ) will match the parentheses literally.
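As a rough sketch of both points, the snippet below runs one s command per tag, then shows grouping with \( and \) and a literal parenthesis match, all in default (basic regex) mode. The sample inputs and the [\1] replacement are made up for illustration.

```sh
# The sample input is an assumption made for this illustration.
input='a #tag1 b #tag2 c'

# One s command per tag; no alternation needed.
multi=$(printf '%s\n' "$input" | sed -e 's/#tag1//g' -e 's/#tag2//g')

# \( and \) form a group that \1 refers back to in the replacement.
grouped=$(printf '%s\n' "$input" | sed 's/#\(tag[12]\)/[\1]/g')

# Unescaped ( and ) are ordinary characters and match real parentheses.
literal=$(printf '%s\n' 'x (tag1) y' | sed 's/(tag1)//')

printf '%s\n' "$multi" "$grouped" "$literal"
```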

Answer 1
1 2019-03-20 16:36:56

Answer 2
1 2019-03-20 16:46:19
