Using sed to remove multiple hashtags

I'm writing a bash script on OSX. There's a lot of grep and sed going on, all working fine, with one exception: I can't figure out how to remove multiple hashtags.

This removes ALL hashtags, no problem:

sed 's/#[^ ]*//g'

I expected this to remove specific hashtags:

sed "s/#(tag1|tag2)//g"

But it doesn't remove anything.

I thought the # symbol might be a special character, so I tried without it:

sed "s/(tag1|tag2)//g"

It makes no difference; neither tag1 nor tag2 is removed.

But if I try:

sed "s/tag1//g"

Then tag1 is removed, leaving the #.

If I then try:

sed "s/#tag1//g"

Nothing happens! It doesn't remove tag1 or #tag1.

Could anyone point out where I'm going wrong please?

EDIT: This is the code:

results=($( \
echo "$ContentsOfHTMLFile" \
| sed -E "s/#(tag1|tag2|tag3)//g" \
| grep -iEo "<p.*>.*$VariableContainingSearchTerms\D.*</p>" \
| grep -iEo "<p.*>.*$VariableContainingSearchTerms.*</p>" \
| grep -Ev "$VariableContainingSearchTermsToExclude" \
| sed 's/<[^>]*>//g' \
| sed 's/http[^ ]*//g' \
| sed 's/^[[:space:]]*//' \
| sed 's/[[:space:]]*$//' \
))

So what I'm trying to do is:

  1. Remove certain hashtags.
  2. Search for <p></p> blocks that contain certain terms.
  3. Only keep the blocks that contain certain other terms.
  4. Strip all <> blocks.
  5. Strip all URLs.
  6. Strip all leading whitespace.
  7. Strip all trailing whitespace.

Everything from 2-7 works as it should. It's just the hashtags I'm having a problem with. I've also tried removing the hashtags at other points in the sequence, but it makes no difference.

Try:

sed -E 's/#(tag1|tag2)//g'

From sed's help:

  -E, -r, --regexp-extended
                 use extended regular expressions in the script
                 (for portability use POSIX -E).
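To see the difference the -E flag makes, here is a minimal sketch. The sample line is made up for illustration, and tag1/tag2 are just the placeholder names from the question:

```shell
# Hypothetical sample input; not taken from the asker's HTML file.
line='Great day #tag1 at the beach #tag2 #other'

# Default mode (basic regular expressions): ( | ) are ordinary characters,
# so the pattern looks for the literal text "#(tag1|tag2)" and finds nothing.
bre_result=$(printf '%s\n' "$line" | sed 's/#(tag1|tag2)//g')

# With -E (extended regular expressions): ( | ) form an alternation group,
# so both #tag1 and #tag2 are matched and removed.
ere_result=$(printf '%s\n' "$line" | sed -E 's/#(tag1|tag2)//g')

printf '%s\n' "$bre_result"
printf '%s\n' "$ere_result"
```

The first line printed should be the input unchanged; the second should have #tag1 and #tag2 removed while #other stays. Note that the spaces around the removed tags are left in place.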

POSIX standard sed does not support | in its default (basic) regular expressions, so you can use two s commands instead, like this:

sed -e 's/#tag1//g;s/#tag2//g;'

Or

sed -e 's/#tag1//g;' -e 's/#tag2//g;'
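If the list of tags is longer, the same idea can be generated in a loop. The following is only a sketch: strip_tags is a hypothetical helper name, and it assumes the tag names contain no regex metacharacters and no slashes:

```shell
# strip_tags (hypothetical helper): reads text on stdin and removes "#<tag>"
# for every tag passed as an argument, using one s command per tag.
# The body runs in a subshell ( ... ) so its variables do not leak out.
strip_tags() (
    script=''
    for tag in "$@"; do
        # Assumes $tag is plain text (no /, ., *, [ and so on).
        script="${script}s/#${tag}//g;"
    done
    sed -e "$script"
)

printf '%s\n' 'one #tag1 two #tag2 three #tag3' | strip_tags tag1 tag2
```

This uses only basic regular expressions, so it does not depend on -E being available. Like the one-liners above, it would also strip the "#tag1" prefix from a longer tag such as #tag10.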

Btw, without -E you also need to use \( and \) to group things. Plain ( and ) will match the parentheses literally.
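A small sketch of that grouping behaviour in the default (basic) mode. The sample text is invented, and a bracket expression [12] stands in for alternation here because, as far as I know, \| in basic regular expressions is a GNU extension and may not work in the sed shipped with OSX:

```shell
# Hypothetical sample input for illustration only.
sample='call f(x) with #tag1 and #tag2'

# Plain ( and ) are literal in basic regular expressions:
# this rewrites the text "(x)" itself.
literal=$(printf '%s\n' "$sample" | sed 's/(x)/(y)/')

# \( and \) form a group; \1 in the replacement refers to what it captured.
grouped=$(printf '%s\n' "$sample" | sed 's/#\(tag[12]\)/<\1>/g')

printf '%s\n' "$literal"
printf '%s\n' "$grouped"
```

With -E the roles are swapped: ( and ) group, and \( and \) match literal parentheses.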
