
JavaScript regex, searching for hashtags

How can I search some text for all hashtags (made up of alphanumeric characters, underscores, and hyphens) and wrap them in span tags? For example, take:

some_string = "this is some text with 3 hashtags #Tag1 and #tag-2 and #tag_3 in it"

and convert it to:

"this is some text with 3 hashtags <span>#Tag1</span> and <span>#tag-2</span> and <span>#tag_3</span> in it"

I've got this so far:

    some_string = some_string.replace(/\(#([a-z0-9\-\_]*)/i,"<span>$1</span>");

but one fault is it doesn't include the # in the wrappings like it should. It seems to output:

"this is some text with 3 hashtags <span>Tag1</span> and #tag-2 and #tag_3 in it "

Also it only detects the first hashtag that it comes across (eg. #Tag1 in this sample), it should detect all.

Also I need the hashtags to be a minimum of 1 character AFTER the #. So # on its own should not match.

Thanks

Try this replace call:

EDIT: if you want to skip http://site.com/#tag kind of strings then use:

var repl = some_string.replace(/(^|\W)(#[a-z\d][\w-]*)/ig, '$1<span>$2</span>');

This is the regular expression you want:

/(#[a-z0-9][a-z0-9\-_]*)/ig

The i makes it case insensitive, which you already had. The g makes it look through the whole string ("g" stands for "global"). Without the g, matching stops at the first match.

This also fixes the incorrect parenthesis (the escaped \( matched a literal parenthesis instead of opening a group), moves the # inside the capture group, requires at least one character after the #, and drops an unneeded backslash.
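
To make the effect of the g flag concrete, here is a small sketch that runs the pattern above on the question's sample string with and without it. The trailing lone # is an addition of mine to check the "at least one character" requirement; the variable names are illustrative only.

```javascript
// Sample input from the question, plus a lone "#" that must not be wrapped.
const input =
  "this is some text with 3 hashtags #Tag1 and #tag-2 and #tag_3 in it, and a lone # too";

// Without "g": only the first match is replaced.
const firstOnly = input.replace(/(#[a-z0-9][a-z0-9\-_]*)/i, "<span>$1</span>");

// With "g": every match is replaced.
const all = input.replace(/(#[a-z0-9][a-z0-9\-_]*)/ig, "<span>$1</span>");

console.log(firstOnly);
console.log(all);
```

In both cases the lone # is left alone, because the pattern requires a letter or digit directly after it.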

A solution that works with multiline text and non-Latin characters:

var getHashTags = function(string) {
   var hashTags, i, len, word, words;
   // Split on any run of whitespace, including line breaks.
   words = string.split(/[\s\r\n]+/);
   hashTags = [];
   for (i = 0, len = words.length; i < len; i++) {
     word = words[i];
     // Keep every word that starts with '#', whatever characters follow it.
     if (word.indexOf('#') === 0) {
       hashTags.push(word);
     }
   }
   return hashTags;
};

or in CoffeeScript:

getHashTags = (string) ->
  words = string.split /[\s\r\n]+/
  hashTags = []
  hashTags.push word for word in words when word.indexOf('#') is 0
  hashTags
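
Note that the split-based approach returns whole words, so trailing punctuation (as in "#tag,") stays attached and a bare # is also returned. If you need the span wrapping from the question, one possible alternative is a regex with Unicode property escapes. This is a different technique from the answer above, and it assumes an engine that supports the u flag with \p{...} (ES2018 or later). The helper name wrapHashTags is hypothetical.

```javascript
// Hypothetical helper (not from the answer above): wraps hashtags made of
// Unicode letters, digits, underscores and hyphens in <span> tags.
function wrapHashTags(text) {
  // \p{L} = any letter, \p{N} = any number; the "u" flag enables \p{...}.
  // "$&" in the replacement stands for the whole matched hashtag.
  return text.replace(/#[\p{L}\p{N}][\p{L}\p{N}_-]*/gu, "<span>$&</span>");
}

const sample = "first line #привет\nsecond line #日本語 and #tag-2, plus a lone #";
console.log(wrapHashTags(sample));
```

Like the earlier patterns without a leading anchor, this sketch would also wrap a # fragment inside a URL.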

If you don't want to match http://site/#hashs, use this one instead*:

string.replace(/(^|\s)(#[a-zA-Z0-9][\w-]*)\b/g, "$1<span>$2</span>");

It will match:

  • #word
  • #word_1 and #word-1
  • #word in #word? or #word" or #word. or #word,

It won't match

  • "#word nor ,#word nor .#word
  • /#word
  • #_word nor #-word
  • wor#d

The things you want and don't want to match may vary in different cases.

Try it yourself at regex101.
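
If you'd rather check the two lists locally, here is a minimal sketch. It assumes the hashtag sits in its own capture group so that $2 refers to it; the names hashTagPattern and findHashTags are made up for this example.

```javascript
// Group 1 keeps the leading whitespace (or start of string); group 2 is the hashtag.
const hashTagPattern = /(^|\s)(#[a-zA-Z0-9][\w-]*)\b/g;

// Hypothetical helper for this example: returns the hashtags the pattern finds.
function findHashTags(text) {
  return Array.from(text.matchAll(hashTagPattern), (m) => m[2]);
}

const shouldMatch = ["#word", "#word_1", "#word-1", "#word?", '#word"', "#word.", "#word,"];
const shouldNotMatch = ['"#word', ",#word", ".#word", "/#word", "#_word", "#-word", "wor#d"];

// Each entry in shouldMatch prints one hashtag; each entry in shouldNotMatch prints [].
for (const s of shouldMatch) console.log(JSON.stringify(s), "->", findHashTags(s));
for (const s of shouldNotMatch) console.log(JSON.stringify(s), "->", findHashTags(s));

// The same pattern used for the span wrapping from the question.
console.log("a #word here".replace(hashTagPattern, "$1<span>$2</span>"));
```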


* The current accepted answer, posted by @anubhava, claims to skip URL hashes but fails to do so.
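
The difference can be seen with a short comparison. The sample string and variable names below are my own. The first pattern is the one from the accepted answer; the second is a whitespace-anchored variant with the hashtag in its own capture group.

```javascript
const text = "see http://site.com/#tag and #real";

// Accepted answer's pattern: "/" is a non-word character, so (^|\W) accepts it
// and the URL fragment is wrapped too.
const viaNonWord = text.replace(/(^|\W)(#[a-z\d][\w-]*)/ig, "$1<span>$2</span>");

// Whitespace-anchored variant: "/" is not whitespace, so the URL is left alone.
const viaWhitespace = text.replace(/(^|\s)(#[a-zA-Z0-9][\w-]*)\b/g, "$1<span>$2</span>");

console.log(viaNonWord);
console.log(viaWhitespace);
```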

