
Hashtags between quotation marks

I am working on a small piece of code. I managed to make it work when the double quotation mark is on one side of the hashtag, but not when it is on the other:

/(?<!\\S)#([0-9\\p{L}]+)+(?=[\\s,!?.\\n][^"]|$)/

Here's what I mean: https://regex101.com/r/yN4tJ6/307

The last " #action (a hashtag preceded by a double quote) should not be converted into a hashtag. How do I add this check to the pattern above?
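A minimal PHP reproduction of the problem, assuming two lines from the regex101 demo as input. The lookahead rejects `#action` when it is followed by ` "`, but apart from `(?<!\S)` nothing in the pattern inspects the left side, so the hashtag in `" #action` is still matched:

```php
<?php
// Pattern from the question (single-quoted, so "\S" reaches PCRE unchanged).
$pattern = '/(?<!\S)#([0-9\p{L}]+)+(?=[\s,!?.\n][^"]|$)/';

// Two sample lines modelled on the question's regex101 demo.
$input = "#action \" - works\n"
       . "\" #action  - doesn't work\n";

// PREG_OFFSET_CAPTURE records where each match starts in $input.
preg_match_all($pattern, $input, $matches, PREG_OFFSET_CAPTURE);
print_r($matches[0]); // a single match: "#action" at offset 20 (second line)
```

The offset is there only to show which occurrence matched: the first line is 18 characters long, so offset 20 falls on the second line, right after `" `.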

This expression seems to work:

(?<!\S)(?<!".)#([0-9\p{L}]+)+(?=[\s,!?.\n][^"]|$)

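A minimal sketch of this pattern in PHP, run against sample lines modelled on the question's demo. Two changes are assumptions of this sketch, not part of the answer: the outer `+` after the capture group is dropped, because `([0-9\p{L}]+)+` matches the same text as `([0-9\p{L}]+)` but can backtrack excessively on long non-matching words, and the `u` modifier is added so `\p{L}` works on multi-byte UTF-8 letters. Since lookarounds consume no characters, adjacent hashtags are each found:

```php
<?php
// Answer's pattern, minus the redundant outer "+" and plus the /u modifier
// (both are assumptions of this sketch).
$pattern = '/(?<!\S)(?<!".)#([0-9\p{L}]+)(?=[\s,!?.\n][^"]|$)/u';

// Sample lines modelled on the question's regex101 demo.
$input = "I enjoy #action movies! #Action\n"
       . "movies are #cool.\n"
       . "#action \" - works\n"
       . "\" #action  - doesn't work\n";

preg_match_all($pattern, $input, $matches);
print_r($matches[1]); // tag names without "#": action, Action, cool

// The lookarounds only inspect neighbouring characters, so "#a #b" yields both.
preg_match_all($pattern, '#a #b', $adjacent);
print_r($adjacent[1]); // a, b
```

Because `(?<!".)` has a fixed width, it only sees a quote exactly two characters before the `#`; with two spaces, as in `"  #action more`, the hashtag is still matched. This is the variable-width lookbehind limitation described in the next answer.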

The problem with your current pattern, which almost works, is that checking for the presence or absence of a double quote before each hashtag would require a variable-width lookbehind, since the amount of whitespace between the quote and the hashtag can vary. Instead, I used preg_match_all with a pattern that consumes just enough of the surrounding text to decide whether a hashtag should match. Consider the following script:

// $input holds the sample text from the question
preg_match_all('/(?:^|[^"]\s+)(#[0-9\p{L}]+)[.;,!?]?(?=$|\s+[^"])/', $input, $matches);
print_r($matches[1]);

Array
(
    [0] => #action
    [1] => #Action
    [2] => #cool
    [3] => #000000
    [4] => #ffffff
)

Here is an explanation of the pattern:

(?:^|[^"]\s+)   match the start of the input, OR
                a single non-quote character, followed by one or more whitespace characters
(#[0-9\p{L}]+)  then match and capture a hashtag
[.;,!?]?        followed by an optional punctuation character
(?=$|\s+[^"])   finally, look ahead and assert either the end of the input, OR
                one or more whitespace characters followed by a single non-quote character

Note that the full match includes some surrounding characters we don't actually want, but that doesn't matter, because the first capture group contains only the hashtag.
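To see this, here is a small sketch that runs the answer's pattern, unchanged, on the color line from the question's sample and prints both the full matches and capture group 1:

```php
<?php
// The answer's pattern, exactly as in the script above.
$pattern = '/(?:^|[^"]\s+)(#[0-9\p{L}]+)[.;,!?]?(?=$|\s+[^"])/';

// The color line from the question's sample text.
$input = 'Color #000000;  #ffffff; work fine';

preg_match_all($pattern, $input, $matches);
print_r($matches[0]); // full matches, including the consumed context
print_r($matches[1]); // capture group 1: only the hashtags
```

The full matches are `r #000000;` and `  #ffffff;` (a preceding character, whitespace, and trailing punctuation), while `$matches[1]` holds only `#000000` and `#ffffff`. One consequence of consuming that context: in `x #a #b y`, only `#a` is found, because the first match already used up the character before the single space that precedes `#b`.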

My guess is that you might have wanted to design an expression similar to:

(?<!"\s)#([0-9\p{L}]+)(?=[\s,!?.\n][^"]|$)

The expression is explained in the top-right panel of regex101.com if you wish to explore, simplify, or modify it, and you can also paste sample inputs there to watch how it matches.

Test

$re = '/(?<!"\s)#([0-9\p{L}]+)(?=[\s,!?.\n][^"]|$)/m';
$str = 'I enjoy #action movies! #Action
movies are #cool.

Color #000000;  #ffffff; work fine

<div style=" #something "> - works

#action " - works

" #action  - doesn\'t work


';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

var_dump($matches);
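The question is about converting matched hashtags, presumably into links. A minimal sketch of that step with `preg_replace_callback`, assuming PHP 8 and this answer's pattern; the helper name `linkHashtags` and the `/tag/...` URL scheme are hypothetical, and the `u` modifier is an added assumption:

```php
<?php
// Pattern from the answer above; /u is an added assumption so that \p{L}
// treats multi-byte UTF-8 letters as single characters.
$re = '/(?<!"\s)#([0-9\p{L}]+)(?=[\s,!?.\n][^"]|$)/mu';

// Hypothetical helper: wraps each matched hashtag in a link.
// The "/tag/..." URL scheme is illustrative only.
function linkHashtags(string $text, string $re): string
{
    $result = preg_replace_callback(
        $re,
        fn (array $m): string => sprintf(
            '<a href="/tag/%s">#%s</a>',
            rawurlencode($m[1]),                // tag name for the URL path
            htmlspecialchars($m[1], ENT_QUOTES) // tag name for the link text
        ),
        $text
    );
    if ($result === null) {
        // preg_replace_callback() returns null on a PCRE error
        throw new RuntimeException(preg_last_error_msg());
    }
    return $result;
}

echo linkHashtags("I enjoy #action movies!\n\" #action  - doesn't work", $re), "\n";
```

Only the captured tag is escaped; the rest of the text is returned unchanged, so the `" #action` occurrence stays plain text.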
