
RegEx issue with hash tag

I am trying to match hash tags and wrap them with an anchor tag. Here is the POC:

<p class="display"></p>
var content = "I like #redApple. I have a #black hat. #red is my favorite color";

var re = /(#[a-z0-9][a-z0-9\-_]*)/ig,
    match, matches = [];

while (match = re.exec(content)) {
    matches.push(match[1]);
}

for (i = 0; i < matches.length; i++) {
    value = matches[i];
    console.log(value + ".....value");
    vSearch = value.replace(/[-\/\\^$*+?.=()|[\]{}]/g, '\\$&');
    console.log(vSearch + ".......vSearch");
    regExSearch = new RegExp(vSearch, 'g');
    console.log(regExSearch + "........regExSearch");
    content = content.replace(regExSearch, '<a href="#">' + value + '</a> ');
}

$(".display").append(content);
a {
    color: red;
    text-decoration: underline;
}

I am facing a problem: if the last hash tag matches the first characters of another hash tag, only that part of the other tag gets wrapped. In this POC, "#red" is the last hash tag, which is why only the "#red" part of the first tag, "#redApple", ends up wrapped. It should wrap the whole word "#redApple".
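The behaviour described above can be reproduced without jQuery or the DOM. The following is only a minimal sketch: `wrapNaively` and `naive` are hypothetical names, and the loop is a condensed version of the POC rather than a copy of it.

```javascript
// Condensed version of the POC loop: every hashtag found is turned into an
// unanchored regex and replaced globally, one tag after another.
function wrapNaively(content) {
  const matches = content.match(/#[a-z0-9][a-z0-9\-_]*/gi) || [];
  for (const value of matches) {
    // /#red/g also matches the first four characters of "#redApple",
    // even after "#redApple" has already been wrapped.
    content = content.replace(new RegExp(value, 'g'), '<a href="#">' + value + '</a>');
  }
  return content;
}

const naive = wrapNaively("I like #redApple. #red is my favorite color");
console.log(naive);
```

The second pass wraps "#red" again inside the anchor that already surrounds "#redApple", which produces nested anchors.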

Any help will be appreciated.


Final answer

It turned out that the hashtag regex did not support Unicode letters. Since \p{L} is not supported in all JavaScript environments, I suggest using a character class that replaces it (taken from XRegExp). The \b word boundary does not work with such letters because they are not treated as word characters, so we need a (?![^<]*<\/a>) look-ahead that checks whether the hashtag is already enclosed in an <a> tag.

The code looks like:

var content = "I like #red-Apple. I have a #black_hat. #red is my favorite color. #speçial. #anötherSpecial #estã, and #ãest. But also remember about #pisklę! Was it #Świętą?! #русский тест.";

// Letter ranges taken from XRegExp's \p{L} class. Only the Latin, Greek and
// Cyrillic ranges are listed here; the ranges for the remaining scripts are omitted.
var re = /#(?![-_])[-_0-9A-Za-z\xAA\xB5\xBA\xC0-\xD6\xD8-\xF6\xF8-\u02C1\u02C6-\u02D1\u02E0-\u02E4\u02EC\u02EE\u0370-\u0374\u0376\u0377\u037A-\u037D\u037F\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03F5\u03F7-\u0481\u048A-\u052F]+(?![^<]*<\/a>)/ig;

content = content.replace(re, '<a href="#">$&</a>');

$(".display").append(content);
a { color: red; text-decoration: underline; }
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
<p class="display"></p>

Here is a modern JavaScript solution based on the ECMAScript 2018+ standard, which supports Unicode property escapes such as \p{L}:

const content = "I like #red-Apple. I have a #black_hat. #red is my favorite color. #speçial. #anötherSpecial #estã, and #ãest. But also remember about #pisklę! Was it #Świętą?! #русский тест.";

const re = /#(?![-_])[-_\p{L}0-9]+(?![^<]*<\/a>)/gui;

$(".display").append( content.replace(re, '<a href="#">$&</a>') );
a { color: red; text-decoration: underline; }
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
<p class="display"></p>
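To check the \p{L} pattern and the look-ahead without a browser, a DOM-free sketch along these lines may help. The function name `linkHashtags` and the variables `once` and `twice` are hypothetical, and the `u` flag assumes an ES2018-capable engine.

```javascript
// Wraps every hashtag in an anchor. The negative look-ahead skips hashtags
// that are already followed by a closing </a> with no "<" in between.
function linkHashtags(text) {
  const re = /#(?![-_])[-_\p{L}0-9]+(?![^<]*<\/a>)/gu;
  return text.replace(re, '<a href="#">$&</a>');
}

const once = linkHashtags("#red-Apple and #русский, #pisklę!");
const twice = linkHashtags(once); // running it again should change nothing

console.log(once);
console.log(once === twice);
```

Because the replacement happens in a single pass and already wrapped tags are skipped, applying the function to its own output leaves the text unchanged.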

Original answer

You can use a combination of a non-word boundary and a word boundary:

regExSearch = new RegExp("\\B" + vSearch + "\\b", 'g');

Here, \B matches at a non-word-boundary position (between two word characters or between two non-word characters), and \b matches at a word boundary (between a word character and a non-word character). Since # is a non-word character, \B before it requires the hashtag to be preceded by a non-word character or the start of the string, and \b after the tag prevents a match that stops in the middle of a longer word.
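A small sketch of the difference the two boundaries make. The sample strings and variable names are illustrative only.

```javascript
const text = "I like #redApple. #red is my favorite color";

// Unanchored: also matches the first four characters of "#redApple".
const unanchored = text.match(/#red/g);

// \B before "#" and \b after "red": only the standalone "#red" matches,
// because there is no word boundary between "d" and "A" in "#redApple".
const anchored = text.match(/\B#red\b/g);

// "#" directly after a word character is a word boundary, so \B fails here.
const glued = "abc#red".match(/\B#red\b/g);

console.log(unanchored.length, anchored.length, glued);
```

Note that \b treats "-" as a non-word character, so this pattern would still match "#red" inside "#red-Apple".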

Also, I removed the 2 lines below from your code because they are redundant:

vSearch = value.replace(/[-\/\\^$*+?.=()|[\]{}]/g, '\\$&');
console.log(vSearch + ".......vSearch");

Also, it is worth mentioning that you are using a capturing group around the whole regex pattern, (#[a-z0-9][a-z0-9\-_]*), and then building the array "manually". You do not need that: match with the g flag returns the array directly when given #[a-z0-9][a-z0-9\-_]*.
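As a minimal illustration of that point (the `|| []` guard is an addition, not part of the answer, and covers input with no hashtags, where match returns null):

```javascript
const content = "I like #redApple. I have a #black hat. #red is my favorite color";

// With the g flag, match() returns all full matches; no capturing group or
// exec() loop is needed.
const tags = content.match(/#[a-z0-9][a-z0-9\-_]*/gi) || [];

console.log(tags);
```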

The code that is working:

var content = "I like #redApple. I have a #black hat. #red is my favorite color";

var re = /#[a-z0-9][a-z0-9\-_]*/ig;
var matches = content.match(re);

for (i = 0; i < matches.length; i++) {
    value = matches[i];
    console.log(value + ".....value");
    regExSearch = new RegExp("\\B" + matches[i] + "\\b", 'g');
    console.log(regExSearch + "........regExSearch");
    content = content.replace(regExSearch, '<a href="#">' + value + '</a> ');
}

$(".display").append(content);
a { color: red; text-decoration: underline; }
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p class="display"></p>

HTML obtained:

[image]

Hi, your problem might be solved by making the RegEx check stricter with '$':

 var re = /(#[a-z0-9][a-z0-9\-_]*)$/ig,

Use this RegEx; it may work.

You can solve it this way (it's much simpler, actually):

var content = "I like #redApple. I have a #black hat. #red is my favorite color";

var re = /(#[a-z0-9][a-z0-9\-_]*)/ig;

content = content.replace(re, function(x) { return '<a href="#">' + x + '</a> '; });

$(".display").append(content);

The line vSearch = value.replace(/[-\/\\^$*+?.=()|[\]{}]/g, '\\$&'); in your code is redundant. You don't need it: according to the regexp you used to get them, your matches can't contain any of these symbols except "-", which has no special meaning outside a character class.
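The single-pass approach from this answer can be checked without jQuery. This is only a sketch: `input` and `output` are illustrative names, and the trailing space after the closing tag is left out.

```javascript
const input = "I like #redApple. I have a #black hat. #red is my favorite color";

// One global replace with a callback: each hashtag is matched in full and
// wrapped exactly once, so "#red" can never be re-matched inside "#redApple".
const output = input.replace(/#[a-z0-9][a-z0-9\-_]*/gi, function (tag) {
  return '<a href="#">' + tag + '</a>';
});

console.log(output);
```

Since the text is scanned once and the matcher consumes "#redApple" as a whole, no second regex built from the matched values is needed.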
