RegEx issue with hash tag
I am trying to match hashtags and wrap them in an anchor tag. Here is the POC:
<p class="display"></p>
var content = "I like #redApple. I have a #black hat. #red is my favorite color";
var re = /(#[a-z0-9][a-z0-9\-_]*)/ig,
match, matches = [];
while (match = re.exec(content)) {
matches.push(match[1]);
}
for (i = 0; i < matches.length; i++) {
value = matches[i];
console.log(value + ".....value");
vSearch = value.replace(/[-\/\\^$*+?.=()|[\]{}]/g, '\\$&');
console.log(vSearch + ".......vSearch");
regExSearch = new RegExp(vSearch, 'g');
console.log(regExSearch + "........regExSearch");
content = content.replace(regExSearch, '<a href="#">' + value + '</a> ');
}
$(".display").append(content);
a {
color: red;
text-decoration: underline;
}
I am facing a problem: if the last hashtag matches the first characters of another hashtag, only that part of the word gets wrapped. In this POC, "#red" is the last hashtag, which is why the first one, "#redApple", ends up with only its "#red" part wrapped. It should wrap the whole word "#redApple".
Any help will be appreciated.
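To see where the nesting comes from, here is a minimal sketch of the question's loop that runs under Node.js without jQuery or the DOM (the function name wrapSequentially is hypothetical). Each pass re-scans the whole, already modified string, so by the time #red is processed, #redApple has already been wrapped and /#red/g matches inside it.

```javascript
// Sketch of the question's approach: collect hashtags, then replace each one
// in turn with a fresh global regex over the whole (already modified) string.
function wrapSequentially(content) {
  const re = /(#[a-z0-9][a-z0-9\-_]*)/gi;
  const matches = [];
  let match;
  while ((match = re.exec(content)) !== null) {
    matches.push(match[1]);
  }
  for (const value of matches) {
    // Escape regex metacharacters, as the question does.
    const vSearch = value.replace(/[-\/\\^$*+?.=()|[\]{}]/g, "\\$&");
    content = content.replace(new RegExp(vSearch, "g"), '<a href="#">' + value + "</a> ");
  }
  return content;
}

const result = wrapSequentially("I like #redApple. I have a #black hat. #red is my favorite color");
console.log(result);
// The last pass (#red) also matches the start of the already-wrapped #redApple:
// I like <a href="#"><a href="#">#red</a> Apple</a> . I have a <a href="#">#black</a>  hat. <a href="#">#red</a>  is my favorite color
```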
It turned out that the hashtag regex did not support Unicode letters. Since \p{L} is not supported in all JavaScript environments, I suggest using the character class that replaces it (taken from XRegExp). The \b word boundary would not work with these letters, because JavaScript does not treat them as word characters. Instead, we need a (?![^<]*</a>) look-ahead that checks whether the hashtag is already enclosed in an <a> tag.
The code looks like this:
var content = "I like #red-Apple. I have a #black_hat. #red is my favorite color. #speçial. #anötherSpecial #estã, and #ãest. But also remember about #pisklę! Was it #Świętą?! #русский тест.";
var re = /#(?![-_])[-_0-9A-Za-z\xAA\xB5\xBA\xC0-\xD6\xD8-\xF6\xF8-\ˁ\ˆ-\ˑ\ˠ-\ˤ\ˬ\ˮ\Ͱ-\ʹ\Ͷ\ͷ\ͺ-\ͽ\Ϳ\Ά\Έ-\Ί\Ό\Ύ-\Ρ\Σ-\ϵ\Ϸ-\ҁ\Ҋ-\ԯ\Ա-\Ֆ\ՙ\ա-\և\א-\ת\װ-\ײ\ؠ-\ي\ٮ\ٯ\ٱ-\ۓ\ە\ۥ\ۦ\ۮ\ۯ\ۺ-\ۼ\ۿ\ܐ\ܒ-\ܯ\ݍ-\ޥ\ޱ\ߊ-\ߪ\ߴ\ߵ\ߺ\ࠀ-\ࠕ\ࠚ\ࠤ\ࠨ\ࡀ-\ࡘ\ࢠ-\ࢲ\ऄ-\ह\ऽ\ॐ\क़-\ॡ\ॱ-\ঀ\অ-\ঌ\এ\ঐ\ও-\ন\প-\র\ল\শ-\হ\ঽ\ৎ\ড়\ঢ়\য়-\ৡ\ৰ\ৱ\ਅ-\ਊ\ਏ\ਐ\ਓ-\ਨ\ਪ-\ਰ\ਲ\ਲ਼\ਵ\ਸ਼\ਸ\ਹ\ਖ਼-\ੜ\ਫ਼\ੲ-\ੴ\અ-\ઍ\એ-\ઑ\ઓ-\ન\પ-\ર\લ\ળ\વ-\હ\ઽ\ૐ\ૠ\ૡ\ଅ-\ଌ\ଏ\ଐ\ଓ-\ନ\ପ-\ର\ଲ\ଳ\ଵ-\ହ\ଽ\ଡ଼\ଢ଼\ୟ-\ୡ\ୱ\ஃ\அ-\ஊ\எ-\ஐ\ஒ-\க\ங\ச\ஜ\ஞ\ட\ண\த\ந-\ப\ம-\ஹ\ௐ\అ-\ఌ\ఎ-\ఐ\ఒ-\న\ప-\హ\ఽ\ౘ\ౙ\ౠ\ౡ\ಅ-\ಌ\ಎ-\ಐ\ಒ-\ನ\ಪ-\ಳ\ವ-\ಹ\ಽ\ೞ\ೠ\ೡ\ೱ\ೲ\അ-\ഌ\എ-\ഐ\ഒ-\ഺ\ഽ\ൎ\ൠ\ൡ\ൺ-\ൿ\අ-\ඖ\ක-\න\ඳ-\ර\ල\ව-\ෆ\ก-\ะ\า\ำ\เ-\ๆ\ກ\ຂ\ຄ\ງ\ຈ\ຊ\ຍ\ດ-\ທ\ນ-\ຟ\ມ-\ຣ\ລ\ວ\ສ\ຫ\ອ-\ະ\າ\ຳ\ຽ\ເ-\ໄ\ໆ\ໜ-\ໟ\ༀ\ཀ-\ཇ\ཉ-\ཬ\ྈ-\ྌ\က-\ဪ\ဿ\ၐ-\ၕ\ၚ-\ၝ\ၡ\ၥ\ၦ\ၮ-\ၰ\ၵ-\ႁ\ႎ\Ⴀ-\Ⴥ\Ⴧ\Ⴭ\ა-\ჺ\ჼ-\ቈ\ቊ-\ቍ\ቐ-\ቖ\ቘ\ቚ-\ቝ\በ-\ኈ\ኊ-\ኍ\ነ-\ኰ\ኲ-\ኵ\ኸ-\ኾ\ዀ\ዂ-\ዅ\ወ-\ዖ\ዘ-\ጐ\ጒ-\ጕ\ጘ-\ፚ\ᎀ-\ᎏ\Ꭰ-\Ᏼ\ᐁ-\ᙬ\ᙯ-\ᙿ\ᚁ-\ᚚ\ᚠ-\ᛪ\ᛱ-\ᛸ\ᜀ-\ᜌ\ᜎ-\ᜑ\ᜠ-\ᜱ\ᝀ-\ᝑ\ᝠ-\ᝬ\ᝮ-\ᝰ\ក-\ឳ\ៗ\ៜ\ᠠ-\ᡷ\ᢀ-\ᢨ\ᢪ\ᢰ-\ᣵ\ᤀ-\ᤞ\ᥐ-\ᥭ\ᥰ-\ᥴ\ᦀ-\ᦫ\ᧁ-\ᧇ\ᨀ-\ᨖ\ᨠ-\ᩔ\ᪧ\ᬅ-\ᬳ\ᭅ-\ᭋ\ᮃ-\ᮠ\ᮮ\ᮯ\ᮺ-\ᯥ\ᰀ-\ᰣ\ᱍ-\ᱏ\ᱚ-\ᱽ\ᳩ-\ᳬ\ᳮ-\ᳱ\ᳵ\ᳶ\ᴀ-\ᶿ\Ḁ-\ἕ\Ἐ-\Ἕ\ἠ-\ὅ\Ὀ-\Ὅ\ὐ-\ὗ\Ὑ\Ὓ\Ὕ\Ὗ-\ώ\ᾀ-\ᾴ\ᾶ-\ᾼ\ι\ῂ-\ῄ\ῆ-\ῌ\ῐ-\ΐ\ῖ-\Ί\ῠ-\Ῥ\ῲ-\ῴ\ῶ-\ῼ\ⁱ\ⁿ\ₐ-\ₜ\ℂ\ℇ\ℊ-\ℓ\ℕ\ℙ-\ℝ\ℤ\Ω\ℨ\K-\ℭ\ℯ-\ℹ\ℼ-\ℿ\ⅅ-\ⅉ\ⅎ\Ↄ\ↄ\Ⰰ-\Ⱞ\ⰰ-\ⱞ\Ⱡ-\ⳤ\Ⳬ-\ⳮ\Ⳳ\ⳳ\ⴀ-\ⴥ\ⴧ\ⴭ\ⴰ-\ⵧ\ⵯ\ⶀ-\ⶖ\ⶠ-\ⶦ\ⶨ-\ⶮ\ⶰ-\ⶶ\ⶸ-\ⶾ\ⷀ-\ⷆ\ⷈ-\ⷎ\ⷐ-\ⷖ\ⷘ-\ⷞ\ⸯ\々\〆\〱-\〵\〻\〼\ぁ-\ゖ\ゝ-\ゟ\ァ-\ヺ\ー-\ヿ\ㄅ-\ㄭ\ㄱ-\ㆎ\ㆠ-\ㆺ\ㇰ-\ㇿ\㐀-\䶵\一-\鿌\ꀀ-\ꒌ\ꓐ-\ꓽ\ꔀ-\ꘌ\ꘐ-\ꘟ\ꘪ\ꘫ\Ꙁ-\ꙮ\ꙿ-\ꚝ\ꚠ-\ꛥ\ꜗ-\ꜟ\Ꜣ-\ꞈ\Ꞌ-\ꞎ\Ꞑ-\Ɬ\Ʞ\Ʇ\ꟷ-\ꠁ\ꠃ-\ꠅ\ꠇ-\ꠊ\ꠌ-\ꠢ\ꡀ-\ꡳ\ꢂ-\ꢳ\ꣲ-\ꣷ\ꣻ\ꤊ-\ꤥ\ꤰ-\ꥆ\ꥠ-\ꥼ\ꦄ-\ꦲ\ꧏ\ꧠ-\ꧤ\ꧦ-\ꧯ\ꧺ-\ꧾ\ꨀ-\ꨨ\ꩀ-\ꩂ\ꩄ-\ꩋ\ꩠ-\ꩶ\ꩺ\ꩾ-\ꪯ\ꪱ\ꪵ\ꪶ\ꪹ-\ꪽ\ꫀ\ꫂ\ꫛ-\ꫝ\ꫠ-\ꫪ\ꫲ-\ꫴ\ꬁ-\ꬆ\ꬉ-\ꬎ\ꬑ-\ꬖ\ꬠ-\ꬦ\ꬨ-\ꬮ\ꬰ-\ꭚ\ꭜ-\ꭟ\ꭤ\ꭥ\ꯀ-\ꯢ\가-\힣\ힰ-\ퟆ\ퟋ-\ퟻ\豈-\舘\並-\龎\ff-\st\ﬓ-\ﬗ\יִ\ײַ-\ﬨ\שׁ-\זּ\טּ-\לּ\מּ\נּ\סּ\ףּ\פּ\צּ-\ﮱ\ﯓ-\ﴽ\ﵐ-\ﶏ\ﶒ-\ﷇ\ﷰ-\ﷻ\ﹰ-\ﹴ\ﹶ-\ﻼ\A-\Z\a-\z\ヲ-\ᄒ\ᅡ-\ᅦ\ᅧ-\ᅬ\ᅭ-\ᅲ\ᅳ-\ᅵ]+(?![^<]*<\/a>)/ig;
content = content.replace(re, '<a href="#">$&</a>');
$(".display").append(content);
a {
color: red;
text-decoration: underline;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
<p class="display"></p>
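The two claims above can be checked quickly in Node.js: without Unicode-aware escapes, \w (and therefore \b) treats only ASCII letters, digits and _ as word characters, and the (?![^<]*<\/a>) look-ahead rejects a hashtag when the next tag after it is a closing </a>. This is a small sketch; the sample strings are made up for illustration.

```javascript
// \w is ASCII-only ([A-Za-z0-9_]), so accented letters end the match early.
const asciiOnly = "#speçial #estã".match(/#\w+/g);
console.log(asciiOnly); // [ '#spe', '#est' ]

// The negative look-ahead fails when the text after the hashtag, up to the
// next "<", is followed by "</a>", i.e. the hashtag already sits inside a link.
const notLinked = /#[a-z]+(?![^<]*<\/a>)/gi;
const outside = '<a href="#">#red</a> #blue'.match(notLinked);
console.log(outside); // [ '#blue' ]
```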
Here is a modern JavaScript solution based on the ECMAScript 2018+ standard, which supports Unicode character (property) classes such as \p{L}:
const content = "I like #red-Apple. I have a #black_hat. #red is my favorite color. #speçial. #anötherSpecial #estã, and #ãest. But also remember about #pisklę! Was it #Świętą?! #русский тест.";
const re = /#(?![-_])[-_\p{L}0-9]+(?![^<]*<\/a>)/gui;
$(".display").append(content.replace(re, '<a href="#">$&</a>'));
a {
color: red;
text-decoration: underline;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
<p class="display"></p>
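For trying the ES2018+ pattern outside the browser, here is a DOM-free sketch that runs in a current Node.js; the function name linkHashtags and the sample string are hypothetical. \p{L} is only recognized when the u flag is present.

```javascript
// Same pattern as above; \p{L} requires the "u" flag.
const hashtagRe = /#(?![-_])[-_\p{L}0-9]+(?![^<]*<\/a>)/gui;

function linkHashtags(text) {
  // "$&" inserts the whole match into the replacement string.
  return text.replace(hashtagRe, '<a href="#">$&</a>');
}

const html = linkHashtags("I like #red-Apple. Was it #Świętą?! #русский тест.");
console.log(html);
// I like <a href="#">#red-Apple</a>. Was it <a href="#">#Świętą</a>?! <a href="#">#русский</a> тест.
```

Reusing the global regex across calls is safe here, because replace() starts matching a global regex from the beginning of the string on every call.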
Original answer
You can use a combination of a non-word boundary and a word boundary:
regExSearch = new RegExp("\\B" + vSearch + "\\b", 'g');
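Here, \B matches at a non-boundary position (between two word characters, or between two non-word characters), and \b matches at a word boundary (between a word character and a non-word character). Because # is a non-word character, \B in front of it requires the hashtag to be preceded by a non-word character or the start of the string, and \b after the tag prevents "#red" from matching the beginning of "#redApple".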
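A quick Node.js check of the boundary pattern built for the tag #red; the sample strings are made up for illustration.

```javascript
const tag = "#red";
// Non-global regex, so test() does not carry lastIndex between calls.
const bounded = new RegExp("\\B" + tag + "\\b");

console.log(bounded.test("I like #redApple.")); // false: "A" follows "#red"
console.log(bounded.test("#red is my favorite color")); // true
console.log(bounded.test("abc#red")); // false: the word character "c" precedes "#"
```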
Also, I removed the following two lines from your code, since they are redundant:
vSearch = value.replace(/[-\/\\^$*+?.=()|[\]{}]/g, '\\$&');
console.log(vSearch + ".......vSearch");
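A small check of why the escaping step can be dropped for these matches: of the symbols in the escape list, the hashtag pattern can only capture "-", and "-" is literal outside a character class, so the escaped and unescaped patterns find the same text. The helper name escapeRegExp and the sample string are hypothetical.

```javascript
// Hypothetical helper using the same escaping regex as the question.
const escapeRegExp = (s) => s.replace(/[-\/\\^$*+?.=()|[\]{}]/g, "\\$&");

const text = "I like #red-apple and #red-apple.";
const value = "#red-apple"; // "-" is the only escapable symbol a hashtag can contain here

const plain = text.match(new RegExp(value, "g"));
const escaped = text.match(new RegExp(escapeRegExp(value), "g"));
console.log(escapeRegExp(value)); // #red\-apple
console.log(plain.length, escaped.length); // 2 2
```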
Also, it is worth mentioning that you are using a capturing group around the whole regex pattern (#[a-z0-9][a-z0-9\-_]*) and then building an array "manually". You do not actually need that: with the g flag, match returns an array of all matches of #[a-z0-9][a-z0-9\-_]* directly.
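A minimal sketch of the match() call on the question's sample string. With the g flag, String.prototype.match returns every full match (not capture groups), or null when nothing matches.

```javascript
const content = "I like #redApple. I have a #black hat. #red is my favorite color";

// No capturing group and no exec() loop needed.
const matches = content.match(/#[a-z0-9][a-z0-9\-_]*/gi);
console.log(matches); // [ '#redApple', '#black', '#red' ]
```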
The working code:
var content = "I like #redApple. I have a #black hat. #red is my favorite color";
var re = /#[a-z0-9][a-z0-9\-_]*/ig;
var matches = content.match(re);
for (var i = 0; i < matches.length; i++) {
    var value = matches[i];
    console.log(value + ".....value");
    // \B before "#" and \b after the tag keep "#red" from matching inside "#redApple"
    var regExSearch = new RegExp("\\B" + value + "\\b", 'g');
    console.log(regExSearch + "........regExSearch");
    content = content.replace(regExSearch, '<a href="#">' + value + '</a> ');
}
$(".display").append(content);
a {
color: red;
text-decoration: underline;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p class="display"></p>
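The working code can also be run without jQuery to inspect the HTML it produces. This DOM-free sketch keeps the same \B ... \b approach; the function name wrapWithBoundaries and the `|| []` fallback for strings without hashtags are additions for illustration.

```javascript
function wrapWithBoundaries(content) {
  // match() returns null when there are no hashtags, hence the fallback.
  const matches = content.match(/#[a-z0-9][a-z0-9\-_]*/gi) || [];
  for (const value of matches) {
    // \B before "#" and \b after the tag: a shorter tag cannot match inside a longer one.
    const regExSearch = new RegExp("\\B" + value + "\\b", "g");
    content = content.replace(regExSearch, '<a href="#">' + value + "</a> ");
  }
  return content;
}

console.log(wrapWithBoundaries("I like #redApple. I have a #black hat. #red is my favorite color"));
// I like <a href="#">#redApple</a> . I have a <a href="#">#black</a>  hat. <a href="#">#red</a>  is my favorite color
```

One case worth testing with this approach: if the same hashtag appears twice, match() returns it twice, and the second pass wraps the already-linked copies again, because ">" and "<" are non-word characters that satisfy \B and \b. Deduplicating the list first, for example with [...new Set(matches)], avoids that.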
Hi, your problem might be solved with a stricter check: add the $ anchor to the RegEx:
var re = /(#[a-z0-9][a-z0-9\-_]*)$/ig,
Use this RegEx; it may work.
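Whether the anchored pattern helps depends on where the hashtags sit: without the m flag, $ matches only at the very end of the input string. A quick check against the question's sample string (the second string is made up):

```javascript
const content = "I like #redApple. I have a #black hat. #red is my favorite color";

// Without the m flag, $ only matches at the end of the whole string.
const anchored = /(#[a-z0-9][a-z0-9\-_]*)$/gi;
console.log(content.match(anchored)); // null: the string ends with "color", not a hashtag
console.log("my favorite #color".match(anchored)); // [ '#color' ]
```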
You can solve it this way (it's much simpler, actually):
var content = "I like #redApple. I have a #black hat. #red is my favorite color";
var re = /(#[a-z0-9][a-z0-9\-_]*)/ig;
content = content.replace(re, function(x) { return '<a href="#">' + x + '</a> '; });
$(".display").append(content);
vSearch = value.replace(/[-\/\\^$*+?.=()|[\]{}]/g, '\\$&');
in your code is redundant: you don't need it, because the regexp you used to get the matches cannot capture any of these symbols except "-", and "-" has no special meaning outside a character class.
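A DOM-free sketch of the single-pass replace above (the function name linkOnce is hypothetical). Because replace() never re-scans the text it has already produced, each hashtag is wrapped exactly once, including repeated hashtags.

```javascript
const re = /(#[a-z0-9][a-z0-9\-_]*)/gi;

function linkOnce(content) {
  // The callback receives the whole match as its first argument.
  return content.replace(re, (x) => '<a href="#">' + x + "</a> ");
}

console.log(linkOnce("I like #redApple. I have a #black hat. #red is my favorite color"));
// I like <a href="#">#redApple</a> . I have a <a href="#">#black</a>  hat. <a href="#">#red</a>  is my favorite color
```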