Regex: match words wrapped in underscores unless they start with @ / #

Question

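I'm trying to work around this bug in Tiptap (a WYSIWYG editor for Vue) by passing in a custom regex, so that the regex that identifies italic notation in Markdown ( _value_ ) is not applied to strings that start with @ or #. For example, #some_tag_value should not be rendered as #sometagvalue with "tag" in italics.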

This is my regex so far: /(^|[^@#_\w])(?:\w?)(_([^_]+)_)/g
Edit: new regex, with help from @Wiktor Stribiżew: /(^|[^@#_\w])(_([^_]+)_)/g

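It covers most common cases, but it still fails when the opening underscore is in the middle of a word, e.g. ant_farm_ should match (ant farm, with "farm" in italics).

I have also put some "should match" and "should not match" cases at https://regexr.com/50ibf for easier testing.

Should match (between underscores)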

_italic text here_
police_woman_
_fire_fighter
a thousand _words_
_brunch_ on a Sunday

Should not match

@ta_g_
__value__
#some_tag_value
@some_value_here
@some_tag_
#some_val_
#_hello_
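
The two lists above can be turned into a small check for any candidate regex. This is only an illustrative sketch: the helper name `failingCases` is hypothetical, each case is tested as a separate string, and the regex is used without the g flag so that `test` stays stateless.

```javascript
// Test cases copied from the two lists above.
const shouldMatch = [
  '_italic text here_',
  'police_woman_',
  '_fire_fighter',
  'a thousand _words_',
  '_brunch_ on a Sunday',
];
const shouldNotMatch = [
  '@ta_g_', '__value__', '#some_tag_value', '@some_value_here',
  '@some_tag_', '#some_val_', '#_hello_',
];

// Hypothetical helper: returns the cases a candidate regex gets wrong.
function failingCases(re) {
  const missed = shouldMatch.filter((s) => !re.test(s));
  const wronglyMatched = shouldNotMatch.filter((s) => re.test(s));
  return { missed, wronglyMatched };
}

// The edited regex from the question, without the g flag.
console.log(failingCases(/(^|[^@#_\w])(_([^_]+)_)/));
```

With the edited regex, the only missed case is police_woman_, which is the mid-word failure described above.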

Answer 1

For science: this monstrosity works in Chrome (and Node.js).

let text = `
<strong>Should match</strong> (between underscores)

_italic text here_
police_woman_
_fire_fighter
a thousand _words_
_brunch_ on a Sunday

<strong>Should not match</strong>

@ta_g_
__value__
#some_tag_value
@some_value_here
@some_tag_
#some_val_
#_hello_
`;

// Variable-length lookbehind: the underscore pair must follow a word
// that starts at whitespace or string start and does not begin with @ or #.
let re = /(?<=(?:\s|^)(?![@#])[^_\n]*)_([^_]+)_/g;

document.querySelector('div').innerHTML = text.replace(re, '<em>$1</em>');

div { white-space: pre; }

<div></div>

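This captures _something_ as the full match and something as the first capture group (so the underscores can be dropped). You can't match just something, because then you can no longer tell what is inside the underscores and what is outside (try (?<=(?:\s|^)(?![@#])[^_\n]*_)([^_]+)(?=_) ).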

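Two things prevent it from being universally applicable:

Not all JavaScript engines support lookbehind
Most regex engines do not support variable-length lookbehind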
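
Whether the current engine accepts a variable-length lookbehind can be probed at runtime. A minimal sketch, assuming a hypothetical helper name `supportsLookbehind`; the pattern is built with the `RegExp` constructor so that an unsupported engine throws a catchable `SyntaxError` instead of failing to parse the whole script.

```javascript
// Hypothetical helper: true if the engine accepts a variable-length lookbehind.
function supportsLookbehind() {
  try {
    // A regex literal would be a parse-time error on old engines,
    // so the pattern is compiled from a string instead.
    new RegExp('(?<=a*)b');
    return true;
  } catch (err) {
    return false;
  }
}

console.log(supportsLookbehind());
```

If the probe returns false, a pattern without lookbehind is the safer choice.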

Edit: this one is a bit stronger, and should additionally handle match_this_and_that_ but not @match_this_and_that correctly:

/(?<=(?:\s|^)(?![@#])(?!__)\S*)_([^_]+)_/

Explanation:

_([^_]+)_    Match non-underscory bit between two underscores
(?<=...)     that is preceded by
(?:\s|^)     either a whitespace or a start of a line/string
             (i.e. a proper word boundary, since we can't use `\b`)
\S*          and then some non-space characters
(?![@#])     that don't start with `@`, `#`,
(?!__)       or `__`.
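
As a rough usage sketch of the pattern explained above (the helper name `italicizeWithLookbehind` is hypothetical, the g flag is added so that `replace` handles every occurrence, and each case is passed as its own string):

```javascript
// The pattern from the edit above, with the g flag added for replace().
const italicRe = /(?<=(?:\s|^)(?![@#])(?!__)\S*)_([^_]+)_/g;

// Hypothetical helper: wraps the first capture group of each match in <em>.
function italicizeWithLookbehind(text) {
  return text.replace(italicRe, '<em>$1</em>');
}

console.log(italicizeWithLookbehind('police_woman_'));        // police<em>woman</em>
console.log(italicizeWithLookbehind('match_this_and_that_')); // match<em>this</em>and<em>that</em>
console.log(italicizeWithLookbehind('@match_this_and_that')); // unchanged
console.log(italicizeWithLookbehind('__value__'));            // unchanged
```

Because `\S*` inside the lookbehind may itself contain underscores, the second pair in match_this_and_that_ is still found after the first one has been consumed.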

regex101 demo

Answer 2

You can use the following pattern:

(?:^|\s)[^@#\s_]*(_([^_]+)_)

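See the regex demo

Details

(?:^|\s) - start of string or a whitespace character
[^@#\s_]* - 0 or more characters other than @, #, _ and whitespace
(_([^_]+)_) - Group 1: a _, then 1+ characters other than _ (captured into Group 2), then a _.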
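
Since this pattern consumes the prefix before the underscores, a replacement has to put that prefix back. The sketch below is an adaptation, not the answer's exact regex: it wraps the prefix in an extra capture group (so the group numbers differ from the details above), and the helper name `italicizeNoLookbehind` is hypothetical.

```javascript
// Adapted pattern (an assumption for illustration):
// group 1 = prefix to keep, group 2 = text between the underscores.
const prefixRe = /((?:^|\s)[^@#\s_]*)_([^_]+)_/g;

// Hypothetical helper: re-emits the prefix and wraps the inner text in <em>.
function italicizeNoLookbehind(text) {
  return text.replace(prefixRe, '$1<em>$2</em>');
}

console.log(italicizeNoLookbehind('police_woman_'));      // police<em>woman</em>
console.log(italicizeNoLookbehind('a thousand _words_')); // a thousand <em>words</em>
console.log(italicizeNoLookbehind('#some_tag_value'));    // unchanged
```

This version uses no lookbehind, so it should also work in engines that lack it.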

Answer 3

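Here is something that is not as compact as the other answers, but I think it is easier to see what is going on. Match group \3 is the one you want.

Requires the multiline flag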

^([a-zA-Z\s]+|_)(([a-zA-Z\s]+)_)+?[a-zA-Z\s]*?$

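^ - matches the start of the line
([a-zA-Z\s]+|_) - several words, or a _
(([a-zA-Z\s]+)_)+? - several words followed by a _, at least once, but as few times as possible
[a-zA-Z\s]*? - any trailing words
$ - end of the line

In summary, the breakdown matches any one of the following: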

_<words>_
<words>_<words>_
<words>_<words>_<words>
_<words>_<words>
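
To illustrate how group 3 would be read out, here is a minimal sketch that applies the pattern to one line at a time. The helper name `italicPart` is hypothetical, and returning null for a non-matching line is an assumption made for this example.

```javascript
// The pattern from this answer, with the multiline flag.
const lineRe = /^([a-zA-Z\s]+|_)(([a-zA-Z\s]+)_)+?[a-zA-Z\s]*?$/m;

// Hypothetical helper: returns the text captured by group 3, or null.
function italicPart(line) {
  const m = lineRe.exec(line);
  return m ? m[3] : null;
}

console.log(italicPart('_italic text here_')); // italic text here
console.log(italicPart('police_woman_'));      // woman
console.log(italicPart('_fire_fighter'));      // fire
console.log(italicPart('@some_tag_'));         // null
```

Note that the character classes only allow letters and whitespace, so lines containing digits or punctuation will not match.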

3 answers

Answer 1 (score 2, 2020-03-18 09:23:37)
Answer 2 (score 2, accepted, 2020-03-18 09:25:39)
Answer 3 (score 0, 2020-03-18 09:53:42)