How to parse a string into words and punctuation marks using JavaScript
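I have a string test = "hello how are you all doing, I hope that it's good! and fine. Looking forward to see you."
I am trying to parse the string into words and punctuation marks using JavaScript. I can separate the words, but the punctuation marks disappear when I use this regex:
var result = test.match(/\b(\w|')+\b/g);
So my expected output is: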
hello
how
are
you
all
doing
,
I
hope
that
it's
good
!
and
fine
.
Looking
forward
to
see
you
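For reference, the behavior described above can be reproduced with a short check. This is only a sketch: the names `words` and `hasPunctuation` are illustrative and not part of the original question.

```javascript
var test = "hello how are you all doing, I hope that it's good! and fine. Looking forward to see you.";

// The regex from the question: a word boundary, one or more word
// characters or apostrophes, then another word boundary.
var words = test.match(/\b(\w|')+\b/g);

// Punctuation never satisfies (\w|'), so it is skipped rather than returned.
var hasPunctuation = words.some(function (token) {
  return /^[,.!?]$/.test(token);
});

console.log(words.length);   // 18
console.log(words[9]);       // it's
console.log(hasPunctuation); // false
```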
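Use the first approach if JavaScript's definition of a "word" suits you. A more customizable approach follows it.
Try test.split(/\s*\b\s*/). It splits on word boundaries (\b) and consumes the whitespace around them.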
"hello how are you all doing, I hope that it's good! and fine. Looking forward to see you."
.split(/\s*\b\s*/);
// Returns:
["hello",
"how",
"are",
"you",
"all",
"doing",
",",
"I",
"hope",
"that",
"it",
"'",
"s",
"good",
"!",
"and",
"fine",
".",
"Looking",
"forward",
"to",
"see",
"you",
"."]
var test = "This is. A test?"; // Test string.
// First consider splitting on word boundaries (\b).
test.split(/\b/); //=> ["This"," ","is",". ","A"," ","test","?"]
// This almost works but there is some unwanted whitespace.
// So we change the split regex to gobble the whitespace using \s*
test.split(/\s*\b\s*/) //=> ["This","is",".","A","test","?"]
// Now the whitespace is included in the separator
// and not included in the result.
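One caveat, shown here as a hedged sketch rather than as part of the original answer: input with leading or trailing whitespace, or an empty input, can leave empty strings or stray spaces in the result. The helper name `splitOnBoundaries` is made up for illustration.

```javascript
// Hypothetical helper (not from the answer): trim first, split on word
// boundaries while swallowing whitespace, then drop any empty strings.
function splitOnBoundaries(text) {
  return text
    .trim()
    .split(/\s*\b\s*/)
    .filter(function (token) {
      return token.length > 0;
    });
}

console.log(splitOnBoundaries("  This is. A test?  "));
// ["This", "is", ".", "A", "test", "?"]
console.log(splitOnBoundaries("").length); // 0
```

Trimming handles the whitespace at both ends, and the filter covers the empty-input case, where `split` returns an array holding one empty string.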
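If you want words like "isn't" and "one-thousand" to be treated as a single word, where the JavaScript regex would consider them two, you need to create your own definition of a word.
test.match(/[\w-']+|[^\w\s]+/g) //=> ["This","is",".","A","test","?"]
This matches actual words and punctuation marks separately by using an alternation. The first half of the regex, [\w-']+, matches whatever you consider to be a word, and the second half, [^\w\s]+, matches whatever you consider to be punctuation. In this example I simply used anything that is neither a word character nor whitespace. I also put a + on the end so that multi-character punctuation (such as ?!, properly written ‽) is treated as a single token; remove the + if you do not want that.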
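To see the custom word definition in action, here is a minimal sketch. The function name `tokenize` and the sample sentence are assumptions for illustration. The character class is written as `[\w'-]` with the hyphen last, which matches the same characters as the class in the answer but cannot be misread as a range.

```javascript
// Words may contain word characters, apostrophes and hyphens;
// anything else that is not whitespace counts as punctuation.
var WORD_OR_PUNCTUATION = /[\w'-]+|[^\w\s]+/g;

// Hypothetical wrapper: match() returns null when nothing matches,
// so fall back to an empty array.
function tokenize(text) {
  return text.match(WORD_OR_PUNCTUATION) || [];
}

console.log(tokenize("It isn't one-thousand, is it?!"));
// ["It", "isn't", "one-thousand", ",", "is", "it", "?!"]
```

Note that under this definition a lone hyphen or apostrophe is also matched by the word half of the pattern, so tighten the first character class if that matters for your input.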
Use this:
[,.!?;:]|\b[a-z']+\b
See the matches in the demo.
For example, in JavaScript:
resultArray = yourString.match(/[,.!?;:]|\b[a-z']+\b/ig);
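Explanation
[,.!?;:] matches one character from inside the brackets
| or (alternation)
\b matches a word boundary
[a-z']+ matches one or more letters or apostrophes
\b matches a word boundary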
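Applied to the string from the question, this pattern can be checked as follows. This is a sketch: the names `sample` and `parts` are illustrative. One limitation to keep in mind is that the word part is `[a-z']+`, so digits and underscores are not matched and would be dropped from the result.

```javascript
var sample = "hello how are you all doing, I hope that it's good! and fine. Looking forward to see you.";

// Either one listed punctuation mark, or a run of letters and
// apostrophes delimited by word boundaries. The i flag makes the
// letters case-insensitive; the g flag returns every match.
var parts = sample.match(/[,.!?;:]|\b[a-z']+\b/ig) || [];

console.log(parts.length);                 // 22
console.log(parts.slice(9, 13).join(" ")); // that it's good !
console.log(parts[parts.length - 1]);      // .
```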