Regular expression to match emoticons

Question

We are working on a project where we want users to be able to use both emoji syntax (like :smile:, :heart:, :confused:, :stuck_out_tongue:) and normal emoticons (like :), <3, :/, :p).

I'm having trouble with the emoticon syntax because those character sequences sometimes also occur:

in normal strings or URLs - http://example.com
within the emoji syntax - :pencil:

How can I find these emoticon character sequences, but not when other characters are right next to them?

The entire regex I'm using for all the emoticons is huge, so here's a trimmed-down version:

(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)

You can play with a demo of it in action here: http://regexr.com/3a8o5
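To make the problem concrete, here is a minimal sketch that runs the trimmed-down pattern from the question over a few strings. It assumes a JavaScript engine with String.prototype.matchAll (ES2020); the g flag and the helper names naive and findAll are illustrative additions, not part of the question.

```javascript
// The question's trimmed-down pattern, with the g flag added so that
// matchAll() can report every occurrence.
const naive = /(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)/g;

// Hypothetical helper: returns capture group 1 of every match.
function findAll(re, text) {
  return Array.from(text.matchAll(re), m => m[1]);
}

findAll(naive, "visit http://example.com"); // → [":/"]  false positive inside the URL
findAll(naive, "draw with :pencil:");       // → [":p"]  false positive inside the emoji name
findAll(naive, "hi :) <3");                 // → [":)", "<3"]
```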

Answer 1

Match emoji first (to take care of the :pencil: example) and then check for a terminating whitespace or newline:

(\:\w+\:|\<[\/\\]?3|[\(\)\\\D|\*\$][\-\^]?[\:\;\=]|[\:\;\=B8][\-\^]?[3DOPp\@\$\*\\\)\(\/\|])(?=\s|[\!\.\?]|$)

This regex matches the following (preferring emoji), returning the match in capturing group 1:

:( :) :P :p :O :3 :| :/ :\ :$ :* :@
:-( :-) :-P :-p :-O :-3 :-| :-/ :-\ :-$ :-* :-@
:^( :^) :^P :^p :^O :^3 :^| :^/ :^\ :^$ :^* :^@
): (: $: *:
)-: (-: $-: *-:
)^: (^: $^: *^:
<3 </3 <\3
:smile: :hug: :pencil:

In addition to whitespace, it also accepts terminal punctuation (!, . or ?) as a delimiter.

You can see more details and test it here: https://regex101.com/r/aM3cU7/4
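A minimal sketch of this answer's pattern in JavaScript follows. One change is an assumption about the author's intent: inside a character class, \D means "any non-digit", so the third alternative [\(\)\\\D|\*\$] also accepts letters and spaces, and the unmodified pattern reports "e:" in "note: this". The sketch writes a literal D there instead. The helper name findEmoticons is hypothetical, and matchAll requires an ES2020 engine.

```javascript
// Answer 1's pattern, except that `\D` in the third alternative's character
// class is written as a literal `D` (assumed intent: the "D:" emoticon).
const emoticonRe =
  /(\:\w+\:|\<[\/\\]?3|[\(\)\\D|\*\$][\-\^]?[\:\;\=]|[\:\;\=B8][\-\^]?[3DOPp\@\$\*\\\)\(\/\|])(?=\s|[\!\.\?]|$)/g;

// Hypothetical helper: returns group 1 (an emoji name or an emoticon) of every match.
function findEmoticons(text) {
  return Array.from(text.matchAll(emoticonRe), m => m[1]);
}

findEmoticons("draw with :pencil:");       // → [":pencil:"]
findEmoticons("visit http://example.com"); // → []
findEmoticons("hi :) <3 :-P!");            // → [":)", "<3", ":-P"]
findEmoticons("note: this");               // → []
```

Because the pattern only constrains what follows the emoticon, findEmoticons("a:) ") still returns [":)"]; Answer 4 adds a check on the preceding character.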

Answer 2

I assume these emoticons will commonly be used with spaces before and after. Then \s might be what you're looking for, as it matches a whitespace character.

Then your regex would become:

\s+(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)\s
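The following sketch shows what this pattern accepts in practice. It assumes an engine with matchAll (ES2020); the g flag and the helper name findSpaced are illustrative additions. Because the surrounding \s characters are consumed rather than merely checked, an emoticon at the very start or end of the text is missed, and two emoticons separated by a single space cannot both match.

```javascript
// Answer 2's pattern, with the g flag added to find every occurrence.
const spaced = /\s+(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)\s/g;

// Hypothetical helper: returns capture group 1 of every match.
function findSpaced(text) {
  return Array.from(text.matchAll(spaced), m => m[1]);
}

findSpaced("hi :) there");     // → [":)"]
findSpaced("hi :) :( there");  // → [":)"]  the shared space was consumed by the first match
findSpaced(":) at the start"); // → []      no leading whitespace
findSpaced("at the end :)");   // → []      no trailing whitespace
```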

Answer 3

Use a positive look-ahead for a space:

([\:\<]-?[)(|\\/pP3D])(?:(?=\s))
 |       |      |         |
 |       |      |         |
 |       |      |         |-> match last separating space
 |       |      |-> match last part of the emot
 |       |-> it may have a `-` or not 
 |-> first part of the emoticon

JavaScript does support look-ahead (only look-behind was missing at the time), but if you'd rather avoid look-arounds, you can capture the trailing space, or the end of the input, instead:

/([\:\<]-?[)(|\\/pP3D])(\s|$)/g.exec('hi :) ;D');

Then drop the last entry of the resulting array (the captured trailing space) with splice(), or simply read the emoticon from index 1 of the result.
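Both variants of this answer can be tried as below, a sketch assuming an ES2020 engine for matchAll. The g flag on the look-ahead pattern and the helper names are illustrative additions, and the character class includes ( in both patterns, as in the first pattern of this answer.

```javascript
// Look-ahead variant: the emoticon must be followed by whitespace,
// which is checked but not consumed.
const aheadRe = /([\:\<]-?[)(|\\/pP3D])(?=\s)/g;

// Capture variant: group 2 holds the trailing whitespace, or "" at end of input.
const captureRe = /([\:\<]-?[)(|\\/pP3D])(\s|$)/g;

// Both hypothetical helpers keep only group 1, the emoticon itself.
function emoticonsAhead(text) {
  return Array.from(text.matchAll(aheadRe), m => m[1]);
}

function emoticonsCaptured(text) {
  return Array.from(text.matchAll(captureRe), m => m[1]);
}

emoticonsAhead("hi :) there");               // → [":)"]
emoticonsAhead("hi :)");                     // → []  nothing follows the emoticon
emoticonsCaptured("hi :)");                  // → [":)"]
emoticonsCaptured("see http://example.com"); // → []

// The answer's exec() call, on a fresh literal so no lastIndex is shared:
const result = /([\:\<]-?[)(|\\/pP3D])(\s|$)/g.exec("hi :) ;D");
const emoticon = result[1]; // ":)"; result[0] is ":) " and result[2] is " "
```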

Answer 4

You want regex look-arounds regarding spacing. Another answer here suggested a positive look-ahead, though I'd go double-negative:

(?<!\S)(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)

When this was written, JavaScript did not support (?<!pattern) (look-behind assertions were added in ES2018), but look-behind can be mimicked:

test_string.replace(/(\S)?(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)/g,
                    function($0, $1) { return $1 ? $0 : replacement_text; });

All I did was prefix your pattern with (?<!\S) and suffix it with (?!\S). The prefix ensures the match does not follow a non-whitespace character, so the only valid leading entries are a space or nothing (start of line). The suffix does the same thing on the other side, ensuring the match is not followed by a non-whitespace character. See also this more thorough regex walk-through.
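In engines that support ES2018 look-behind, the double-negative pattern can be used directly. The sketch below compares it with the replace()-based mimic; the g flags, the helper names findStandalone and replaceStandalone, and the "[smile]" replacement value are illustrative additions.

```javascript
// Answer 4's double-negative pattern; native look-behind needs ES2018+.
const lookaround = /(?<!\S)(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)/g;

// Hypothetical helper: emoticons with whitespace (or text edges) on both sides.
function findStandalone(text) {
  return Array.from(text.matchAll(lookaround), m => m[1]);
}

// The look-behind-free mimic: (\S)? captures a preceding non-space character,
// and such matches are returned unchanged instead of being replaced.
function replaceStandalone(text, replacement) {
  return text.replace(/(\S)?(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)/g,
                      function ($0, $1) { return $1 ? $0 : replacement; });
}

findStandalone(":) see http://example.com :p");   // → [":)", ":p"]
findStandalone("a:) :pencil: <3");                // → ["<3"]
replaceStandalone("a:) :) :pencil:", "[smile]"); // → "a:) [smile] :pencil:"
```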

One of the comments on the question itself suggested \b (word boundary) markers. I don't recommend these. In fact, this suggestion would do the opposite of what you want: \b:/ will indeed match http:// since there is a word boundary between the p and the :. This kind of reasoning would suggest \B (not a word boundary), e.g. \B:/\B. This is more portable (it works with pretty much all regex parsers, while look-arounds do not), so you can choose it in that case, but I prefer the look-arounds.
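A quick check of the word-boundary reasoning, using :/ as the emoticon; the variable names are illustrative. The last line is an extra observation rather than part of the answer: \B also rejects an emoticon that ends in a word character, such as :p followed by a space, because p and the space form a word boundary.

```javascript
const withB = /\b:\//g;      // \b:/   needs a word boundary before the colon
const withNotB = /\B:\/\B/g; // \B:/\B needs no word boundary on either side

"http://example.com".match(withB);    // → [":/"]  boundary between "p" and ":"
"http://example.com".match(withNotB); // → null
"meh :/ ok".match(withNotB);          // → [":/"]
"hi :p there".match(/\B:p\B/g);       // → null    "p" then " " is a word boundary
```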

4 answers

Answer 1: 6 votes, accepted, 2015-01-21 22:10:20
Answer 2: 1 vote, 2015-01-21 21:29:17
Answer 3: 1 vote, 2015-01-21 21:29:30
Answer 4: 0 votes, 2015-01-22 02:57:00
