
Split a string of emoji characters using the split function with a regex

I want to use JavaScript's split function to split a string of emoji characters. There are many similar questions on Stack Overflow, but I could not find a complete solution, so I tried my own approach:

a) Use the split function with a regex.

b) Match emoji characters as surrogate pairs: a code unit from \uD800 to \uDBFF followed by one from \uDC00 to \uDFFF.

c) In this regex, exclude the zero-width joiner (\u200D) and variation selector (\uFE0F) characters. So I wrote the following:

var p = '👦🏼👧🏼👩🏼👧🏾👧🏿👩‍👩‍👧‍👧👭👫👨‍❤️‍💋‍👨';

and split it:

var split = p.split(/(?![\u200D\uFE0F])([\uD800-\uDBFF][\uDC00-\uDFFF])/);

But the result is wrong :(

["", "👦", "", "🏼", "", "👧", "", "🏼", "", "👩", "", "🏼", "", "👧", "", "🏾", "", "👧", "", "🏿", "", "👩", "‍", "👩", "‍", "👧", "‍", "👧", "", "👭", "", "👫", "", "👨", "‍❤️‍", "💋", "‍", "👨", ""]

Did I use the exclusion (the negative lookahead) correctly? If so, is the error caused by my approach? The expected result is: ["👦🏼", "👧🏼", "👩🏼", "👧🏾", "👧🏿", "👩‍👩‍👧‍👧", "👭", "👫", "👨‍❤️‍💋‍👨"]

===

Update: I solved this problem for my site, https://www.emojionline.org, where you can test it. I use a dictionary that holds all emojis, and I use the replace function to wrap every emoji as |emoji|. Then I can split the string on the | symbol. That works well :)
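The dictionary approach described above might look roughly like the sketch below. The function name splitByDictionary and the tiny dictionary are hypothetical and only for illustration; a real dictionary would list every supported emoji. The sketch also assumes the input text contains no literal | character.

```javascript
// Hypothetical mini dictionary, written with \u escapes so the
// joiners (\u200D) are visible.
var dictionary = [
  '\uD83D\uDC66\uD83C\uDFFC', // boy + skin tone modifier
  '\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC67', // family joined with ZWJ
  '\uD83D\uDC6D', // two women holding hands
  '\uD83D\uDC6B'  // man and woman holding hands
];

function splitByDictionary(text, dictionary) {
  // Try longer entries first so a sequence wins over its own prefix.
  var sorted = dictionary.slice().sort(function (a, b) {
    return b.length - a.length;
  });
  // Escape regex metacharacters in case an entry contains one (for example "*").
  var escaped = sorted.map(function (entry) {
    return entry.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  });
  var rx = new RegExp(escaped.join('|'), 'g');
  // Wrap every dictionary emoji as |emoji|, split on |, drop empty strings.
  return text.replace(rx, '|$&|').split('|').filter(Boolean);
}

var parts = splitByDictionary(dictionary[0] + dictionary[1] + dictionary[2], dictionary);
console.log(parts.length); // 3
```

Sorting the entries longest first matters because a short entry can be a prefix of a longer joined sequence.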

I extended the emoji-regex by Mathias Bynens a bit with a [\uD800-\uDBFF][\uDC00-\uDFFF](?:[\u200D\uFE0F][\uD800-\uDBFF][\uDC00-\uDFFF]){2,} alternative. It matches a surrogate-pair emoji (two UTF-16 code units) followed by two or more sequences (the count is controlled by the {2,} limiting quantifier), each consisting of either a zero-width joiner or a variation selector and then another surrogate-pair emoji.
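To see what that alternative does in isolation, here is a small sketch that tests it against two hand-built strings. The variable names are illustrative only, and the strings use \u escapes so the joiners are visible.

```javascript
// The added alternative on its own, with explicit \u escapes.
var longSequence = /[\uD800-\uDBFF][\uDC00-\uDFFF](?:[\u200D\uFE0F][\uD800-\uDBFF][\uDC00-\uDFFF]){2,}/;

// woman ZWJ woman ZWJ girl ZWJ girl: one emoji followed by three joined groups
var family = '\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC67';
// woman ZWJ girl: one emoji followed by a single joined group
var pair = '\uD83D\uDC69\u200D\uD83D\uDC67';

var familyMatch = family.match(longSequence);
var pairMatch = pair.match(longSequence);

console.log(familyMatch[0] === family); // true: three groups satisfy {2,}
console.log(pairMatch);                 // null: one group is fewer than two
```

Shorter sequences such as the two-member pair are left to the other alternatives of the full pattern.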

Without the alternative, the results are [ '👦🏼','👧🏼','👩🏼','👧🏾','👧🏿','👩‍👩‍👧','👧','👭','👫','👨‍❤️‍💋‍👨' ]: the four-person family is split into two items.

var p = 'my family 👦🏼👧🏼👩🏼👧🏾👧🏿👩‍👩‍👧‍👧👭👫👨‍❤️‍💋‍👨 here';
// The added alternative: a surrogate-pair emoji followed by two or more
// (joiner or variation selector + surrogate-pair emoji) groups.
var extra = '[\\uD800-\\uDBFF][\\uDC00-\\uDFFF](?:[\\u200D\\uFE0F][\\uD800-\\uDBFF][\\uDC00-\\uDFFF]){2,}';
// emojiRegexSource is a placeholder for the pattern source of Mathias Bynens's
// emoji-regex, which is several thousand characters long and is not repeated here
// (in Node it could be taken from the emoji-regex package via its RegExp's .source).
var rx = new RegExp('(' + extra + '|' + emojiRegexSource + ')');
// The capturing group makes split keep the matched emojis; filter drops empty strings.
var res = p.split(rx).filter(Boolean);
document.body.innerHTML = res;
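As an alternative to maintaining a large regex, runtimes that support Intl.Segmenter can split on grapheme clusters directly. This is a different technique from the answer above, not part of it, and the sketch below assumes an environment where Intl.Segmenter is available (recent browsers and Node versions). The helper name splitGraphemes is hypothetical.

```javascript
// Split text into user-perceived characters (grapheme clusters).
function splitGraphemes(text) {
  var segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  // Each item yielded by segment() has a .segment string.
  return Array.from(segmenter.segment(text), function (item) {
    return item.segment;
  });
}

// woman ZWJ woman ZWJ girl ZWJ girl, then boy + skin tone modifier
var family = '\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC67';
var boy = '\uD83D\uDC66\uD83C\uDFFC';

var graphemes = splitGraphemes(family + boy);
console.log(graphemes.length); // 2
```

Unlike the regex split, this also breaks ordinary text into individual characters, so surrounding words would need to be regrouped separately.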
