
Split a string of emoji characters using the split function with a regex

I want to use JavaScript's split function to split a string of emoji characters. There are many questions like this on Stack Overflow, but I cannot find a complete solution. So I tried it my own way:

a) Use the split function with a regex.

b) Match emoji characters by their Unicode surrogate ranges: from \uD800 to \uDBFF and from \uDC00 to \uDFFF.

c) In this regex, exclude the zero-width joiner (\u200D) and variation selector (\uFE0F) characters. So I wrote the following:

var p = '👦🏼👧🏼👩🏼👧🏾👧🏿👩‍👩‍👧‍👧👭👫👨‍❤️‍💋‍👨';

and split it:

var split = p.split(/(?![\u200D\uFE0F])([\uD800-\uDBFF][\uDC00-\uDFFF])/);

But the result is wrong :(

["", "👦", "", "🏼", "", "👧", "", "🏼", "", "👩", "", "🏼", "", "👧", "", "🏾", "", "👧", "", "🏿", "", "👩", "‍", "👩", "‍", "👧", "‍", "👧", "", "👭", "", "👫", "", "👨", "‍❤️‍", "💋", "‍", "👨", ""]

Did I use the exclusion (the negative lookahead) correctly in the regex? If so, is the error in my approach itself? The expected result is: ["👦🏼", "👧🏼", "👩🏼", "👧🏾", "👧🏿", "👩‍👩‍👧‍👧", "👭", "👫", "👨‍❤️‍💋‍👨"]
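A small check, not part of the original attempt, suggests why the exclusion has no effect: the lookahead (?![\u200D\uFE0F]) is evaluated at the position where the surrogate pair begins, and a high surrogate is never U+200D or U+FE0F, so the assertion always passes. The sketch below compares the pattern with and without the lookahead on a short sample string (the sample and the variable names are only illustrative):

```javascript
// Same pattern as in the question, with and without the negative lookahead.
var withLookahead = /(?![\u200D\uFE0F])([\uD800-\uDBFF][\uDC00-\uDFFF])/;
var withoutLookahead = /([\uD800-\uDBFF][\uDC00-\uDFFF])/;

// Illustrative sample: woman + zero-width joiner + girl.
var joined = '\uD83D\uDC69\u200D\uD83D\uDC67';

var a = joined.split(withLookahead);
var b = joined.split(withoutLookahead);

// The lookahead changes nothing: both calls return the same pieces,
// and the joiner ends up as its own element between the two emoji.
console.log(JSON.stringify(a) === JSON.stringify(b)); // true
console.log(a.length); // 5
```

The joiner shows up as a separate element because split only treats the surrogate pairs as separators; whatever sits between two separators is returned as a piece of its own.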

===

Update: I solved this problem for my site, https://www.emojionline.org, where you can test it. I use a dictionary that holds all emojis, and the replace function to wrap every emoji as |emoji|. Then I can split the emoji string on the | symbol. That works well :)
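The dictionary idea can be sketched roughly as follows. This is only a guess at the approach, not the site's actual code: the four-entry emojiDictionary and the helper names escapeRegExp and splitByDictionary are hypothetical, and a real dictionary would have to list every emoji sequence.

```javascript
// Hypothetical, tiny dictionary; a real one would list every emoji sequence.
var emojiDictionary = [
  '\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC67', // family: woman, woman, girl, girl
  '\uD83D\uDC66\uD83C\uDFFC', // boy with a skin tone modifier
  '\uD83D\uDC6D',             // two women holding hands
  '\uD83D\uDC67'              // girl
];

// Escape characters that are special in a regex (a keycap emoji can start with *).
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function splitByDictionary(text, dictionary) {
  // Longest entries first, so a full sequence wins over a shorter emoji it contains.
  var sorted = dictionary.slice().sort(function (x, y) { return y.length - x.length; });
  var pattern = new RegExp(sorted.map(escapeRegExp).join('|'), 'g');
  // Wrap every dictionary hit as |emoji|, split on |, and drop the empty pieces.
  return text.replace(pattern, '|$&|').split('|').filter(Boolean);
}

var sample = emojiDictionary[1] + emojiDictionary[0] + emojiDictionary[2];
var parts = splitByDictionary(sample, emojiDictionary);
console.log(parts.length); // 3
```

One limitation of this sketch: if the input text itself contains a | character, it is also treated as a split point, so a delimiter that cannot occur in the input would be a safer choice.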

I extended the emoji-regex by Mathias Bynens a bit with a [\uD800-\uDBFF][\uDC00-\uDFFF](?:[\u200D\uFE0F][\uD800-\uDBFF][\uDC00-\uDFFF]){2,} alternative. It matches a common emoji encoded as a surrogate pair, followed by 2 or more sequences (the {2,} limiting quantifier controls this count) of either a zero-width joiner or a variation selector and then another surrogate-pair emoji.
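To see what the added alternative does on its own, here is a minimal sketch. It assumes the alternative uses the surrogate ranges \uD800-\uDBFF and \uDC00-\uDFFF and the joiner/selector escapes \u200D and \uFE0F; the sample strings and variable names are illustrative.

```javascript
// The added alternative in isolation (assumed escapes, see the note above).
var zwjRun = /[\uD800-\uDBFF][\uDC00-\uDFFF](?:[\u200D\uFE0F][\uD800-\uDBFF][\uDC00-\uDFFF]){2,}/;

// woman, woman, girl, girl joined by zero-width joiners: three joined pairs follow the first emoji.
var family = '\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC67';

// woman + joiner + girl: only one joined pair follows, fewer than {2,} requires.
var pair = '\uD83D\uDC69\u200D\uD83D\uDC67';

var familyMatch = zwjRun.exec(family);
console.log(familyMatch[0] === family); // true, the whole sequence is one match
console.log(zwjRun.test(pair));         // false
```

Sequences whose joined parts are not surrogate pairs (for example one containing the heart U+2764) are not covered by this alternative and are left to the rest of the emoji-regex pattern.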

Without the alternative, the results are [ '👦🏼','👧🏼','👩🏼','👧🏾','👧🏿','👩‍👩‍👧','👧','👭','👫','👨‍❤️‍💋‍👨' ] .

var p = 'my family 👦🏼👧🏼👩🏼👧🏾👧🏿👩‍👩‍👧‍👧👭👫👨‍❤️‍💋‍👨 here';
// rx: the extension alternative comes first, followed by the alternatives of the
// emoji-regex pattern by Mathias Bynens (that long pattern is not reproduced here).
var rx = /([\uD800-\uDBFF][\uDC00-\uDFFF](?:[\u200D\uFE0F][\uD800-\uDBFF][\uDC00-\uDFFF]){2,}|<emoji-regex pattern>)/;
var res = p.split(rx).filter(Boolean);
document.body.innerHTML = res;
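The split(rx).filter(Boolean) step relies on two behaviours: when the regex has a capturing group, split keeps the captured text in the result, and filter(Boolean) then removes the empty strings that appear between adjacent matches. A minimal sketch with a simplified stand-in pattern (a single surrogate pair, not the full emoji-regex):

```javascript
// Simplified stand-in: one capturing group around a single surrogate pair.
var simple = /([\uD800-\uDBFF][\uDC00-\uDFFF])/;

// 'hi ' + two adjacent emoji (U+1F46D, U+1F46B) + ' bye'
var text = 'hi \uD83D\uDC6D\uD83D\uDC6B bye';

var raw = text.split(simple);
// raw contains the captured emoji, plus an empty string between the two adjacent ones.
console.log(raw.length); // 5

var cleaned = raw.filter(Boolean);
// The empty string is gone; the surrounding text and both emoji remain.
console.log(cleaned.length); // 4
```

Note that split visits every match even though the regex has no g flag, which is why the pattern in the answer does not need one.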


 