
How can I remove emoji from text using XQuery?

I have $text = "Hello üäö$"

I want to remove only the emoji from the text using XQuery. How can I do that?

Expected result: "Hello üäö$"

I tried:

replace($text, '\p{IsEmoticons}+', '')

but it didn't work as expected: it removed only the smileys.

Result now: "Hello üäö$"
Expected result: "Hello üäö$"

Thanks in advance :)

I outlined the approach in my answer to the original question, which I updated based on your comment asking how to strip out additional emoji.

Quoting from that expanded answer:

The "Emoticons" block doesn't contain all characters commonly associated with "emoji." For example, 💜 (Purple Heart, U+1F49C), according to a Unicode character lookup site such as https://www.compart.com/en/unicode/U+1F49C, is from:

Miscellaneous Symbols and Pictographs, U+1F300 - U+1F5FF

This block is not available in XPath or XQuery processors, since it is listed neither in the XML Schema 1.0 spec linked above nor in Unicode block names for use in XSD regular expressions, a list of blocks that XPath and XQuery processors conforming to XML Schema 1.1 are required to support.

For characters from blocks not available in XPath or XQuery, you can manually construct character classes. For example, given the purple heart character above, we can match it with a range that spans its whole block, from 🌀 (U+1F300) to 🗿 (U+1F5FF):

 replace("Purple 💜 heart", "[🌀-🗿]", "")

This returns the expected result:

 Purple  heart

This approach can be applied to any other character:

  1. Locate the character's Unicode block.
  2. Craft your regular expression with the block name (if available in XPath) or a manually constructed character class covering the block's code point range.
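
As a minimal sketch of the two steps above, several blocks can be combined into one character class. The function name local:strip-emoji is hypothetical and not part of the original answer, and the choice of three blocks (Miscellaneous Symbols and Pictographs, Emoticons, Transport and Map Symbols) is only an assumption about which characters count as "emoji" for your data; extend the ranges as needed. Numeric character references are used instead of block names so the pattern should not depend on which block names a given processor supports.

```xquery
xquery version "3.1";

(: Hypothetical helper: removes characters from three Unicode blocks.
   The selection of blocks is an assumption, not an exhaustive emoji list. :)
declare function local:strip-emoji($text as xs:string) as xs:string {
    replace(
        $text,
        (: U+1F300-U+1F5FF  Miscellaneous Symbols and Pictographs
           U+1F600-U+1F64F  Emoticons
           U+1F680-U+1F6FF  Transport and Map Symbols :)
        "[&#x1F300;-&#x1F5FF;&#x1F600;-&#x1F64F;&#x1F680;-&#x1F6FF;]+",
        ""
    )
};

(: U+1F49C is Purple Heart, U+1F600 is Grinning Face :)
local:strip-emoji("Hello &#x1F49C;üäö$&#x1F600;")
(: returns "Hello üäö$" :)
```

Letters such as ü, ä, and ö and the $ sign fall outside these ranges, so they are left untouched. This assumes the processor's regular expression engine handles characters beyond the Basic Multilingual Plane, which is worth verifying on your platform.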
