How to convert a formatted string into plain text

Users copy, paste, and send data in the following format: "𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖". I need to convert it into plain text (ASCII characters, say), like 'jovy debbie'. It comes in different fonts and formats, e.g. '𝑱𝒆𝒏𝒊𝒄𝒂 𝑫𝒖𝒈𝒐𝒔' and '𝙶𝚎𝚟𝚒𝚎𝚕𝚢𝚗 𝙽𝚒𝚌𝚘𝚕𝚎 𝙻𝚞𝚖𝚋𝚊𝚍'.

Any help will be appreciated. I have already looked at other Stack Overflow questions, but no luck :(

Those letters are from the Unicode Mathematical Alphanumeric Symbols block.

Since the letters of a given style have a fixed offset to their ASCII counterparts, you could use tr to map them, e.g.:

"𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖".tr("𝕒-𝕫", "a-z")
#=> "jovy debbie"

The same approach can be used for the other styles, e.g.:

"𝑱𝒆𝒏𝒊𝒄𝒂 𝑫𝒖𝒈𝒐𝒔".tr("𝒂-𝒛𝑨-𝒁", "a-zA-Z")
#=> "Jenica Dugos"

This gives you full control over the character mapping.
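
If several styles can show up in the same input, the tr idea can be generalized with a small table of code point offsets. The following is only a sketch: the names styled_runs and plain_letters are made up for illustration, and the table covers just the three styles from the question (double-struck small letters, bold italic, monospace). Some styles have gaps (for example, double-struck capitals such as ℂ and ℝ live in the Letterlike Symbols block instead), so a table like this would not catch those.

```ruby
# Hypothetical lookup table: first code point of a styled A-Z or a-z run
# mapped to the ASCII letter that run starts at. Extend it as needed.
def styled_runs
  {
    0x1D552 => "a", # double-struck small a..z
    0x1D468 => "A", # bold italic capital A..Z
    0x1D482 => "a", # bold italic small a..z
    0x1D670 => "A", # monospace capital A..Z
    0x1D68A => "a"  # monospace small a..z
  }
end

# Replaces every character that falls inside one of the runs above with
# its ASCII counterpart; all other characters are left untouched.
def plain_letters(text)
  runs = styled_runs
  text.each_char.map do |char|
    code = char.ord
    first, ascii = runs.find { |start, _| code.between?(start, start + 25) }
    first ? (ascii.ord + code - first).chr(Encoding::UTF_8) : char
  end.join
end

plain_letters("𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖")
#=> "jovy debbie"
plain_letters("𝙶𝚎𝚟𝚒𝚎𝚕𝚢𝚗 𝙽𝚒𝚌𝚘𝚕𝚎")
#=> "Gevielyn Nicole"
```

Compared with one tr call per style, this keeps all mappings in one place, at the cost of a per-character lookup.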

Alternatively, you could try Unicode normalization. The NFKC and NFKD forms should remove most formatting and seem to work for your examples:

"𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖".unicode_normalize(:nfkc)
#=> "jovy debbie"

"𝑱𝒆𝒏𝒊𝒄𝒂 𝑫𝒖𝒈𝒐𝒔".unicode_normalize(:nfkc)
#=> "Jenica Dugos"
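
Keep in mind that compatibility normalization is broader than the styled letters: it also folds things like ligatures, and it leaves accented letters in place. The sketch below illustrates both effects. The helper names to_plain and to_ascii are hypothetical, and dropping combining marks after NFKD is just one possible way to get closer to ASCII, not something the examples above require.

```ruby
# Folds compatibility characters (styled letters, ligatures, ...) into
# their plain equivalents. Accented letters such as "é" are kept.
def to_plain(text)
  text.unicode_normalize(:nfkc)
end

# Assumption: accents should be dropped as well. NFKD splits "é" into "e"
# plus a combining accent (category Mn), which gsub then removes.
def to_ascii(text)
  text.unicode_normalize(:nfkd).gsub(/\p{Mn}/, "")
end

to_plain("𝙶𝚎𝚟𝚒𝚎𝚕𝚢𝚗 𝙽𝚒𝚌𝚘𝚕𝚎")
#=> "Gevielyn Nicole"
to_plain("ﬁx")
#=> "fix"
to_plain("José")
#=> "José"
to_ascii("José")
#=> "Jose"
```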
