
Regex to remove everything but emojis from a string in R?

I have a big .xlsx file containing tweets with emojis. I am working on a personal project where I want to make a network graph from the extracted emojis. For example, if I have this in one of the columns:

Christian✝️, Husband👫, Father👨‍👩‍👦‍👦, Former TV 📺Meteorologist🌪, GOP🐘, LTC 🔫, Dolfan🐬, since ‘75, Yanks Fan⚾️ & UCONN Alum🏀 Go Whalers🐋!

So how would I get only this as output?

✝️👫👨‍👩‍👦‍👦📺🌪🐘🔫🐬⚾️🏀🐋

I have searched thoroughly, on Stack Overflow and across the internet, but I couldn't find anything. I am a beginner in R.

Edit

When I read the file normally, I get Unicode code points (in UTF-8 format), but I don't know how to turn those code points into emojis. There are dictionaries online, but they are very outdated and only give me the names of some of these emojis.

Edit 2

There is a solution that works on Linux, but I am looking for a solution or hint to get this working on Windows.

This works for me, with the caveat that only the cross prints as an emoji in the console; the rest appear as their Unicode escape sequences.

# install.packages("remotes")
# remotes::install_github("hadley/emo")
emojis <- "Christian✝️, Husband👫, Father👨‍👩‍👦‍👦, Former TV 📺Meteorologist🌪, GOP🐘, LTC 🔫, Dolfan🐬, since ‘75, Yanks Fan⚾️ & UCONN Alum🏀 Go Whalers🐋!"
emojis
only_emojis <- emo::ji_extract_all(emojis)
only_emojis

#  emo::ji_extract_all(emojis)
# [[1]]
#  [1] "✝️"      "\U0001f46b"      "\U0001f468"      "\U0001f469"      "\U0001f466"      "\U0001f466"      "\U0001f4fa"      "\U0001f418"      "\U0001f52b"      "\U0001f42c"      "\u26be" "\U0001f3c0"      "\U0001f40b"   

# install.packages("utf8")
utf8::utf8_print(only_emojis[[1]])  
# [1] "✝️​" "👫​" "👨​" "👩​" "👦​" "👦​" "📺​" "🐘​" "🔫​" "🐬​" "⚾​" "🏀​" "🐋​"
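
Since the question asks for a regex, here is a base R sketch that does not need the emo package. It assumes the text is UTF-8 encoded, and it approximates "emoji" with a few hand-picked code point ranges plus the zero-width joiner and variation selector-16. The helper name `keep_emojis` and the ranges are my own assumptions for illustration, not the full Unicode emoji set, so some emojis (keycaps, a few arrows and symbols outside these blocks) would be dropped.

```r
# Sketch: delete every character that is NOT in a few emoji-heavy ranges.
# The ranges below are an assumption for illustration, not the full emoji set.
keep_emojis <- function(x) {
  # The class is written with \u / \U escapes so the pattern itself is a
  # UTF-8 string; perl = TRUE selects the PCRE engine.
  pattern <- paste0(
    "[^",
    "\U0001F000-\U0001FAFF",  # pictograph blocks (faces, animals, objects, ...)
    "\u2600-\u27BF",          # Miscellaneous Symbols and Dingbats
    "\u200D",                 # zero-width joiner (glues family emojis together)
    "\uFE0F",                 # variation selector-16 (emoji presentation)
    "]"
  )
  gsub(pattern, "", x, perl = TRUE)
}

# Part of the example bio, written with escapes so the script itself is ASCII.
bio <- "Yanks Fan\u26be\ufe0f & UCONN Alum\U0001F3C0 Go Whalers\U0001F40B!"
only_emojis <- keep_emojis(bio)

# Inspect the surviving code points, which is handy when the console
# cannot draw the glyphs themselves.
utf8ToInt(only_emojis)
```

Unlike `emo::ji_extract_all`, this returns one string per input with the emojis still in order, and ZWJ sequences such as the family emoji stay joined. If the console still shows `\U0001f40b`-style escapes, that is a display limitation; the string holds the same characters either way.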
