Replace special characters with "N/A" in Python

Question

I want to change every row that contains only emoji (for example, row df['Comments'][2]) to N/A.

df['Comments'][:6]
0                                                          nice
1                                                       Insane3
2                                                          😻😻❤️
3                                                @bertelsen1986
4                       20 or 30 mm rise on the Renthal Fatbar?
5                                     Luckily I have one to 🔥💪🏻

The following code does not return the output I expect:

df['Comments'].replace(';', ':', '!', '*', np.NaN)

Expected output:

df['Comments'][:6]
0                                                          nice
1                                                       Insane3
2                                                          nan
3                                                @bertelsen1986
4                       20 or 30 mm rise on the Renthal Fatbar?
5                                     Luckily I have one to 🔥💪🏻

Answer 1

You can detect rows that contain only emoji by iterating over the Unicode characters in each row (using the emoji and unicodedata packages):

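import unicodedata
import numpy as np
# UNICODE_EMOJI was a flat dict of emoji in the emoji releases current in 2020.
# As far as I know, emoji 2.0 and later drop it in favour of emoji.EMOJI_DATA
# and emoji.is_emoji().
from emoji import UNICODE_EMOJI

# A plain dict of lists stands in for the DataFrame here.
df = {}
df['Comments'] = ["Test", "Hello 😉", "😉😉😉"]

for i in range(len(df['Comments'])):
    pure_emoji = True
    # Check every code point; a single non-emoji character disqualifies the row.
    for unicode_char in unicodedata.normalize('NFC', df['Comments'][i]):
        if unicode_char not in UNICODE_EMOJI:
            pure_emoji = False
            break
    if pure_emoji:
        df['Comments'][i] = np.nan  # np.NaN was removed in NumPy 2.0
print(df['Comments'])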
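
The character-by-character check above can also be sketched with the standard library alone, which avoids depending on a particular version of the emoji package. This is only a heuristic under stated assumptions: it treats the Unicode categories So, Sk, Mn and Cf as "emoji-like", which is not the emoji package's definition and will misjudge some inputs (keycap sequences built on digits are rejected, and symbols such as © are accepted). The names is_emoji_only and EMOJI_CATEGORIES are hypothetical, and a plain list stands in for the Comments column.

```python
import unicodedata

# Heuristic (an assumption, not the emoji package's definition):
# "So" = pictographic symbols, "Sk" = skin-tone modifiers,
# "Mn" = variation selectors, "Cf" = zero-width joiner.
EMOJI_CATEGORIES = {"So", "Sk", "Mn", "Cf"}


def is_emoji_only(text):
    """Return True if text has at least one symbol and only emoji-like characters."""
    stripped = "".join(text.split())  # ignore whitespace between emoji
    if not any(unicodedata.category(ch) == "So" for ch in stripped):
        return False
    return all(unicodedata.category(ch) in EMOJI_CATEGORIES for ch in stripped)


comments = [
    "nice",
    "Insane3",
    "\U0001F63B\U0001F63B\u2764\uFE0F",  # two cat faces, a heart and a variation selector
    "@bertelsen1986",
    "Luckily I have one to \U0001F525\U0001F4AA\U0001F3FB",  # text followed by emoji
]

# None stands in for the missing value here.
cleaned = [None if is_emoji_only(c) else c for c in comments]
print(cleaned)
```

With a pandas Series, the same predicate could probably be applied as df['Comments'].mask(df['Comments'].map(is_emoji_only)), since Series.mask replaces the rows where the condition is True with NaN by default.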

Answer 2

For the function (remove_emoji), see https://stackoverflow.com/a/61839832/6075699

Try this.
First install the emoji library: pip install emoji

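import re
import emoji
import numpy as np

# demojize() turns each emoji into a :name: token; if nothing is left after
# stripping those tokens, the comment contained only emoji.
df.Comments.apply(lambda x: x if (re.sub(r'(:[!_\-\w]+:)', '', emoji.demojize(x)) != "") else np.nan)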
0                         nice
1                      Insane3
2                          NaN
3               @bertelsen1986
4    Luckily I have one to 🔥💪🏻
Name: a, dtype: object

2 answers

Answer 1
0 2020-08-30 10:58:48

Answer 2
0 accepted 2020-08-30 11:12:08
