Problem with the strip and replace functions in a pandas DataFrame

Question

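I am trying to remove all special characters from a column of words in a pandas DataFrame using the strip() and replace() functions.

However, it does not work: the special characters are not removed from the words.

Can anyone point out what I am doing wrong?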

import pandas as pd
import datetime

df = pd.read_csv("2022-12-08_word_selection.csv")

for n in df.index:
    i = str(df.loc[n, "words"])
    if len(i) > 12:
        df.loc[n, "words"] = ""
df["words"] = df["words"].str.replace("$", "s")
df["words"] = df["words"].str.strip('[,:."*+-#/\^`@}{~&%â€™àáâæ¢ß¥£™©®ª×÷±²³¼½¾µ¿¶·¸º°¯§…¤¦≠¬ˆ¨‰øœšÞùúûý€')
df["words"] = df["words"].str.strip("\n")
df = df.groupby(["words"]).mean()

print(df)

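First, the program replaces every word in the "words" column that is longer than 12 characters with an empty string. Then I want it to remove all special characters from the "words" column.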

Answer 1

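First, avoid the loop and use transform() to replace words longer than 12 characters with an empty string. Second, you can call replace() directly on the Series without the .str accessor, as long as you pass a regular expression (regex=True or a compiled pattern); without one, Series.replace() only matches whole cell values, not substrings. Third, strip() only removes leading and trailing characters, so it is not what you want here. Use replace() with a regular expression instead. Finally, to remove special characters, it is cleaner to use a negated character set that matches and removes every character that is not an ASCII letter or digit. That looks like this: "[^A-Za-z0-9]".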
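To make the difference between strip() and a regex replace concrete, here is a minimal sketch. The sample strings are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample values, chosen only to show the difference
s = pd.Series(["%lol%", "10:03", "a-b"])

# str.strip() only trims the listed characters from both ends of each string
stripped = s.str.strip("%:-")

# A regex negated set removes every character that is not an ASCII letter or digit
cleaned = s.str.replace("[^A-Za-z0-9]", "", regex=True)

print(stripped.tolist())  # ['lol', '10:03', 'a-b']
print(cleaned.tolist())   # ['lol', '1003', 'ab']
```

Characters in the middle of a string, such as the colon in "10:03", survive strip() but not the regex replace.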

Here is some sample data with working code:

import pandas as pd
import re

df = pd.DataFrame(
    {
        "words": [
            123,
            "abcd",
            "efgh",
            "abcdefghijklmn",
            "lol%",
            "Hornbæk",
            "10:03",
            "$999¼",
        ]
    }
)
# More concise than an explicit loop; str(x) lets len() handle non-string values such as 123
df["words"] = df["words"].transform(lambda x: "" if len(str(x)) > 12 else x)
# Replace every "$" with "s"; regex=True matches substrings, and the backslash makes "$" literal
df["words"] = df["words"].replace(r"\$", "s", regex=True)
# Use a regex negated set to keep only ASCII letters and digits
df["words"] = df["words"].replace(re.compile("[^A-Za-z0-9]"), "")
print(df)

Output:

    words
0     123
1    abcd
2    efgh
3
4     lol
5  Hornbk
6    1003
7    s999
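As an alternative sketch, the cleanup described in the question can also be written with the vectorized .str accessor once the column is cast to strings. This assumes it is acceptable to turn non-string values such as 123 into text. The helper name clean_words and its max_len parameter are hypothetical, introduced only for this example.

```python
import pandas as pd


def clean_words(words: pd.Series, max_len: int = 12) -> pd.Series:
    """Blank out long words, map "$" to "s", and drop non-alphanumeric characters."""
    # Cast to str so the .str accessor also works on non-string values such as 123
    words = words.astype(str)
    # Keep values of at most max_len characters; replace the rest with ""
    words = words.where(words.str.len() <= max_len, "")
    # regex=False treats "$" as a literal character, not an end-of-string anchor
    words = words.str.replace("$", "s", regex=False)
    # Remove everything that is not an ASCII letter or digit
    return words.str.replace("[^A-Za-z0-9]", "", regex=True)


sample = pd.Series([123, "abcd", "abcdefghijklmn", "lol%", "Hornbæk", "10:03", "$999¼"])
result = clean_words(sample)
print(result.tolist())
# ['123', 'abcd', '', 'lol', 'Hornbk', '1003', 's999']
```

Passing regex explicitly on each str.replace() call avoids depending on the default, which has differed between pandas versions.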

Answer 1
0 2022-12-08 21:25:12