Replacing multiple words in a DataFrame

Question

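I want to replace words as described here, but in a column of a DataFrame. I also want to keep the original column and the other columns in the DataFrame.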

a = ["isn't", "can't"]
b = ["is not", "cannot"]

for line in df['text']:
    for a1, b1 in zip(a, b):
        line = line.replace(a1, b1)
    df['text1'].write(line)

TypeError: expected str, bytes or os.PathLike object, not Series

Input DataFrame

ID    text      
1     isn't bad
2     can't play

Expected output

ID    text          text1
1     isn't bad     is not bad
2     can't play    cannot play

Please help. Thank you.

Answer 1

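If you have two lists a and b, the best way is to pass them to .replace with regex=True. Without regex=True, replace only changes cells whose entire value equals an entry of a; with it, matches inside each string are replaced: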

a = ["isn't", "can't"]
b = ["is not", "cannot"]
# df = pd.read_clipboard(sep=r'\s\s+')
df['text1'] = df['text'].replace(a,b,regex=True)
df
Out[68]: 
   ID        text        text1
0   1   isn't bad   is not bad
1   2  can't play  cannot play

Note that a and b must have the same length. This technique works well for a short list, but for a longer one you may want to build a dictionary.
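
The dictionary variant mentioned above might look like the following sketch. The name mapping and the sample data are assumptions for illustration. With regex=True the keys are treated as regular expressions, so keys containing special characters such as . or ( would need re.escape first.

```python
import pandas as pd

# Sample data mirroring the question (assumed for illustration)
df = pd.DataFrame({'ID': [1, 2], 'text': ["isn't bad", "can't play"]})

# Hypothetical name: maps each pattern to its replacement
mapping = {"isn't": "is not", "can't": "cannot"}

# regex=True makes replace() match inside each cell instead of
# requiring the whole cell value to equal a key
df['text1'] = df['text'].replace(mapping, regex=True)
print(df)
```

The original text column is left untouched, and the replaced strings go into the new text1 column.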

Answer 2

You can achieve this by applying a lambda function across the DataFrame columns (here with map), as follows:

import pandas as pd
a = ["isn't", "can't"]
b = ['is not', 'cannot']

df = pd.DataFrame({'id': [1,2], 'text': ["isn't bad", "can't play"]})
df['a'], df['b'] = a,b
print(df.head())

The DataFrame looks like this:

   id        text      a       b
0   1   isn't bad  isn't  is not
1   2  can't play  can't  cannot

You can now apply the replacement to this DataFrame like this:

df['vals'] = pd.Series(map(lambda x,y,z: x.replace(y, z), list(df.text), list(df.a), list(df.b)))
print(df.head())

Final output:

   id        text      a       b         vals
0   1   isn't bad  isn't  is not   is not bad
1   2  can't play  can't  cannot  cannot play

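You can use the vals column for your analysis, or extract only the columns you need. Note that this pairs row i of text with a[i] and b[i], so it only works when there is exactly one replacement pair per row.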
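
If every pair in a and b should be tried on every row, one possible sketch uses Series.apply with a small helper. The helper name replace_all and the third sample row are assumptions added for illustration, not part of the answer above.

```python
import pandas as pd

a = ["isn't", "can't"]
b = ["is not", "cannot"]
df = pd.DataFrame({'id': [1, 2, 3],
                   'text': ["isn't bad", "can't play", "can't say it isn't"]})

def replace_all(text):
    # Hypothetical helper: apply every (old, new) pair to one string
    for old, new in zip(a, b):
        text = text.replace(old, new)
    return text

# apply() calls the helper once per cell of the text column
df['vals'] = df['text'].apply(replace_all)
print(df)
```

This version does not need the extra a and b columns, and the number of rows no longer has to match the number of replacement pairs.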

Answer 3

Well, you can use a lookup table to change the words:

import pandas as pd

data = {
    'text': ["isn't bad", "can't play"]
}
table = {
    "isn't": "is not",
    "can't": "cannot"
}

df = pd.DataFrame(data)
revised_text = []
for text in df['text']:
    # Swap each word found in the lookup table; keep all other words unchanged,
    # so every row yields exactly one output string
    words = [table.get(word, word) for word in text.split()]
    revised_text.append(' '.join(words))

df['text1'] = revised_text
print(df)
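
The same lookup-table idea can also be sketched as a single regular-expression pass instead of an explicit loop over words. The names pattern and lookup, and the extra "all good" row, are assumptions for illustration. The lookarounds restrict matches to whitespace-delimited words, which mirrors the split() behaviour above.

```python
import re
import pandas as pd

table = {"isn't": "is not", "can't": "cannot"}
df = pd.DataFrame({'text': ["isn't bad", "can't play", "all good"]})

# One alternation of all keys, escaped so they are matched literally;
# (?<!\S) and (?!\S) require whitespace or a string edge on both sides
pattern = re.compile(r"(?<!\S)(?:%s)(?!\S)" % "|".join(map(re.escape, table)))

def lookup(match):
    # Hypothetical helper: return the replacement for the matched word
    return table[match.group(0)]

# Rows with no matching word pass through unchanged
df['text1'] = df['text'].map(lambda s: pattern.sub(lookup, s))
print(df)
```

Because each string is scanned once, the cost does not grow with a separate replace call per table entry.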

Answer 4

Here is one option.

df['text1'] = df['text']
# Apply each (old, new) pair in turn as a literal substring replacement
for old, new in zip(a, b):
    df['text1'] = df['text1'].str.replace(old, new, regex=False)

Here is another approach that does not involve an explicit loop.

replacedict = {"isn't": "is not",
               "can't": "cannot"}
# Split each string into words, replace whole words via the dict,
# then rejoin the words that share an original row index.
# Grouping by index keeps every other column of df intact.
df['text1'] = (df['text'].str.split(' ')
                         .explode()
                         .replace(replacedict)
                         .groupby(level=0)
                         .agg(' '.join))


Answer 1
3 (accepted) 2020-09-13 23:21:13

Answer 2
2 2020-09-13 23:31:05

Answer 3
0 2020-09-13 23:45:12

Answer 4
0 2020-09-13 23:53:17
