
Replace multiple words in a dataframe

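I want to replace words as described here, but in one column of a dataframe. I also want to keep the original column and the other columns in the dataframe.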

a = ["isn't", "can't"]
b = ["is not", "cannot"]

for line in df['text']:
    for a1, b1 in zip(a, b):
        line = line.replace(a1, b1)
    df['text1'].write(line)

TypeError: expected str, bytes or os.PathLike object, not Series
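A minimal sketch of how the loop above could work: collect each rewritten line in a list, then assign the list as a new column in one step (a Series has no write method, and df['text1'] does not exist yet while the loop runs). It assumes df is built from the input table below; new_lines is a hypothetical name.

```python
import pandas as pd

a = ["isn't", "can't"]
b = ["is not", "cannot"]
df = pd.DataFrame({'ID': [1, 2], 'text': ["isn't bad", "can't play"]})

new_lines = []
for line in df['text']:
    # apply every (old, new) pair to this row's string
    for a1, b1 in zip(a, b):
        line = line.replace(a1, b1)
    new_lines.append(line)

# assign the whole list at once; the original 'text' column is untouched
df['text1'] = new_lines
print(df)
```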

Input dataframe

ID    text      
1     isn't bad
2     can't play

Output

ID    text          text1
1     isn't bad     is not bad
2     can't play    cannot play

Please help. Thank you.

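If you have two lists a and b, then the best way is to pass regex=True to .replace (without regex=True, .replace only matches entire cell values, not words inside a string):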

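import pandas as pd

a = ["isn't", "can't"]
b = ["is not", "cannot"]
df = pd.DataFrame({'ID': [1, 2], 'text': ["isn't bad", "can't play"]})
# or, to copy the table above: df = pd.read_clipboard(sep=r'\s\s+')
df['text1'] = df['text'].replace(a, b, regex=True)
df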
Out[68]: 
   ID        text        text1
0   1   isn't bad   is not bad
1   2  can't play  cannot play

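Note that a and b must have the same length. This technique is fine for a short list, but for a larger list you may want to build a dictionary that maps each word to its replacement.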
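A sketch of the dictionary variant, assuming each key starts and ends with a word character (so the \b word boundaries make sense). Series.replace accepts a dict of patterns when regex=True; re.escape makes characters such as '.' match literally. The names replacements and patterns are illustrative.

```python
import re
import pandas as pd

df = pd.DataFrame({'ID': [1, 2], 'text': ["isn't bad", "can't play"]})
replacements = {"isn't": "is not", "can't": "cannot"}

# escape each word so regex metacharacters are matched literally, and add
# \b word boundaries so a word embedded in a longer word is left alone
patterns = {r"\b" + re.escape(old) + r"\b": new for old, new in replacements.items()}
df['text1'] = df['text'].replace(patterns, regex=True)
print(df)
```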

Combining map with a lambda function over the dataframe's columns, you can achieve this as follows:

import pandas as pd
a = ["isn't", "can't"]
b = ['is not', 'cannot']

df = pd.DataFrame({'id': [1,2], 'text': ["isn't bad", "can't play"]})
df['a'], df['b'] = a,b
print(df.head())

The dataframe looks like this:

   id        text      a       b
0   1   isn't bad  isn't  is not
1   2  can't play  can't  cannot

You can now apply the lambda to each row of this dataframe like this:

df['vals'] = list(map(lambda x, y, z: x.replace(y, z), df['text'], df['a'], df['b']))
print(df.head())

Final output:

   id        text      a       b         vals
0   1   isn't bad  isn't  is not   is not bad
1   2  can't play  can't  cannot  cannot play

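You can then use the vals column for analysis or extract only the columns you need. Note that this pairs row i with a[i] and b[i], so it only works when each row needs exactly the replacement listed at its own position.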
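If a row may contain any of the words (or several of them), a sketch using Series.apply with a function that applies every (old, new) pair avoids the helper columns a and b. The third row and the name replace_all are hypothetical additions for illustration.

```python
from functools import reduce
import pandas as pd

a = ["isn't", "can't"]
b = ["is not", "cannot"]
df = pd.DataFrame({'id': [1, 2, 3],
                   'text': ["isn't bad", "can't play", "can't say it isn't"]})

def replace_all(s):
    # fold every (old, new) pair over the string, left to right
    return reduce(lambda acc, pair: acc.replace(*pair), zip(a, b), s)

df['vals'] = df['text'].apply(replace_all)
print(df)
```

The row-pairing version above could not handle the third row, since it contains two of the words and there is no a[2].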

Well, you can use a lookup table to change the words:

import pandas as pd

data = {
    'text': ["isn't bad", "can't play"]
}
# lookup table: word -> replacement
table = {
    "isn't": "is not",
    "can't": "cannot"
}

df = pd.DataFrame(data)
revised_text = []
for text in df['text']:
    # swap each space-separated word found in the table; keep the others
    words = [table.get(word, word) for word in text.split()]
    # exactly one entry per row, so the list length matches the dataframe
    revised_text.append(' '.join(words))

df['text1'] = revised_text
print(df)
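Splitting on spaces misses words with punctuation attached (for example "isn't," followed by a comma). A sketch of a regex-based alternative, assuming every key starts and ends with a word character: build one pattern from the table's keys and pass a callable as repl to .str.replace (allowed when regex=True), which looks each match up in the table. The third row is a hypothetical example.

```python
import re
import pandas as pd

df = pd.DataFrame({'text': ["isn't bad", "can't play", "it isn't, really"]})
table = {
    "isn't": "is not",
    "can't": "cannot"
}

# one alternation for all keys; longer keys first so a key that is a
# prefix of another cannot win the match
keys = sorted(table, key=len, reverse=True)
pattern = r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b"

# with regex=True, repl may be a callable that receives each re.Match
df['text1'] = df['text'].str.replace(pattern, lambda m: table[m.group(0)], regex=True)
print(df)
```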

Here is one option:

df['text1'] = df['text']
for old, new in zip(a, b):
    # regex=False: treat each word as a literal string, not a pattern
    df['text1'] = df['text1'].str.replace(old, new, regex=False)
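Whether .str.replace treats a word as a regular expression depends on its regex argument, whose default changed from True to False in pandas 2.0. The difference only shows up when a word contains regex metacharacters such as '.'; a small illustration with hypothetical data:

```python
import pandas as pd

s = pd.Series(["e.g. this", "egg this"])

# regex=True: '.' matches any character, so "egg " also matches "e.g."
as_regex = s.str.replace("e.g.", "for example", regex=True)
# regex=False: the pattern is matched literally
as_literal = s.str.replace("e.g.", "for example", regex=False)

print(as_regex.tolist())    # ['for example this', 'for examplethis']
print(as_literal.tolist())  # ['for example this', 'egg this']
```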

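Here is another approach that avoids an explicit Python loop: split each string into words, replace whole words with a dictionary, then join the words back together per original row.

replacedict = {"isn't": "is not",
               "can't": "cannot"}
# one word per row (the original index is repeated), exact-match replace,
# then regroup by the original index so every other column is kept
df['text1'] = (df['text'].str.split(' ')
                         .explode()
                         .replace(replacedict)
                         .groupby(level=0)
                         .agg(' '.join))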


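Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.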

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM