pandas: replace column values using the keys and values of a dictionary of lists

Question

I have a DataFrame and a dictionary like the following (but larger):

import pandas as pd
df = pd.DataFrame({'text': ['can you open the door?','shall you write the address?']})

dic = {'Should': ['can','could'], 'Could': ['shall'], 'Would': ['will']}

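I want to replace words in the text column when they appear in one of the dic value lists. I tried the following, which works for a list with a single value but not for a list with several values: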

for key, val in dic.items():
    if df['text'].str.lower().str.split().map(lambda x: x[0]).str.contains('|'.join(val)).any():
        df['text'] = df['text'].str.replace('|'.join(val), key, regex=False)
print(df)
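The attempt fails for the multi-item list because str.replace(..., regex=False) treats the joined pattern 'can|could' as one literal string, which the text never contains. For a single-item list such as ['shall'], the "joined" pattern is just the word itself, which is why that row does get replaced. A minimal illustration on a one-row Series:

```python
import pandas as pd

s = pd.Series(['can you open the door?'])

# regex=False: 'can|could' is searched for as one literal string, so nothing matches
literal = s.str.replace('can|could', 'Should', regex=False)

# regex=True: '|' is regex alternation, so the word 'can' is replaced
alternation = s.str.replace('can|could', 'Should', regex=True)

print(literal[0])      # can you open the door?
print(alternation[0])  # Should you open the door?
```

Switching to regex=True alone would also match substrings (for example the 'can' inside 'scan'), which is why both answers wrap the words in \b word boundaries.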

The output I want is:

              text
0   Should you open the door?
1  Could you write the address?

Answer 1

You can build a flattened dictionary d with lowercase keys and values, replace the values using word boundaries, and finally call Series.str.capitalize:

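# flatten: map every lowercased list item to its lowercased key
d = {x.lower(): k.lower() for k, v in dic.items() for x in v}

# match any of the list items, but only as whole words
regex = '|'.join(r"\b{}\b".format(x) for x in d.keys())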
df['text'] = (df['text'].str.lower()
                        .str.replace(regex, lambda x: d[x.group()], regex=True)
                        .str.capitalize())
print(df)
                           text
0     Should you open the door?
1  Could you write the address?
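A minimal sketch of Answer 1 on a single made-up sentence, showing what the flattened dictionary d looks like and one side effect of the last step: Series.str.capitalize uppercases the first character and lowercases everything else, so other capitalized words (here the hypothetical name 'Bob') come back in lowercase.

```python
import pandas as pd

dic = {'Should': ['can', 'could'], 'Could': ['shall'], 'Would': ['will']}

# flatten: every list item becomes its own key pointing at its (lowercased) dict key
d = {x.lower(): k.lower() for k, v in dic.items() for x in v}
print(d)  # {'can': 'should', 'could': 'should', 'shall': 'could', 'will': 'would'}

regex = '|'.join(r"\b{}\b".format(x) for x in d.keys())

s = pd.Series(['can you ask Bob?'])
out = (s.str.lower()
        .str.replace(regex, lambda m: d[m.group()], regex=True)
        .str.capitalize())  # lowercases the rest of the sentence too

print(out[0])  # Should you ask bob?
```

If the rest of the sentence must keep its original casing, Answer 2's case-insensitive replacement avoids the lower()/capitalize() round trip.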

Answer 2

The best option is to change the logic so that pandas has as few steps as possible.

You can build a dictionary that maps each word directly to the desired output:

dic2 = {v:k for k,l in dic.items() for v in l}
# {'can': 'Should', 'could': 'Should', 'shall': 'Could', 'will': 'Would'}

# or if not yet formatted:
# dic2 = {v.lower():k.capitalize() for k,l in dic.items() for v in l}

import re
regex = '|'.join(map(re.escape, dic2))

df['text'] = df['text'].str.replace(fr'\b({regex})\b',  # raw string: keep \b as a word boundary
                                    lambda m: dic2.get(m.group().lower(), m.group()),
                                    case=False, # only if case doesn't matter
                                    regex=True)

Output (shown as a separate text2 column for clarity):

                           text                         text2
0        can you open the door?     Should you open the door?
1  shall you write the address?  Could you write the address?
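A self-contained sketch of Answer 2, using a made-up first row 'Can you ask Bob?' to exercise case-insensitive matching. It uses a raw f-string (fr'...') so that \b reaches the regex engine as a word boundary rather than as Python's backspace escape, and it lowercases the matched text before the dictionary lookup so that case=False can still find the key 'can' when the text contains 'Can'. Unlike Answer 1, the rest of each sentence keeps its original casing.

```python
import re

import pandas as pd

df = pd.DataFrame({'text': ['Can you ask Bob?', 'shall you write the address?']})
dic = {'Should': ['can', 'could'], 'Could': ['shall'], 'Would': ['will']}

# invert the dict: each (lowercased) list item maps to its capitalized key
dic2 = {v.lower(): k.capitalize() for k, l in dic.items() for v in l}

# escape each word in case it contains regex metacharacters, then join with |
regex = '|'.join(map(re.escape, dic2))

# fr'...' keeps \b as a regex word boundary instead of a backspace character
df['text2'] = df['text'].str.replace(
    fr'\b({regex})\b',
    lambda m: dic2.get(m.group().lower(), m.group()),  # fall back to the matched word
    case=False,
    regex=True,
)

print(df['text2'].tolist())
# ['Should you ask Bob?', 'Could you write the address?']
```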


2 answers

Answer 1
1 2022-05-10 11:27:21

Answer 2
1 Accepted 2022-05-10 11:41:51
