pandas: match the first two words in a string column against dictionary keys

Question

I have a DataFrame as follows:

import pandas as pd

df = pd.DataFrame({'data1':['the weather is nice today','This is interesting','the weather is good'],
             'data2':['It is raining','The plant is green','the weather is sunny']})

And I have a dictionary as follows:

my_dict = {'the weather':'today','the plant':'tree'}

I would like to replace the first two words of each value in the data2 column when they match a key of the dictionary. I have done the following:

for old, new in my_dict.items():
    if pd.Series([' '.join(map(str, l)) for l in df['data2'].str.lower().str.split().map(lambda x: x[0:2])]).str.contains('|'.join(old.capitalize())).any():
        df['data2'] = df['data2'].str.replace(old, new.capitalize(), regex=False)
    else:
        print('does not exist')

But when I print(df), nothing has been replaced.
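Two likely causes can be seen in isolation. `str.join` iterates over a string character by character, so `'|'.join(old.capitalize())` builds a regex with one alternative per character instead of the phrase, and `str.replace(..., regex=False)` is case-sensitive, so `'the plant'` never matches `'The plant'`. A small illustration; `first_two` is a hypothetical stand-in for the first-two-words Series built inside the loop:

```python
import pandas as pd

# str.join treats the string as an iterable of characters, so this is the
# regex 'T|h|e| |w|e|a|t|h|e|r', not the phrase 'The weather'.
pattern = '|'.join('the weather'.capitalize())
print(pattern)

# Hypothetical stand-in for the lower-cased first two words of each row.
first_two = pd.Series(['it is', 'the plant', 'the weather'])

# Any value containing one of those characters matches, so the check is nearly always True.
char_hits = first_two.str.contains(pattern)
# Comparing against the whole key tests the actual phrase.
phrase_hits = first_two.eq('the weather')
print(char_hits.tolist())
print(phrase_hits.tolist())

# A literal (regex=False) replacement is case-sensitive: 'The plant' is left alone.
unchanged = pd.Series(['The plant is green']).str.replace('the plant', 'Tree', regex=False)
print(unchanged.tolist())
```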

The expected output:

                       data1           data2
0  the weather is nice today   It is raining
1        This is interesting   Tree is green
2        the weather is good  Today is sunny
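For comparison, a word-based sketch that produces exactly the expected output above. It splits each value at most twice, looks up the lower-cased first two words in `my_dict`, and capitalizes the replacement. `replace_first_two` is a hypothetical helper name, and the sketch assumes words are separated by single spaces:

```python
import pandas as pd

df = pd.DataFrame({'data1': ['the weather is nice today', 'This is interesting', 'the weather is good'],
                   'data2': ['It is raining', 'The plant is green', 'the weather is sunny']})
my_dict = {'the weather': 'today', 'the plant': 'tree'}


def replace_first_two(text, mapping):
    """Swap the first two words for mapping[key] when, lower-cased, they form a key."""
    words = text.split(' ', 2)          # at most 3 parts: word1, word2, rest
    head = ' '.join(words[:2]).lower()  # e.g. 'the plant'
    if len(words) >= 2 and head in mapping:
        rest = words[2:]                # [] or e.g. ['is green']
        return ' '.join([mapping[head].capitalize()] + rest)
    return text                         # unmatched rows are left unchanged


# Extra keyword arguments to Series.apply are passed on to the function.
df['data2'] = df['data2'].apply(replace_first_two, mapping=my_dict)
print(df)
```

Unlike lower-casing the whole column first, this leaves the case of the rest of each string untouched.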

Answer 1

If I understand correctly, this is one way to do it (there may be more efficient ways):

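df.data2 = df.data2.str.lower()
for k in my_dict:
    # take the first len(k) characters, replace them only if they equal the key,
    # then re-attach the rest of the string
    df.data2 = df.data2.str[:len(k)].replace(k, my_dict[k]) + df.data2.str[len(k):]

df.data2 = df.data2.str.capitalize()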

Lowercasing and capitalization weren't in your question, but they were part of your code, so I included them (otherwise the match would fail, because the capitalization in your sample data doesn't match the dictionary keys).

Answer 2

Use Python's built-in map function to go through the values in the column.
The DataFrame contains values like "The plant", and you are comparing them with "the plant" without converting them to lower case first.

for old, new in my_dict.items():
    # first two words of each value, lower-cased, e.g. 'the plant'
    first_two = pd.Series([' '.join(l[:2]) for l in df['data2'].str.lower().str.split()])
    if first_two.eq(old).any():
        # lower-case, replace the key, then capitalize, so earlier replacements stay capitalized
        df['data2'] = list(map(lambda x: x.lower().replace(old, new).capitalize(), df['data2']))
    else:
        print('does not exist')
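The list comprehension that extracts the first two words can also be written with the `.str` accessor, which works on list-valued cells as well as strings. A small sketch on the sample data; `first_two` and `mask` are illustrative names:

```python
import pandas as pd

data2 = pd.Series(['It is raining', 'The plant is green', 'the weather is sunny'])
my_dict = {'the weather': 'today', 'the plant': 'tree'}

# Lower-case, split on whitespace, keep the first two words, re-join them.
first_two = data2.str.lower().str.split().str[:2].str.join(' ')
print(first_two.tolist())  # ['it is', 'the plant', 'the weather']

# Boolean mask of rows whose first two words are a dictionary key.
mask = first_two.isin(list(my_dict))
print(mask.tolist())  # [False, True, True]
```

A per-row mask like this could limit the replacement to matching rows instead of gating the whole column with `.any()`.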

Answer 3

You can try pandas.Series.str.replace:

for key, val in my_dict.items():
    df['data2'] = df['data2'].str.replace(f'^{key}', val, case=False, regex=True)

print(df)

                       data1           data2
0  the weather is nice today   It is raining
1        This is interesting   tree is green
2        the weather is good  today is sunny
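The pattern above treats each key as a regex and anchors only its start, so a key containing characters such as `.` or `(` would be misread, and `the plant` would also match the start of a word like `plantation`. A hedged variant, assuming you want the capitalized output from the question; the extra `'The plantation is big'` row is made up only to show the word boundary:

```python
import re

import pandas as pd

df = pd.DataFrame({'data2': ['It is raining', 'The plant is green', 'the weather is sunny',
                             'The plantation is big']})
my_dict = {'the weather': 'today', 'the plant': 'tree'}

for key, val in my_dict.items():
    # re.escape makes the key match literally; \b stops 'the plant'
    # from matching the beginning of 'the plantation'.
    pattern = rf'^{re.escape(key)}\b'
    df['data2'] = df['data2'].str.replace(pattern, val.capitalize(), case=False, regex=True)

print(df['data2'].tolist())
```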


3 solutions

Solution 1
1, accepted, 2022-04-08 12:42:08

Solution 2
1, 2022-04-08 12:45:21

Solution 3
1, 2022-04-08 12:59:45
