Replace words in strings from a dataframe with words from a separate dataframe

Question

I have the following dataset:

Date       User     comments
9/20/2019 user1    My car model is 600.
9/21/2019 user2    My car model is viper.
9/23/2019 user3    I have a car. The model is civic. 
9/23/2019 user4    Washington is name of the city. 
9/23/2019 user5    I like the freedom I feel when I drive my chevy.

These are sample comments that were scraped. I am trying to use this dataframe:

Brand     Model
ford       600
chevrolet chevy
dodge     viper
honda     civic
pontiac    gto
honda     freed

I am trying to replace the model mentioned in each comment with its brand.

Here is my code:

import pandas as pd

file = pd.read_csv('test_dataset.csv')
file['comments'] = file['comments'].astype(str)
file["comments"] = file["comments"].str.lower()
brandconverter = pd.read_csv("brandconverter.csv")

def replacemodel(comment):
    # Each model is used as a regex pattern, so it matches anywhere in the comment
    return pd.Series(comment).replace(brandconverter.set_index('Model')['Brand'], regex=True)[0]

file['test'] = file['comments'].apply(replacemodel)

My expected output is:

     Date      User     comments                              test
    9/20/2019 user1    My car model is 600.               My car model is ford. 
    9/21/2019 user2    My car model is viper.             My car model is dodge.
    9/23/2019 user3    I have a car. The model is civic.  I have a car. The model is honda. 
    9/23/2019 user4    Washington is name of the city.    Washington is name of the city.

But the output I get is:

     Date      User     comments                              test
    9/20/2019 user1    My car model is 600.               My car model is ford. 
    9/21/2019 user2    My car model is viper.             My car model is dodge.
    9/23/2019 user3    I have a car. The model is civic.  I have a car. The model is honda. 
    9/23/2019 user4    Washington is name of the city.    Washinpontiacn is name of the city.

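When a car model appears inside a word such as "Washington", I want my function to ignore it. Currently it matches a model anywhere it occurs in the comment, even inside another word. I want the function to leave the "gto" in "Washington" alone. I also want to apply this function to other comments; this is just a sample.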
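The behaviour described above can be reproduced without pandas. The following is a minimal sketch using only the standard `re` module; the variable names `naive` and `bounded` are made up for illustration. It contrasts a plain pattern with one anchored by `\b` word boundaries.

```python
import re

comment = "Washington is name of the city."

# Plain pattern: matches "gto" anywhere, including inside "Washington".
naive = re.sub(r"gto", "pontiac", comment)

# \b anchors the match at word boundaries, so "gto" inside a word is skipped.
bounded = re.sub(r"\bgto\b", "pontiac", comment)

print(naive)    # Washinpontiacn is name of the city.
print(bounded)  # Washington is name of the city.
```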

Answer 1

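You can use Series.replace with the optional parameter regex=True to replace the models in comments with the corresponding brands from brandconverter, wrapping each model in \b so that only whole words match: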

s = brandconverter.set_index('Model')['Brand']
s.index = r'\b' + s.index + r'\b' # Takes care of word boundary condition

file['test'] = file['comments'].replace(s, regex=True)

Result:

       Date   User                           comments                               test
0  9/20/2019  user1               My car model is 600.              My car model is ford.
1  9/21/2019  user2             My car model is viper.             My car model is dodge.
2  9/23/2019  user3  I have a car. The model is civic.  I have a car. The model is honda.
3  9/23/2019  user4    Washington is name of the city.    Washington is name of the city.
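One caveat with this approach: the model names are interpreted as regular expressions, so a name containing a metacharacter such as `.` could match more than intended. A possible variation, sketched below, escapes each model with `re.escape` before adding the boundaries. The small dataframes stand in for the CSV files, and the "acme" / "3.0" row is hypothetical, added only to show the effect.

```python
import re
import pandas as pd

# Small stand-ins for the CSV files in the question; the "acme"/"3.0" row is
# hypothetical and only there to show a model containing a regex metacharacter.
file = pd.DataFrame({"comments": ["my car model is 600.",
                                  "washington is name of the city.",
                                  "i paid 300 for a 3.0 badge."]})
brandconverter = pd.DataFrame({"Brand": ["ford", "pontiac", "acme"],
                               "Model": ["600", "gto", "3.0"]})

# Escape each model so it is matched literally, then add word boundaries.
patterns = {rf"\b{re.escape(model)}\b": brand
            for model, brand in zip(brandconverter["Model"], brandconverter["Brand"])}

file["test"] = file["comments"].replace(patterns, regex=True)
print(file["test"].tolist())
```

Without the escaping, the pattern `\b3.0\b` would also match "300", because an unescaped `.` matches any character. Note that `\b` only helps when a model starts and ends with a word character.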

Answer 2

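You can use a dictionary that maps each model to its brand. The keys are treated as regular expressions, so they are wrapped in \b to match whole words only:

ids = {'from': ['600', 'chevy', 'viper'],
       'to': ['ford', 'chevrolet', 'dodge']}
# Keys are regex patterns; \b stops a model from matching inside a longer word
ids = {rf'\b{m}\b': b for m, b in zip(ids['from'], ids['to'])}
file['test'] = file['comments'].replace(ids, regex=True)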

Answer 3

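You can try treating your brandconverter dataframe as a dictionary and looping over it. This example has no loop, but the key variable could just as easily come from an iterator: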

import pandas as pd

file = pd.DataFrame({'User': ['user1', 'user2', 'user3', 'user4'],
                     'comments': ['A gto is what I love',
                                  'A gtoto is what I love',
                                  'Washington is name of the city.',
                                  'My car model is a gto.']})

brandconverter = {'gto': 'pontiac'}



key = 'gto'
file['test'] = file['comments'].replace(f'\\b{key}\\b', brandconverter[key], regex=True)

print(repr(file))

This prints:

    User                         comments                             test
0  user1             A gto is what I love         A pontiac is what I love
1  user2           A gtoto is what I love           A gtoto is what I love
2  user3  Washington is name of the city.  Washington is name of the city.
3  user4           My car model is a gto.       My car model is a pontiac.
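The loop that this answer mentions but does not show might look like the sketch below. It assumes a hand-built, two-entry mapping; in the question's setting the dictionary could presumably be built from the brandconverter dataframe, for example with `dict(zip(df['Model'], df['Brand']))`.

```python
import pandas as pd

file = pd.DataFrame({"comments": ["A gto is what I love",
                                  "A gtoto is what I love",
                                  "My car model is a viper."]})

# Hypothetical two-entry mapping built by hand; in the question it would
# come from the brandconverter dataframe.
brandconverter = {"gto": "pontiac", "viper": "dodge"}

file["test"] = file["comments"]
for key, brand in brandconverter.items():
    # Apply one whole-word replacement per model, accumulating in "test".
    file["test"] = file["test"].replace(f"\\b{key}\\b", brand, regex=True)

print(file["test"].tolist())
```

Each pass rewrites the `test` column rather than `comments`, so replacements from earlier iterations are kept.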

3 answers

Answer 1
1 accepted 2020-09-08 18:44:19

Answer 2
0 2020-09-08 18:44:56

Answer 3
0 2020-09-08 19:27:16
