Replace words in a string from a dataframe with words from a separate dataframe
I have the following dataset:
```
Date       User   comments
9/20/2019  user1  My car model is 600.
9/21/2019  user2  My car model is viper.
9/23/2019  user3  I have a car. The model is civic.
9/23/2019  user4  Washington is name of the city.
9/23/2019  user5  I like the freedom I feel when I drive my chevy.
```
These are sample comments that were scraped. I'm trying to use this dataframe:
```
Brand      Model
ford       600
chevrolet  chevy
dodge      viper
honda      civic
pontiac    gto
honda      freed
```
I am trying to replace the model mentioned in each comment with the corresponding brand.
Here is my code:
```python
import pandas as pd

file = pd.read_csv('test_dataset.csv')
file['comments'] = file['comments'].astype(str)
file["comments"] = file["comments"].str.lower()
brandconverter = pd.read_csv("brandconverter.csv")

def replacemodel(comment):
    # Treat each Model as a regex pattern and substitute the matching Brand
    return pd.Series(comment).replace(brandconverter.set_index('Model')['Brand'], regex=True)[0]

file['test'] = file['comments'].apply(replacemodel)
```
My expected output is:
```
Date       User   comments                           test
9/20/2019  user1  My car model is 600.               My car model is ford.
9/21/2019  user2  My car model is viper.             My car model is dodge.
9/23/2019  user3  I have a car. The model is civic.  I have a car. The model is honda.
9/23/2019  user4  Washington is name of the city.    Washington is name of the city.
```
But the output I am getting is:
```
Date       User   comments                           test
9/20/2019  user1  My car model is 600.               My car model is ford.
9/21/2019  user2  My car model is viper.             My car model is dodge.
9/23/2019  user3  I have a car. The model is civic.  I have a car. The model is honda.
9/23/2019  user4  Washington is name of the city.    Washinpontiacn is name of the city.
```
I would like my function to ignore a car model when it appears inside another word, as in 'Washington'. At the moment, it replaces the model wherever it occurs in the comment, even in the middle of a word. I would like the function not to consider the 'gto' in 'Washington'. I also want to apply this function to other comments; this is just a sample.
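The difference between a plain substring match and a whole-word match can be seen with Python's `re` module alone. A minimal sketch, using sentences from the sample data above; `\b` is the standard regex word-boundary assertion, which matches only between a word character and a non-word character (or the start/end of the string):

```python
import re

text = "Washington is name of the city."
freedom = "I like the freedom I feel when I drive my chevy."

# Plain pattern: matches "gto" anywhere, including inside "Washington".
naive = re.sub("gto", "pontiac", text)

# \b requires a word boundary on each side, so only a whole word "gto" matches.
bounded = re.sub(r"\bgto\b", "pontiac", text)

# The same rule leaves "freedom" alone but still replaces the word "chevy".
freed_result = re.sub(r"\bfreed\b", "honda", freedom)
chevy_result = re.sub(r"\bchevy\b", "chevrolet", freedom)

print(naive)         # Washinpontiacn is name of the city.
print(bounded)       # Washington is name of the city.
print(freed_result)  # I like the freedom I feel when I drive my chevy.
print(chevy_result)  # I like the freedom I feel when I drive my chevrolet.
```

The user5 row is worth noting: without boundaries, the 'freed' model would also corrupt 'freedom'.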
You can use `Series.replace` with the optional parameter `regex=True` to replace the model in `comments` with the corresponding brand from `brandconverter`, wrapping each model in `\b` so that it only matches as a whole word:
```python
# Series mapping each model (index) to its brand (values)
s = brandconverter.set_index('Model')['Brand']
s.index = r'\b' + s.index + r'\b'  # Takes care of word boundary condition
file['test'] = file['comments'].replace(s, regex=True)
```
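A self-contained sketch of this approach, assuming the two CSV files contain exactly the sample tables from the question (they are built inline here instead of being read from disk). The `re.escape` call is an addition not in the answer, included in case a model name ever contains regex metacharacters such as `.` or `+`:

```python
import re
import pandas as pd

# Inline stand-ins for test_dataset.csv and brandconverter.csv (assumed contents).
file = pd.DataFrame({
    "User": ["user1", "user2", "user3", "user4", "user5"],
    "comments": [
        "My car model is 600.",
        "My car model is viper.",
        "I have a car. The model is civic.",
        "Washington is name of the city.",
        "I like the freedom I feel when I drive my chevy.",
    ],
})
brandconverter = pd.DataFrame({
    "Brand": ["ford", "chevrolet", "dodge", "honda", "pontiac", "honda"],
    "Model": ["600", "chevy", "viper", "civic", "gto", "freed"],
})

# Map model -> brand, then turn each model into a whole-word regex.
s = brandconverter.set_index("Model")["Brand"]
s.index = [r"\b" + re.escape(m) + r"\b" for m in s.index]

# Every key is used as a regex; matches are replaced by the key's brand.
file["test"] = file["comments"].replace(s, regex=True)
print(file[["comments", "test"]])
```

In this sketch, 'freedom' in the user5 row is left intact while 'chevy' becomes 'chevrolet'.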
Result:
```
   Date       User   comments                           test
0  9/20/2019  user1  My car model is 600.               My car model is ford.
1  9/21/2019  user2  My car model is viper.             My car model is dodge.
2  9/23/2019  user3  I have a car. The model is civic.  I have a car. The model is honda.
3  9/23/2019  user4  Washington is name of the city.    Washington is name of the city.
```
You can also pass a dictionary to `replace`:
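```python
# Keys (models) are used as regex patterns; values (brands) are the replacements
ids = {'from': ['600', 'chevy', 'viper'],
       'to': ['ford', 'chevrolet', 'dodge']}
ids = dict(zip(ids['from'], ids['to']))
# Note: without \b anchors, a model inside a longer word would also be replaced
file['test'] = file['comments'].replace(ids, regex=True)
```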
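Another option, sketched here as an alternative rather than as the answer's method, is to combine all models into a single whole-word alternation and let `Series.str.replace` look up the brand through a callable. The `model_to_brand` dictionary and the sample comments are hypothetical stand-ins for the question's data. Matching is case-sensitive, which fits the question's code because it lowercases the comments first:

```python
import re
import pandas as pd

# Hypothetical sample comments in the shape used in the question.
comments = pd.Series([
    "My car model is 600.",
    "I have a car. The model is civic.",
    "Washington is name of the city.",
    "I like the freedom I feel when I drive my chevy.",
])
model_to_brand = {
    "600": "ford",
    "chevy": "chevrolet",
    "viper": "dodge",
    "civic": "honda",
    "gto": "pontiac",
    "freed": "honda",
}

# One pattern for all models: \b(?:600|chevy|...)\b, with each name escaped.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, model_to_brand)) + r")\b")

# With regex=True, str.replace accepts a callable; it receives each re.Match
# and returns the replacement text for that match.
result = comments.str.replace(pattern, lambda m: model_to_brand[m.group(0)], regex=True)
print(result.tolist())
```

Because each comment is scanned once with a single pattern, a brand that has just been inserted is never matched again, which sequential replacements can do if a brand name is also a model name.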
You could try turning your brandconverter dataframe into a dictionary and then looping through it. This example doesn't include the loop, but `key` could easily be the loop variable:
```python
import pandas as pd

file = pd.DataFrame({'User': ['user1', 'user2', 'user3', 'user4'],
                     'comments': ['A gto is what I love',
                                  'A gtoto is what I love',
                                  'Washington is name of the city.',
                                  'My car model is a gto.']})
# Model -> brand mapping used as a dictionary
brandconverter = {'gto': 'pontiac'}
key = 'gto'
# \b on both sides restricts the match to the whole word "gto"
file['test'] = file['comments'].replace(f'\\b{key}\\b', brandconverter[key], regex=True)
print(repr(file))
```
This prints out:
```
   User   comments                         test
0  user1  A gto is what I love             A pontiac is what I love
1  user2  A gtoto is what I love           A gtoto is what I love
2  user3  Washington is name of the city.  Washington is name of the city.
3  user4  My car model is a gto.           My car model is a pontiac.
```
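One way to add the loop described above, sketched under the assumption that brandconverter.csv holds the Brand/Model table from the question (built inline here as `brandconverter_df`, a hypothetical name). Each model is wrapped in `\b` and, as an extra precaution not in the answer, passed through `re.escape`:

```python
import re
import pandas as pd

file = pd.DataFrame({
    "User": ["user1", "user2", "user3", "user4"],
    "comments": [
        "My car model is 600.",
        "My car model is viper.",
        "I have a car. The model is civic.",
        "Washington is name of the city.",
    ],
})

# Assumed contents of brandconverter.csv, turned into a model -> brand dict.
brandconverter_df = pd.DataFrame({
    "Brand": ["ford", "chevrolet", "dodge", "honda", "pontiac", "honda"],
    "Model": ["600", "chevy", "viper", "civic", "gto", "freed"],
})
brandconverter = dict(zip(brandconverter_df["Model"], brandconverter_df["Brand"]))

# Start from the original comments and apply one whole-word replacement per model.
file["test"] = file["comments"]
for key, brand in brandconverter.items():
    file["test"] = file["test"].replace(rf"\b{re.escape(key)}\b", brand, regex=True)

print(file)
```

Each pass returns a new Series, so the `comments` column itself is never modified.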