pandas: first two elements in a string column matches dictionary key
I have a dataframe as follows:
import pandas as pd
df = pd.DataFrame({'data1':['the weather is nice today','This is interesting','the weather is good'],
'data2':['It is raining','The plant is green','the weather is sunny']})
and I have a dictionary as follows:
my_dict = {'the weather':'today','the plant':'tree'}
I would like to replace the first two words in the data2 column if they are found in the dictionary keys. I have done the following:
for old, new in dic.items():
    if pd.Series([' '.join(map(str, l)) for l in df['data2'].str.lower().str.split().map(lambda x: x[0:2])]).str.contains('|'.join(old.capitalize()).any():
        df['data2'] = df['data2'].str.replace(old, new.capitalize(), regex=False)
    else:
        print('does not exist')
but when I print(df), nothing has been replaced.
The expected output:
                       data1           data2
0  the weather is nice today   It is raining
1        This is interesting   Tree is green
2        the weather is good  Today is sunny
If I understand correctly, this is one way to do it (there may be more efficient ways):
df.data2 = df.data2.str.lower()
for k in my_dict:
    df.data2 = df.data2.str[:len(k)].replace(k, my_dict[k]) + df.data2.str[len(k):]
df.data2 = df.data2.str.capitalize()
Lowercasing and capitalization weren't in your question but were part of your code, so I put them in (otherwise it would fail because the capitalization doesn't match in your sample code).
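If you would rather leave the non-matching rows exactly as they are, a per-row dictionary lookup on the lowercased first two words is another option. This is only a sketch under the sample data from the question; replace_prefix is a helper name made up here, not a pandas function, and it assumes every dictionary key is exactly two lowercase words.

```python
import pandas as pd

df = pd.DataFrame({'data1': ['the weather is nice today', 'This is interesting', 'the weather is good'],
                   'data2': ['It is raining', 'The plant is green', 'the weather is sunny']})
my_dict = {'the weather': 'today', 'the plant': 'tree'}


def replace_prefix(text, mapping):
    # Hypothetical helper: look up the lowercased first two words in the mapping.
    words = text.split()
    key = ' '.join(words[:2]).lower()
    if key in mapping:
        # Swap the two-word prefix for the capitalized replacement, keep the rest.
        return ' '.join([mapping[key].capitalize()] + words[2:])
    # No match: return the original string unchanged.
    return text


df['data2'] = df['data2'].map(lambda s: replace_prefix(s, my_dict))
print(df)
```

Because the lookup is a single dictionary access per row, there is no loop over the keys, and only the replacement word is capitalized, not the whole string.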
The string in the data is The plant, and we are trying to compare it with the plant without converting it to lower case.
for old, new in my_dict.items():
    if pd.Series([' '.join(map(str, l)) for l in df['data2'].str.lower().str.split().map(lambda x: x[0:2])]).str.contains('|'.join(old)).any():
        df['data2'] = list(map(lambda x: x.lower().replace(old, new.capitalize()), df['data2']))
    else:
        print('does not exist')
You can try with pandas.Series.str.replace:
for key, val in my_dict.items():
    df['data2'] = df['data2'].str.replace(f'^{key}', val, case=False, regex=True)
print(df)
                       data1           data2
0  the weather is nice today   It is raining
1        This is interesting   tree is green
2        the weather is good  today is sunny
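The output above keeps the replacement in lower case, while the expected output in the question has it capitalized. A small variation that might get closer is sketched below; the re.escape call, the trailing \b word boundary and the val.capitalize() are additions assumed here for illustration, not part of the original answer.

```python
import re

import pandas as pd

df = pd.DataFrame({'data1': ['the weather is nice today', 'This is interesting', 'the weather is good'],
                   'data2': ['It is raining', 'The plant is green', 'the weather is sunny']})
my_dict = {'the weather': 'today', 'the plant': 'tree'}

for key, val in my_dict.items():
    # Anchor at the start of the string, escape any regex metacharacters in the key,
    # and require a word boundary so 'the weather' does not match 'the weathered'.
    pattern = '^' + re.escape(key) + r'\b'
    # case=False makes the match case-insensitive; the replacement is capitalized.
    df['data2'] = df['data2'].str.replace(pattern, val.capitalize(), case=False, regex=True)

print(df)
```

Escaping the key only matters if a dictionary key could contain characters such as . or (, which is not the case for the sample dictionary.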