
pandas: first two words in a string column match a dictionary key

I have a dataframe as follows:

import pandas as pd

df = pd.DataFrame({'data1':['the weather is nice today','This is interesting','the weather is good'],
             'data2':['It is raining','The plant is green','the weather is sunny']})

and I have a dictionary as follows:

my_dict = {'the weather':'today','the plant':'tree'}

I would like to replace the first two words in the data2 column if they are found in the dictionary key. I have done the following:

for old, new in my_dict.items():
    if pd.Series([' '.join(map(str, l)) for l in df['data2'].str.lower().str.split().map(lambda x: x[0:2])]).str.contains('|'.join(old.capitalize())).any():
       df['data2'] = df['data2'].str.replace(old, new.capitalize(), regex=False)
    else:
       print('does not exist')

But when I print(df), nothing has been replaced.

The expected output:

                       data1           data2
0  the weather is nice today   It is raining
1        This is interesting   Tree is green
2        the weather is good  Today is sunny

If I understand correctly, this is one way to do it (there may be more efficient ways):

df.data2 = df.data2.str.lower()
for k in my_dict:
  df.data2 = df.data2.str[:len(k)].replace(k, my_dict[k]) + df.data2.str[len(k):]

df.data2 = df.data2.str.capitalize()

Lowercasing and capitalization weren't in your question but were part of your code, so I put them in (otherwise it would fail because the capitalization doesn't match in your sample code).
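A variation on the same prefix comparison, as a minimal sketch: it matches the key case-insensitively but leaves the rest of each string untouched instead of lowercasing the whole column. The helper name replace_prefix is hypothetical and not part of pandas.

```python
import pandas as pd

my_dict = {'the weather': 'today', 'the plant': 'tree'}
data2 = pd.Series(['It is raining', 'The plant is green', 'the weather is sunny'])

def replace_prefix(text, mapping):
    # Hypothetical helper: if the string starts with a key (ignoring case),
    # swap that prefix for the capitalized value and keep the remainder as-is.
    lowered = text.lower()
    for key, value in mapping.items():
        if lowered == key or lowered.startswith(key + ' '):
            return value.capitalize() + text[len(key):]
    return text

result = data2.map(lambda s: replace_prefix(s, my_dict))
print(result.tolist())
```

This assumes plain ASCII text, where lowercasing does not change the length of the string.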

  1. Use Python's map function to go through the values.
  2. The dataframe contains "The plant", but the code compares it with "the plant" without first converting it to lower case.
    for old, new in my_dict.items():
        # Check whether any row's first two words (lowercased) contain the key
        if pd.Series([' '.join(map(str, l)) for l in df['data2'].str.lower().str.split().map(lambda x: x[0:2])]).str.contains(old, regex=False).any():
            df['data2'] = list(map(lambda x: x.lower().replace(old, new.capitalize()), df['data2']))
        else:
            print('does not exist')
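The case mismatch described in point 2 can be seen in isolation with a small sketch. The two-row Series below is made up for illustration and reuses strings from the question.

```python
import pandas as pd

s = pd.Series(['The plant is green', 'the weather is sunny'])

# Case-sensitive check: 'The plant' does not start with 'the plant'.
exact = s.str.startswith('the plant')

# Lowercasing first makes the comparison succeed for the first row.
normalized = s.str.lower().str.startswith('the plant')

print(exact.tolist())
print(normalized.tolist())
```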

You can try pandas.Series.str.replace with case=False and a pattern anchored at the start of the string:

for key, val in my_dict.items():
    df['data2'] = df['data2'].str.replace(f'^{key}', val, case=False, regex=True)
print(df)

                       data1           data2
0  the weather is nice today   It is raining
1        This is interesting   tree is green
2        the weather is good  today is sunny
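If the capitalized form from the expected output is wanted, one possible sketch is to build a single compiled pattern from all keys and pass a function as the replacement. The names pattern and capitalized_value are illustrative. re.escape is used on the assumption that keys might contain regex metacharacters, and the trailing \b is an added assumption so that a key only matches whole words.

```python
import re
import pandas as pd

my_dict = {'the weather': 'today', 'the plant': 'tree'}
df = pd.DataFrame({'data2': ['It is raining', 'The plant is green', 'the weather is sunny']})

# One anchored, case-insensitive pattern that covers every dictionary key.
pattern = re.compile(
    '^(?:' + '|'.join(re.escape(k) for k in my_dict) + r')\b',
    flags=re.IGNORECASE,
)

def capitalized_value(match):
    # Look up the matched prefix (lowercased) and capitalize its replacement.
    return my_dict[match.group(0).lower()].capitalize()

# A compiled pattern requires regex=True; case/flags must not be passed here.
df['data2'] = df['data2'].str.replace(pattern, capitalized_value, regex=True)
print(df)
```

This makes one pass over the column instead of one pass per key.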
