
How to compare the string elements of two columns in a DataFrame

I have a DataFrame and want to compare two of its columns.

col_1 holds lists of single words and col_2 holds lists of two-word phrases. If two words in col_1 together form a phrase that appears in col_2, I want to replace those two single words with the phrase.

Here is the DataFrame:

import pandas as pd

list1 = [['good', 'hello', 'morning',],['sit', 'good', 'down'],['get', 'who', 'down']]

list2 = [['good morning', 'good afternoon'],['sit down', 'rise up', 'good work'], ['sit here', 'get job', 'get down']]

df_new = pd.DataFrame({'words': list1})

df_new['para'] = list2

The result I want looks like this:

list3 = [['good morning', 'hello'],['sit down', 'good'],['get down', 'who']]
list4 = [['good afternoon'],['rise up', 'good work'], ['get job', 'get down']]
df_new['result1'] = list3
df_new['result2'] = list4
  • result1: if two words together form one of the phrases, replace the two words with that phrase.
  • result2: the phrases that remain after removing the ones moved into result1.

Any suggestions on how to get result1 and result2? I would be really grateful for help with the logic. Thanks!

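Try using plain lists only; a DataFrame will complicate the task. As for the logic, this is how I would proceed: use itertools to generate every possible ordered pair of words in each row, join each pair into a two-word phrase, and compare it against that row's phrases. On a match, update both lists: remove the phrase from list2, and in list1 replace the two single words with the phrase.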

import itertools

list1 = [['good', 'hello', 'morning',],['sit', 'good', 'down'],['get', 'who', 'down']]

list2 = [['good morning', 'good afternoon'],['sit down', 'rise up', 'good work'], ['sit here', 'get job', 'get down']]

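def possible_pairs(words):
    # Every ordered pair of words taken from different positions, e.g. ('good', 'morning')
    return itertools.permutations(words, r=2)

for i, words in enumerate(list1):
    # permutations() copies words up front, so editing list1[i] inside the loop is safe
    for first, second in possible_pairs(words):
        sentence = first + ' ' + second
        # Skip pairs whose words were already merged into an earlier phrase
        if sentence in list2[i] and first in list1[i] and second in list1[i]:
            list2[i].remove(sentence)
            list1[i].append(sentence)
            list1[i].remove(first)
            list1[i].remove(second)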

print(list1, list2)

output:

[['hello', 'good morning'], ['good', 'sit down'], ['who', 'get down']] [['good afternoon'], ['rise up', 'good work'], ['sit here', 'get job']]
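
Note that the loop above modifies list1 and list2 in place and appends each phrase at the end of its row. If the original lists need to be kept, or the phrase should take the position of its first word (as in the list3 from the question), a non-mutating variant could look roughly like the sketch below. The helper name merge_pairs is made up for illustration and is not part of itertools or pandas.

```python
from itertools import permutations

def merge_pairs(words, phrases):
    """Return (result1, result2) for one row without modifying the inputs."""
    result1 = list(words)    # working copy of the single words
    result2 = list(phrases)  # working copy of the phrases
    for first, second in permutations(words, 2):
        phrase = f"{first} {second}"
        # Only merge if the phrase is still available and both words are still unmerged
        if phrase in result2 and first in result1 and second in result1:
            # Put the phrase where its first word was, then drop the second word
            result1[result1.index(first)] = phrase
            result1.remove(second)
            result2.remove(phrase)
    return result1, result2

list1 = [['good', 'hello', 'morning'], ['sit', 'good', 'down'], ['get', 'who', 'down']]
list2 = [['good morning', 'good afternoon'], ['sit down', 'rise up', 'good work'], ['sit here', 'get job', 'get down']]

results = [merge_pairs(words, phrases) for words, phrases in zip(list1, list2)]
list3 = [r1 for r1, _ in results]
list4 = [r2 for _, r2 in results]

print(list3)
print(list4)
```

Because the function works on copies, list1 and list2 are unchanged afterwards and can still be used as the original columns.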

Once that is done, you can rebuild your DataFrame from the lists.
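
As a rough sketch of that last step, the lists can be passed straight to the DataFrame constructor, one list per column. The column names follow the question; the variable names words, phrases, result1 and result2 are chosen here for illustration, and the result values are copied from the output shown above. Since the loop modifies list1 and list2 in place, the original lists would need to be copied before running it if the words and para columns should keep their original content.

```python
import pandas as pd

# Original inputs (copied before the loop modified them)
words = [['good', 'hello', 'morning'], ['sit', 'good', 'down'], ['get', 'who', 'down']]
phrases = [['good morning', 'good afternoon'], ['sit down', 'rise up', 'good work'], ['sit here', 'get job', 'get down']]

# Lists produced by the loop (the printed output above)
result1 = [['hello', 'good morning'], ['good', 'sit down'], ['who', 'get down']]
result2 = [['good afternoon'], ['rise up', 'good work'], ['sit here', 'get job']]

# Each cell of the resulting DataFrame holds one of the inner lists
df_new = pd.DataFrame({
    'words': words,
    'para': phrases,
    'result1': result1,
    'result2': result2,
})

print(df_new)
```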
