简体   繁体   中英

Looping through columns names values and find the exact match against a list of strings in another column

This example df6:

df6 = pd.DataFrame({
                   'answer1': ['Lo', 'New York', 'Toronto'],
                   'answer2': ['London', 'New', 'Paris'],
                   'answer3': ['CA', 'CA', 'CA'],
                   'correct': [["'London'","'CA'"], ["'New York'"], ["'Toronto'"]]
                   })

df6 gives:

   answer1   answer2     answer3     correct
0   Lo         London         CA    ['London','CA']
1   New York    New           CA    ['New York']
2   Toronto    Paris          CA    ['Toronto']

What I am trying to do is compare the exact answer from columns have the name "answer" with the correct column and if both match, the column name will be added to a new column.

My issue is I cannot find the right approach to do so.

This what I have tried even with the help with other SO members:

cols = df6.filter(like='answer').columns

df6['Answer'] = df6[cols].apply(lambda s: ', '.join(cols[(m:=[str(s[col]) in str(df6.loc[s.name, 'correct']) for col in cols])]) , axis=1)

This code gives:

    answer1    answer2     answer3     correct               Answer
0   Lo         London       CA         ['London','CA']        answer1, answer2,answer3
1   New York   New          CA         ['New York']           answer1, answer2
2   Toronto    Paris        CA         ['Toronto']            answer1

The result in row 0 is not accurate because the Answer column should have only answer2, answer3 which match ['London','CA'].. same also for row 1

so in comparing results I think I should use == instead of in :

[str(s[col]) in str(df6.loc[s.name, 'correct']

Please note that I needed to wrap all in str to overcome dtype issues while matching

The desired output:

   answer1    answer2     answer3     correct                 Answer
0   Lo         London       CA         ['London','CA']        answer2, answer3
1   New York   New          CA         ['New York']           answer1
2   Toronto    Paris        CA         ['Toronto']            answer1

Flatten your dataframe to compare one answer to one correct answer:

df6['Answer'] = df6.explode('correct').melt('correct', ignore_index=False) \
                   .query('value == correct').groupby(level=0)['variable'] \
                   .apply(', '.join)
print(df6)

# Output
    answer1 answer2 answer3       correct            Answer
0        Lo  London      CA  [London, CA]  answer2, answer3
1  New York     New      CA    [New York]           answer1
2   Toronto   Paris      CA     [Toronto]           answer1

After explode_melt , your dataframe looks like:

>>> df6.explode('correct').melt('correct', ignore_index=False).sort_index()

    correct variable     value
0    London  answer1        Lo
0        CA  answer1        Lo
0    London  answer2    London
0        CA  answer2    London
0    London  answer3        CA
0        CA  answer3        CA
1  New York  answer1  New York
1  New York  answer2       New
1  New York  answer3        CA
2   Toronto  answer1   Toronto
2   Toronto  answer2     Paris
2   Toronto  answer3        CA

Assuming, the correct column is a list of string. From your setup of df6 , I used pd.eval to normalize the column:

df6['correct'] = pd.eval(df6['correct'])

Try this:

import pandas as pd

df6 = pd.DataFrame({
                   'answer1': ['Lo', 'New York', 'Toronto'],
                   'answer2': ['London', 'New', 'Paris'],
                   'answer3': ['CA', 'CA', 'CA'],
                   'correct': [["'London'","'CA'"], ["'New York'"], ["'Toronto'"]]
                   })

cols = df6.filter(like='answer').columns

def find_answers(answers, l):
    ans = []
    for a, c in zip(answers, cols):
        print(l)
        if "'" + a + "'" in l:
            ans.append(c)
    return ', '.join(ans)


df6['Answer'] = df6.apply(lambda x: find_answers([x.answer1, x.answer2, x.answer3], x.correct), axis=1)

print(df6)

This will also work:

import pandas as pd

df6 = pd.DataFrame({
                   'answer1': ['Lo', 'New York', 'Toronto'],
                   'answer2': ['London', 'New', 'Paris'],
                   'answer3': ['CA', 'CA', 'CA'],
                   'correct': [['London','CA'], ['New York'], ['Toronto']]
                   })

correct_answer_column = []
for i, row in df6.iterrows():
    correct_answer = []
    if row['answer1'] in row['correct']:
        correct_answer.append('answer1')
    if row['answer2'] in row['correct']:
        correct_answer.append('answer2')
    if row['answer3'] in row['correct']:
        correct_answer.append('answer3')
    correct_answer_column.append(correct_answer)


correct_answer_column = pd.DataFrame(correct_answer_column)
final = pd.concat([df6, correct_answer_column], axis=1)
print(final)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM