How do I compare list values to a dataframe column that are not exactly equal?

Question

I'm new to Python, and I'm trying to clean up a csv using Pandas.

My current dataframe looks like this:

   Time   Summary
0  10     ABC Company
1  4      Company XYZ
2  20     The Awesome Company
3  4      Record B

And I have a list that looks like:

clients = ['ABC', 'XYZ', 'Awesome']

The challenge I'm having is extracting, for each row, whichever value from the list appears inside the Summary text (a partial match, not an exact one).

I'd like my dataframe to look like this:

   Time   Summary              Client
0  10     ABC Company          ABC
1  4      Company XYZ          XYZ
2  20     The Awesome Company  Awesome
3  4      Record B             NaN

I've looked into regex, .any, and in, but I can't seem to get the syntax correct in the for loop.

Answer 1

You could do something like:

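import numpy as np


def match_client(summary):
    # Keep every client name that appears as a substring of the summary.
    client_matches = [client for client in ['ABC', 'XYZ', 'Awesome'] if client in summary]
    if len(client_matches) == 0:
        # No client found: leave the cell as NaN.
        return np.nan
    else:
        # Several clients may match, so join them into one string.
        return ', '.join(client_matches)

# df is the dataframe from the question.
df['Client'] = df['Summary'].map(match_client)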

Answer 2

Just to complement @Simon's answer, if you want to apply it for different clients, you can pass the list of clients as an argument as well.

import numpy as np

def match_client(summary, clients):
    client_matches = [client for client in clients if client in summary]
    if len(client_matches) == 0:
        return np.nan
    else:
        return ', '.join(client_matches)

clients = ['ABC', 'XYZ', 'Awesome']
df['Client'] = df['Summary'].map(lambda x: match_client(x, clients))

The lambda is only needed so that clients can be passed to match_client as an extra argument inside map.
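As a possible alternative to the lambda, Series.apply takes an args tuple and forwards it to the function. Here is a minimal, self-contained sketch of that idea. The dataframe is rebuilt by hand from the sample in the question, which is an assumption about how the real data is loaded.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed, not read from the real csv).
df = pd.DataFrame({
    'Time': [10, 4, 20, 4],
    'Summary': ['ABC Company', 'Company XYZ', 'The Awesome Company', 'Record B'],
})
clients = ['ABC', 'XYZ', 'Awesome']


def match_client(summary, clients):
    # Collect every client name that occurs as a substring of the summary.
    client_matches = [client for client in clients if client in summary]
    # NaN when nothing matched, otherwise a comma-separated string.
    return ', '.join(client_matches) if client_matches else np.nan


# Series.apply forwards the extra positional arguments given in args.
df['Client'] = df['Summary'].apply(match_client, args=(clients,))
```

Both spellings should give the same Client column; which one reads better is mostly a matter of taste.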

Answer 3

`pandas.Series.str.extract`

Assuming there is only one match per row:

df.assign(Client=df.Summary.str.extract(f"({'|'.join(clients)})"))

   Time              Summary   Client
0    10          ABC Company      ABC
1     4          Company XYZ      XYZ
2    20  The Awesome Company  Awesome
3     4             Record B      NaN
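One thing to watch for: the client names are joined straight into a regular expression, so a name containing characters such as `.` or `+` would be read as regex syntax. A hedged sketch of the same idea with each name passed through re.escape first is shown here. With the sample names from the question the escaping changes nothing, and the dataframe is rebuilt by hand as an assumption.

```python
import re

import pandas as pd

# Sample data rebuilt from the question (assumed, not read from the real csv).
df = pd.DataFrame({
    'Time': [10, 4, 20, 4],
    'Summary': ['ABC Company', 'Company XYZ', 'The Awesome Company', 'Record B'],
})
clients = ['ABC', 'XYZ', 'Awesome']

# Escape each name so regex metacharacters are matched literally,
# then wrap the alternation in one capture group for str.extract.
pattern = '(' + '|'.join(re.escape(c) for c in clients) + ')'

# expand=False returns a Series: the first match per row, NaN when none.
result = df.assign(Client=df['Summary'].str.extract(pattern, expand=False))
```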

`pandas.Series.str.findall`

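There might be more than one match per row... You never know. This version builds one indicator column per client instead of a single Client column: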

df.join(df.Summary.str.findall('|'.join(clients)).str.join('|').str.get_dummies())

   Time              Summary  ABC  Awesome  XYZ
0    10          ABC Company    1        0    0
1     4          Company XYZ    0        0    1
2    20  The Awesome Company    0        1    0
3     4             Record B    0        0    0
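If a single Client column is preferred even when several clients match, the list returned by findall could be joined back into one string. This is only a sketch: the second Summary row is a hypothetical example invented to show a double match, and it is not taken from the question's data.

```python
import numpy as np
import pandas as pd

# 'ABC and XYZ merger' is a made-up row used to illustrate two matches.
df = pd.DataFrame({
    'Time': [10, 4, 20],
    'Summary': ['ABC Company', 'ABC and XYZ merger', 'Record B'],
})
clients = ['ABC', 'XYZ', 'Awesome']

# findall returns a list of every match per row (an empty list when none).
matches = df['Summary'].str.findall('|'.join(clients))

# Join each list into one string; rows with no match become NaN.
df['Client'] = matches.str.join(', ').replace('', np.nan)
```

This keeps the layout asked for in the question while still recording every client found in a row.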

3 answers

solution1
2 ACCEPTED 2019-05-24 19:20:12

solution2
0 2019-05-24 20:14:36

solution3
0 2019-05-24 21:17:31
