I'm new to Python, and I'm trying to clean up a csv using Pandas.
My current dataframe looks like this:
Time Summary
0 10 ABC Company
1 4 Company XYZ
2 20 The Awesome Company
3 4 Record B
And I have a list that looks like:
clients = ['ABC', 'XYZ', 'Awesome']
The challenge I'm having is extracting values from the dataframe that equal any value in the list.
I'd like my dataframe to look like this:
Time Summary Client
0 10 ABC Company ABC
1 4 Company XYZ XYZ
2 20 The Awesome Company Awesome
3 4 Record B NaN
I've looked into regex, .any, and in, but I can't seem to get the syntax correct in the for loop.
You could do something like:
import numpy as np
def match_client(summary):
client_matches = [client for client in ['ABC', 'XYZ', 'Awesome'] if client in summary]
if len(client_matches) == 0:
return np.nan
else:
return ', '.join(client_matches)
df['Client'] = df['Summary'].map(match_client)
Just to complement @Simon's answer, if you want to apply it for different clients, you can pass the list of clients as an argument as well.
import numpy as np
def match_client(summary, clients):
client_matches = [client for client in clients if client in summary]
if len(client_matches) == 0:
return np.nan
else:
return ', '.join(client_matches)
clients = ['ABC', 'XYZ', 'Awesome']
df['Client'] = df['Summary'].map(lambda x: match_client(x, clients))
You only need to use the lambda function so you can pass clients
as an extra argument inside map
.
pandas.Series.str.extract
Assuming there is only one match
df.assign(Client=df.Summary.str.extract(f"({'|'.join(clients)})"))
Time Summary Client
0 10 ABC Company ABC
1 4 Company XYZ XYZ
2 20 The Awesome Company Awesome
3 4 Record B NaN
pandas.Series.str.findall
There might be more... You never know.
df.join(df.Summary.str.findall('|'.join(clients)).str.join('|').str.get_dummies())
Time Summary ABC Awesome XYZ
0 10 ABC Company 1 0 0
1 4 Company XYZ 0 0 1
2 20 The Awesome Company 0 1 0
3 4 Record B 0 0 0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.