
How do I compare list values to a dataframe column that are not exactly equal?

I'm new to Python, and I'm trying to clean up a CSV file using pandas.

My current dataframe looks like this:

   Time   Summary
0  10     ABC Company
1  4      Company XYZ
2  20     The Awesome Company
3  4      Record B

And I have a list that looks like this:

clients = ['ABC', 'XYZ', 'Awesome']

The challenge I'm having is extracting the values from the dataframe that match any value in the list, where the match is partial (the list value appears somewhere inside Summary) rather than exact.

I'd like my dataframe to look like this:

   Time   Summary              Client
0  10     ABC Company          ABC
1  4      Company XYZ          XYZ
2  20     The Awesome Company  Awesome
3  4      Record B             NaN

I've looked into regex, .any, and in, but I can't seem to get the syntax right in the for loop.

You could do something like:

import numpy as np


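def match_client(summary):
    # Collect every client name that occurs as a substring of the summary.
    client_matches = [client for client in ['ABC', 'XYZ', 'Awesome'] if client in summary]
    if not client_matches:
        # No client found: leave the cell as NaN.
        return np.nan
    # One or more clients found: join them into a single string.
    return ', '.join(client_matches)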

df['Client'] = df['Summary'].map(match_client)
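For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the dataframe from the question and applies the same substring test. The fifth row ("ABC and XYZ merger") is a hypothetical addition, not part of the original data; it is only there to show what the ', '.join branch does when a summary contains more than one client.

```python
import numpy as np
import pandas as pd

# Example data from the question, plus one hypothetical row with two clients.
df = pd.DataFrame({
    "Time": [10, 4, 20, 4, 7],
    "Summary": [
        "ABC Company",
        "Company XYZ",
        "The Awesome Company",
        "Record B",
        "ABC and XYZ merger",  # hypothetical row, not in the original question
    ],
})
clients = ["ABC", "XYZ", "Awesome"]


def match_client(summary):
    # Case-sensitive substring test against every client name.
    matches = [client for client in clients if client in summary]
    return ", ".join(matches) if matches else np.nan


df["Client"] = df["Summary"].map(match_client)
print(df)
```

Note that the in test is case-sensitive, so a summary such as "abc company" would not match 'ABC' with this sketch.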

Just to complement @Simon's answer: if you want to apply it to different clients, you can pass the list of clients as an argument as well.

import numpy as np

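def match_client(summary, clients):
    # Collect every client name that occurs as a substring of the summary.
    client_matches = [client for client in clients if client in summary]
    if not client_matches:
        # No client found: leave the cell as NaN.
        return np.nan
    # One or more clients found: join them into a single string.
    return ', '.join(client_matches)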

clients = ['ABC', 'XYZ', 'Awesome']
df['Client'] = df['Summary'].map(lambda x: match_client(x, clients))

The lambda is only there so that clients can be passed to match_client as an extra argument inside map.
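If you would rather avoid the lambda, two alternatives that should behave the same way are functools.partial (which pre-binds clients) and Series.apply with its args parameter. This is a sketch on a cut-down version of the example data, not a claim that either form is preferable.

```python
from functools import partial

import numpy as np
import pandas as pd


def match_client(summary, clients):
    # Same logic as above: join all client names found in the summary.
    matches = [client for client in clients if client in summary]
    return ", ".join(matches) if matches else np.nan


df = pd.DataFrame({"Summary": ["ABC Company", "Company XYZ", "Record B"]})
clients = ["ABC", "XYZ", "Awesome"]

# Option 1: bind the extra argument up front, then map as usual.
via_partial = df["Summary"].map(partial(match_client, clients=clients))

# Option 2: Series.apply forwards extra positional arguments through args.
via_apply = df["Summary"].apply(match_client, args=(clients,))
```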

pandas.Series.str.extract

Assuming there is at most one match per row:

df.assign(Client=df.Summary.str.extract(f"({'|'.join(clients)})"))

   Time              Summary   Client
0    10          ABC Company      ABC
1     4          Company XYZ      XYZ
2    20  The Awesome Company  Awesome
3     4             Record B      NaN
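One caveat with building the pattern by joining the names directly: any client name containing regex metacharacters (".", "+", "(" and so on) would be interpreted as a pattern. A hedged variant is to pass each name through re.escape first. The sketch below also uses expand=False so that extract returns a Series rather than a one-column DataFrame; both choices are additions to the answer above, not part of it.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Time": [10, 4, 20, 4],
    "Summary": ["ABC Company", "Company XYZ", "The Awesome Company", "Record B"],
})
clients = ["ABC", "XYZ", "Awesome"]

# Escape each name so it is matched literally, then wrap the alternation
# in a single capture group, which str.extract requires.
pattern = "(" + "|".join(re.escape(client) for client in clients) + ")"

# With one capture group and expand=False the result is a Series;
# rows without a match become NaN.
df["Client"] = df["Summary"].str.extract(pattern, expand=False)
print(df)
```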

pandas.Series.str.findall

There might be more than one match per row. You never know.

df.join(df.Summary.str.findall('|'.join(clients)).str.join('|').str.get_dummies())

   Time              Summary  ABC  Awesome  XYZ
0    10          ABC Company    1        0    0
1     4          Company XYZ    0        0    1
2    20  The Awesome Company    0        1    0
3     4             Record B    0        0    0
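The one-liner above chains three string methods, so it may help to see the intermediate results. The sketch below splits the chain into steps on the same data, with one hypothetical extra row ("ABC meets XYZ", not in the original question) to show a summary that matches two clients.

```python
import pandas as pd

df = pd.DataFrame({
    "Summary": [
        "ABC Company",
        "Company XYZ",
        "The Awesome Company",
        "Record B",
        "ABC meets XYZ",  # hypothetical row with two matches
    ],
})
clients = ["ABC", "XYZ", "Awesome"]

# Step 1: every match per row, as a list (an empty list when nothing matches).
found = df["Summary"].str.findall("|".join(clients))

# Step 2: collapse each list into one "|"-separated string.
joined = found.str.join("|")

# Step 3: one indicator column per distinct name; "|" is the default separator.
dummies = joined.str.get_dummies()

result = df.join(dummies)
print(result)
```

Rows with no match end up with 0 in every indicator column, and the row with two matches gets a 1 in both ABC and XYZ.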
