How to use pandas groupby on one column with agg - max one col, min another col - without producing multi-level columns

I have the following pandas DataFrame:

import pandas as pd

account_number = [1234, 5678, 9012, 1234.0, 5678, 9012, 1234.0, 5678, 9012, 1234.0, 5678, 9012]
client_name = ["Ford", "GM", "Honda", "Ford", "GM", "Honda", "Ford", "GM", "Honda", "Ford", "GM", "Honda"]
database = ["DB_Ford", "DB_GM", "DB_Honda", "DB_Ford", "DB_GM", "DB_Honda", "DB_Ford", "DB_GM", "DB_Honda", "DB_Ford", "DB_GM", "DB_Honda"]
server = ["L01SQL04", "L01SQL08", "L01SQL12", "L01SQL04", "L01SQL08", "L01SQL12", "L01SQL04", "L01SQL08", "L01SQL12", "L01SQL04", "L01SQL08", "L01SQL12"]
order_num = [2145479, 2145506, 2145534, 2145603, 2145658, 2429513, 2145489, 2145516, 2145544, 2145499, 2145526, 2145554]
customer_dob = ["1967-12-01", "1963-07-09", "1986-12-05", "1967-11-01", None, "1986-12-05", "1967-12-01", "1963-07-09", "1986-12-05", "1967-12-01", "1963-07-09", "1986-12-04"]
purchase_date = ["2022-06-18", "2022-04-11", "2021-01-18", "2022-06-20", "2022-04-11", "2021-01-18", "2022-06-22", "2022-04-13", "2021-01-18", "2022-06-24", "2022-04-18", "2021-01-18"]

d = {
    "account_number": account_number, 
    "client_name" : client_name,
    "database" : database,
    "server" : server,
    "order_num" : order_num,
    "customer_dob" : customer_dob,
    "purchase_date" : purchase_date,
}
df = pd.DataFrame(data=d)

dates = ["customer_dob", "purchase_date"]
for date in dates:
    df[date] = pd.to_datetime(df[date])

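The customer's date of birth (DOB) and purchase date (PD) should be identical on every row of a given account_number, but since either one could contain data-entry errors, I want to group by account_number and take the maximum DOB and the minimum PD. If those two columns are all I want besides account_number, this is easy to do: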

result = df.groupby("account_number").agg({"customer_dob": "max", "purchase_date": "min"}).reset_index()
result

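However, I also want the other columns, because they are guaranteed to be the same for each account_number. The problem is that when I try to include the other columns, I get multi-level columns, which I don't want. My first attempt not only produced multi-level columns, but I can't even see the actual DOB and PD values: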

result = df.groupby("account_number")["client_name", "database", "server", "order_num"].agg({"customer_dob": "max", "purchase_date": "min"}).reset_index()
result

The second attempt includes DOB and PD, but each account number now appears twice, and it still produces multi-level columns:

result = df.groupby("account_number")["client_name", "database", "server", "order_num", "customer_dob", "purchase_date"].agg(
    {"patient_dob": "max", "insert_date": "min"}).reset_index()
result

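I just want the final result to be a flat table: one row per account_number, holding the other columns together with the maximum DOB and the minimum PD, with no multi-level columns.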

So here is my question for all you Python experts: what do I need to do to achieve this?

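Per your comment above, I have left out the order number. If the order number is the same within an account, add order_num to the list of columns in the merge: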

result = df.groupby("account_number").agg({"customer_dob": "max", "purchase_date": "min"}).reset_index()

result.merge(
    df[["account_number", "client_name", "database", "server"]],
    how="left",
    on="account_number",
).drop_duplicates()

    account_number  customer_dob    purchase_date   client_name     database    server
0           1234.0    1967-12-01       2022-06-18   Ford            DB_Ford     L01SQL04
4           5678.0    1963-07-09       2022-04-11   GM              DB_GM       L01SQL08
8           9012.0    1986-12-05       2021-01-18   Honda           DB_Honda    L01SQL12
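
The merge works because client_name, database and server are constant within each account. Under that same assumption, a possible alternative is to group by all of the constant columns at once, which gives a flat result in a single step. The sketch below uses a trimmed-down copy of the question's data; `keys` is just an illustrative name.

```python
import pandas as pd

# Trimmed-down version of the question's data (two accounts, two rows each).
df = pd.DataFrame({
    "account_number": [1234, 5678, 1234, 5678],
    "client_name": ["Ford", "GM", "Ford", "GM"],
    "database": ["DB_Ford", "DB_GM", "DB_Ford", "DB_GM"],
    "server": ["L01SQL04", "L01SQL08", "L01SQL04", "L01SQL08"],
    "customer_dob": pd.to_datetime(["1967-12-01", "1963-07-09", "1967-11-01", None]),
    "purchase_date": pd.to_datetime(["2022-06-18", "2022-04-11", "2022-06-20", "2022-04-13"]),
})

# Columns assumed to be constant within each account.
keys = ["account_number", "client_name", "database", "server"]

# as_index=False keeps the grouping keys as ordinary columns,
# so the result has a single level of column labels.
result = df.groupby(keys, as_index=False).agg(
    {"customer_dob": "max", "purchase_date": "min"}
)
print(result)
```

Note that groupby drops rows whose key values are missing by default, so this only matches the merge approach when the key columns contain no nulls.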

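You can use named aggregation, where each keyword is an output column name and each value is a (source column, function) pair:

df.groupby("key").agg(new_col_1=("col_1", "sum"), new_col_2=("col_2", "min"))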
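
Applied to the question's data, named aggregation might look like the following sketch. It uses a trimmed-down copy of the data and assumes that taking `"first"` is acceptable for the columns that are constant within an account; that choice is an illustration, not part of the original answer.

```python
import pandas as pd

# Trimmed-down version of the question's data (two accounts, two rows each).
df = pd.DataFrame({
    "account_number": [1234, 5678, 1234, 5678],
    "client_name": ["Ford", "GM", "Ford", "GM"],
    "database": ["DB_Ford", "DB_GM", "DB_Ford", "DB_GM"],
    "server": ["L01SQL04", "L01SQL08", "L01SQL04", "L01SQL08"],
    "customer_dob": pd.to_datetime(["1967-12-01", "1963-07-09", "1967-11-01", None]),
    "purchase_date": pd.to_datetime(["2022-06-18", "2022-04-11", "2022-06-20", "2022-04-13"]),
})

# Each keyword names an output column; each tuple is (source column, function).
# "first" picks one value for the columns assumed constant per account.
result = (
    df.groupby("account_number")
      .agg(
          client_name=("client_name", "first"),
          database=("database", "first"),
          server=("server", "first"),
          customer_dob=("customer_dob", "max"),
          purchase_date=("purchase_date", "min"),
      )
      .reset_index()
)
print(result)
```

Because every output column is named explicitly, the result has a single level of column labels, and no merge or drop_duplicates step is needed.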
