
Get the latest data for each customer from a pandas dataframe

I am trying to get the latest data for every customer, regardless of the other attributes in the dataframe.

My dataframe looks like this:

[Image: the input dataframe, not reproduced]

My output should look like this:

[Image: the expected output, not reproduced]

I have tried 'df.iloc[df.groupby('customer')['date'].idxmax()]', but I am getting a ValueError:

ValueError                                Traceback (most recent call last)
 in
----> 1 df = df.iloc[df.groupby('cutomer')['date'].idxmax()]

~\Anaconda3\envs\myenv\lib\site-packages\pandas\core\groupby\groupby.py in wrapper(*args, **kwargs)
    653         if self.obj.ndim == 1:
    654             # this can be called recursively, so need to raise ValueError
--> 655             raise ValueError
    656 
    657         # GH#3688 try to operate item-by-item

ValueError:
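The bare ValueError does not name its cause. One plausible explanation, which is an assumption since the real dataframe is only shown as an image, is that the date column holds strings: older pandas versions could not run idxmax on an object-dtype column and reported the failure through this generic wrapper. Separately, groupby(...).idxmax() returns index labels, not positions, so .loc is the matching indexer rather than .iloc. The sketch below uses made-up sample data and hypothetical column names (customer, date, amount):

```python
import pandas as pd

# Hypothetical sample data; the real dataframe is only shown as an image.
df = pd.DataFrame({
    "customer": ["A", "A", "B", "B", "C"],
    "date": ["2020-01-05", "2020-03-01", "2020-02-10", "2019-12-31", "2020-01-01"],
    "amount": [10, 20, 30, 40, 50],
})

# Convert the date strings to datetime64 so idxmax compares real dates.
df["date"] = pd.to_datetime(df["date"])

# idxmax gives the index *label* of each customer's latest row,
# so select those rows with .loc (label-based), not .iloc (position-based).
latest = df.loc[df.groupby("customer")["date"].idxmax()]
print(latest)
```

If a customer has several rows on its latest date, idxmax keeps only the first of them.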

I think this is really the same as this similar problem.
In this case, the code would look like this:

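import pandas as pd
df['date'] = pd.to_datetime(df['date'])  # parse the date strings into datetimes
# keep every row whose date equals its customer's latest date
idx = df.groupby('customer')['date'].transform('max') == df['date']
df[idx]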
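For reference, here is a self-contained run of the transform approach on hypothetical data (the column names customer, date and amount are assumed). transform("max") broadcasts each customer's latest date back onto every one of that customer's rows, so comparing it with the date column produces a boolean mask aligned with the dataframe:

```python
import pandas as pd

# Hypothetical data; customer "A" has two rows on its latest date.
df = pd.DataFrame({
    "customer": ["A", "A", "A", "B", "B"],
    "date": ["2020-01-05", "2020-03-01", "2020-03-01", "2020-02-10", "2019-12-31"],
    "amount": [10, 20, 25, 30, 40],
})
df["date"] = pd.to_datetime(df["date"])

# Per-row latest date of that row's customer, same length as df.
group_max = df.groupby("customer")["date"].transform("max")

# Boolean mask: True where the row is on its customer's latest date.
is_latest = group_max == df["date"]
latest = df[is_latest]
print(latest)
```

Unlike idxmax, this keeps every row that ties for a customer's latest date. If exactly one row per customer is wanted, df.sort_values("date").drop_duplicates("customer", keep="last") is another option.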
