Pandas: Fill every row by its first and last occurrence

My data includes invoices and customers. One customer can have multiple invoices. Each invoice always belongs to exactly one customer. The invoices are updated daily (Report Date).

My goal is to calculate the age of each customer in days (see the column "Age in Days"). To do this, I take the first occurrence of a customer's report date and calculate the difference to the last occurrence of the report date.

E.g., Customer 1 occurs from 08-14 to 08-15 and is therefore 1 day old.

Report Date  Invoice No   Customer No  Amount  Age in Days
2018-08-14   A            1            50$     1
2018-08-14   B            1            100$    1
2018-08-14   C            2            75$     2

2018-08-15   A            1            20$     1
2018-08-15   B            1            45$     1
2018-08-15   C            2            70$     2

2018-08-16   C            2            40$     2
2018-08-16   D            3            100$    0
2018-08-16   E            3            60$     0

I solved this, but very inefficiently, and it takes too long: my data contains 26 million rows. Below, I calculate the age for one customer only.

# List every customer no
customerNo = df["Customer No"].unique()
customer_age = []

# Testing for one specific customer
testCustomer = df.loc[df["Customer No"] == customerNo[0]]
testCustomer = testCustomer.sort_values(by="Report Date", ascending=True)

first_occur = testCustomer.iloc[0]['Report Date']
last_occur = testCustomer.iloc[-1]['Report Date']
age = (last_occur - first_occur).days

customer_age.extend([age] * len(testCustomer))
testCustomer.loc[:, 'Customer Age'] = customer_age

Is there a better way to solve this problem?

If you need one value per customer indicating its age, you can use a groupby (very common):

grpd = my_df.groupby('Customer No')['Report Date'].agg(['min', 'max']).reset_index()
grpd['days_diff'] = (grpd['max'] - grpd['min']).dt.days
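The groupby above yields one row per customer, whereas the question asks for the age on every invoice row. A minimal sketch of one way to bridge that gap, assuming the sample rows from the question's table (Amount omitted) and using named aggregation; the names `per_customer`, `first_seen`, `last_seen`, and `result` are illustrative and not part of the original answer:

```python
import pandas as pd

# Sample rows taken from the question's table (Amount omitted for brevity).
df = pd.DataFrame({
    "Report Date": pd.to_datetime([
        "2018-08-14", "2018-08-14", "2018-08-14",
        "2018-08-15", "2018-08-15", "2018-08-15",
        "2018-08-16", "2018-08-16", "2018-08-16",
    ]),
    "Invoice No": ["A", "B", "C", "A", "B", "C", "C", "D", "E"],
    "Customer No": [1, 1, 2, 1, 1, 2, 2, 3, 3],
})

# One row per customer: earliest and latest report date.
per_customer = (
    df.groupby("Customer No")["Report Date"]
      .agg(first_seen="min", last_seen="max")
      .reset_index()
)
per_customer["Age in Days"] = (
    per_customer["last_seen"] - per_customer["first_seen"]
).dt.days

# Broadcast the per-customer age back onto every invoice row.
result = df.merge(
    per_customer[["Customer No", "Age in Days"]],
    on="Customer No",
    how="left",
)
```

The left merge keeps the original row order, so `result` has the same nine rows as `df` plus the per-customer age.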

Use groupby.transform with the first and last aggregations (this assumes the rows are already sorted by Report Date within each customer):

grps = df.groupby('Customer No')['Report Date']    
df['Age in Days'] = (grps.transform('last') - grps.transform('first')).dt.days

[out]

  Report Date Invoice No  Customer No Amount  Age in Days
0  2018-08-14          A            1    50$            1
1  2018-08-14          B            1   100$            1
2  2018-08-14          C            2    75$            2
3  2018-08-15          A            1    20$            1
4  2018-08-15          B            1    45$            1
5  2018-08-15          C            2    70$            2
6  2018-08-16          C            2    40$            2
7  2018-08-16          D            3   100$            0
8  2018-08-16          E            3    60$            0
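Because 'first' and 'last' follow row order, they give the right answer only when each customer's rows are sorted by date. A hedged variant, not from the original answer, swaps in 'min' and 'max' so the result does not depend on ordering. The small unsorted frame below is made up for illustration, and it assumes the dates may arrive as strings:

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample (a subset of the question's rows).
df = pd.DataFrame({
    "Report Date": ["2018-08-16", "2018-08-14", "2018-08-15",
                    "2018-08-14", "2018-08-16"],
    "Invoice No": ["C", "A", "A", "C", "D"],
    "Customer No": [2, 1, 1, 2, 3],
})

# Assumption: dates may be stored as strings, so convert them first.
df["Report Date"] = pd.to_datetime(df["Report Date"])

# transform returns a value for every original row, aligned on the index,
# so the difference can be assigned straight back as a new column.
dates = df.groupby("Customer No")["Report Date"]
df["Age in Days"] = (dates.transform("max") - dates.transform("min")).dt.days
```

With 'first' and 'last' on this unsorted frame, customer 2 would come out negative, since its first row carries the latest date.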

