Calculate value difference using min and max dates
I am trying to calculate value growth/decline using the minimum date and maximum date. My data currently looks like this:
  Code        Date    Value
0    A  2020-12-31  80122.0
1    A  2019-12-31  45472.0
2    A  2018-12-31  31917.0
3    A  2017-12-31  23432.0
4    B  2020-12-31      0.0
For Code A, I need to keep the max (2020-12-31) and min (2017-12-31) dates as well as the values so I can calculate the difference later on. I have multiple codes and need to be able to apply the same logic to each one. Any suggestions on the best way to approach this?
Thanks
In your case, you want to sort by date, then groupby and extract first, last:
df.sort_values(['Code','Date']).groupby('Code')['Value'].agg(['first','last'])
Output:
      first     last
Code
A   23432.0  80122.0
B       0.0      0.0
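As a minimal sketch of the follow-up step, the difference asked about in the question can be taken directly from the first and last columns. The sample frame is rebuilt from the question's data, and the column name diff is only an illustrative choice.

```python
import pandas as pd

# Sample data copied from the question; Date is parsed so sorting is chronological.
df = pd.DataFrame({
    'Code': ['A', 'A', 'A', 'A', 'B'],
    'Date': pd.to_datetime(['2020-12-31', '2019-12-31', '2018-12-31',
                            '2017-12-31', '2020-12-31']),
    'Value': [80122.0, 45472.0, 31917.0, 23432.0, 0.0],
})

# After sorting by Date within each Code, 'first' is the value at the
# earliest date and 'last' is the value at the latest date.
out = (df.sort_values(['Code', 'Date'])
         .groupby('Code')['Value']
         .agg(['first', 'last']))

# 'diff' is a hypothetical column name for the growth/decline per code.
out['diff'] = out['last'] - out['first']
print(out)
```

This keeps only the values; if the dates are needed too, they have to be aggregated or carried along separately.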
I would first sort_values, then you can drop_duplicates on 'Code'. Using different logic for keep allows you to get the first and last row (based on Date) within each 'Code', which you can then subtract to get the day difference and Value difference for each code.
df = df.sort_values(['Code', 'Date'])
(df.drop_duplicates('Code', keep='last').set_index('Code')
- df.drop_duplicates('Code', keep='first').set_index('Code'))
# Date Value
#Code
#A 1096 days 56690.0
#B 0 days 0.0
Alternatively, if you don't just need the difference and actually need the rows, then I would concat those together instead of subtracting. The main reason to avoid the .first aggregation is that it does not guarantee the data come from the same rows (without specifying dropna) in the case of null values.
pd.concat([df.drop_duplicates('Code', keep='last').set_index('Code'),
df.drop_duplicates('Code', keep='first').set_index('Code')],
keys=['Last', 'First'], axis=1)
# Last First
# Date Value Date Value
#Code
#A 2020-12-31 80122.0 2017-12-31 23432.0
#B 2020-12-31 0.0 2020-12-31 0.0
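To illustrate the null-value caveat mentioned above, here is a small sketch on made-up data (the demo frame is hypothetical, not from the question). By default the groupby first aggregation takes the first non-null entry per column, so the Date and Value it returns can come from different rows, while drop_duplicates keeps one whole row.

```python
import pandas as pd

# Hypothetical data: the earliest row for code A has a missing Value.
demo = pd.DataFrame({
    'Code': ['A', 'A'],
    'Date': pd.to_datetime(['2017-12-31', '2020-12-31']),
    'Value': [float('nan'), 5.0],
}).sort_values(['Code', 'Date'])

# Column-wise first non-null: Date comes from the 2017 row, Value from the 2020 row.
via_first = demo.groupby('Code')[['Date', 'Value']].first()

# Row-wise: the whole earliest row is kept, including its missing Value.
via_rows = demo.drop_duplicates('Code', keep='first').set_index('Code')

print(via_first)
print(via_rows)
```

Here via_first pairs the 2017 date with the value 5.0, which never appeared together in the data, whereas via_rows keeps the 2017 row with its NaN.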
Since you "need to keep the max (2020-12-31) and min (2017-12-31) dates as well as the values...", you can try:
import pandas as pd

df = pd.DataFrame({'Code': ['A', 'A', 'A', 'A', 'B'],
                   'Date': ['2020-12-31', '2019-12-31', '2018-12-31', '2017-12-31', '2020-12-31'],
                   'Value': [80122.0, 45472.0, 31917.0, 23432.0, 0.0]})
# Assign the whole column so it actually becomes datetime64
df['Date'] = pd.to_datetime(df['Date'])
is the df mentioned:
Code Date Value
0 A 2020-12-31 80122.0
1 A 2019-12-31 45472.0
2 A 2018-12-31 31917.0
3 A 2017-12-31 23432.0
4 B 2020-12-31 0.0
So another way can be:
dictionary = {}
for code in df['Code'].unique():
    # Rows belonging to this code, and their earliest/latest dates
    group = df.loc[df['Code'] == code]
    date_min = group['Date'].min()
    date_max = group['Date'].max()
    dictionary[code] = {
        'Date min': date_min,
        'Value min': group.loc[group['Date'] == date_min, 'Value'].values[0],
        'Date max': date_max,
        'Value max': group.loc[group['Date'] == date_max, 'Value'].values[0],
    }
# One column per code, transposed so each code becomes a row
resume = pd.DataFrame(dictionary)
resume = resume.transpose()
resume
that outputs:
Date min Value min Date max Value max
A 2017-12-31 23432.0 2020-12-31 80122.0
B 2020-12-31 0.0 2020-12-31 0.0