
Pandas groupby difference between 2 most recent dates

Pandas version: 1.1.0

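Hello, I'm trying to find the difference in value between the last 2 dates of date_collected. The code works fine when the dates are consecutive, but I'm stuck on how to handle weekends, where there is a 2-day gap because no data was collected. This causes groupby.diff() to ignore them.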

With the following code, I was able to slice out 2 specific dates and get the result using .reindex_like():

# (working example when specifically slicing on 2 dates)
prior_date = df.loc[df['date_collected'] == '2020-11-06']
current = df.loc[df['date_collected'] == '2020-11-09']

prior_date = prior_date.set_index('date')['value']
current = current.set_index('date')['value']
prior_date = prior_date.reindex_like(current).fillna(0)

df = (current - prior_date).reset_index()
change = df[df['value'] != 0].dropna(axis=0)

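However, when I try to emulate these results for the entire dataframe, I can't find a way to use reindex_like on a MultiIndex. I tried using .last(), but then realized that the missing weekends become a problem.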

# (current result down below)
chng = df.set_index(['date_collected', 'date'])['value']
chng = chng.groupby(level=1).diff().reset_index()

last = df.groupby('date_collected')[['date', 'value']].last().reset_index()
chng = chng.set_index(['date_collected', 'value'])
last = last.set_index(['date_collected', 'value'])

chng = chng.fillna(last)
chng = chng[chng['value'] != 0].dropna()

# input data
+----------------+------------+-------+
| date_collected |    date    | value |
+----------------+------------+-------+
| 2020-11-06     | 2020-11-01 |     4 |
| 2020-11-06     | 2020-11-02 |     5 |
| 2020-11-06     | 2020-11-03 |     1 |
| 2020-11-06     | 2020-11-04 |     3 |
| 2020-11-06     | 2020-11-05 |     1 |
| 2020-11-09     | 2020-11-04 |     3 |
| 2020-11-09     | 2020-11-05 |     3 |
| 2020-11-09     | 2020-11-06 |     5 |
| 2020-11-09     | 2020-11-07 |     1 |
| 2020-11-09     | 2020-11-08 |     1 |
| 2020-11-10     | 2020-11-05 |     3 |
| 2020-11-10     | 2020-11-06 |     5 |
| 2020-11-10     | 2020-11-07 |     1 |
| 2020-11-10     | 2020-11-08 |     3 |
| 2020-11-10     | 2020-11-09 |     2 |
+----------------+------------+-------+

# wanted results
+----------------+------------+-------+
| date_collected |    date    | value |
+----------------+------------+-------+
| 2020-11-06     | 2020-11-05 |     1 |
| 2020-11-09     | 2020-11-05 |     2 |
| 2020-11-09     | 2020-11-06 |     5 |
| 2020-11-09     | 2020-11-07 |     1 |
| 2020-11-09     | 2020-11-08 |     1 |
| 2020-11-10     | 2020-11-08 |     2 |
| 2020-11-10     | 2020-11-09 |     2 |
+----------------+------------+-------+

# current results
+----------------+------------+-------+
| date_collected |    date    | value |
+----------------+------------+-------+
| 2020-11-06     | 2020-11-05 |     1 |
| 2020-11-09     | 2020-11-05 |     2 |
| 2020-11-09     | 2020-11-08 |     1 |
| 2020-11-10     | 2020-11-08 |     2 |
| 2020-11-10     | 2020-11-09 |     2 |
+----------------+------------+-------+
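As a minimal sketch of why rows can go missing, the snippet below rebuilds the input table by hand and applies groupby(...).diff() per date. The first time a date appears it has no earlier row in its group, so diff() returns NaN for it. The names values, diffs and first_seen are illustrative only and not part of the original code.

```python
import pandas as pd

# Input data from the question, rebuilt by hand.
df = pd.DataFrame({
    'date_collected': pd.to_datetime(
        ['2020-11-06'] * 5 + ['2020-11-09'] * 5 + ['2020-11-10'] * 5),
    'date': pd.to_datetime([
        '2020-11-01', '2020-11-02', '2020-11-03', '2020-11-04', '2020-11-05',
        '2020-11-04', '2020-11-05', '2020-11-06', '2020-11-07', '2020-11-08',
        '2020-11-05', '2020-11-06', '2020-11-07', '2020-11-08', '2020-11-09',
    ]),
    'value': [4, 5, 1, 3, 1, 3, 3, 5, 1, 1, 3, 5, 1, 3, 2],
})

values = df.set_index(['date_collected', 'date'])['value']

# diff() inside each 'date' group: the first row of every group has no
# predecessor, so it comes back as NaN rather than as a change.
diffs = values.groupby(level='date').diff()

# Rows where a date shows up for the first time (illustrative name).
first_seen = diffs[diffs.isna()]
print(first_seen.index.tolist())
```

Dates such as 2020-11-06 and 2020-11-07 first appear under the 2020-11-09 collection, so their diff is NaN and a later .dropna() removes them unless they are filled in some other way.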

I was able to figure out how to reindex the MultiIndex and get the wanted results with the following code:

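import pandas as pd

# Assumes df['date'] holds datetime64 values, so it matches pd.date_range.
# Build every (date_collected, date) pair so missing dates become explicit rows.
dates = pd.date_range(df['date'].min(), df['date'].max())
new_idx = pd.MultiIndex.from_product([df['date_collected'].unique(), dates])

df = df.set_index(['date_collected', 'date'])
df = df.reindex(new_idx).fillna(0)

# Difference per date between consecutive collection dates; drop unchanged rows.
chng = df.groupby(level=1).diff()
chng = chng[chng['value'] != 0].dropna()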
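As an alternative, here is a rough sketch that applies the two-date .reindex_like() idea from the question to every consecutive pair of collection dates. The helper name changes_between_collections is hypothetical, the input table is rebuilt by hand, and the sketch assumes at least two distinct collection dates. The first collection date has nothing to compare against, so it is skipped.

```python
import pandas as pd

# Input data from the question, rebuilt by hand.
df = pd.DataFrame({
    'date_collected': pd.to_datetime(
        ['2020-11-06'] * 5 + ['2020-11-09'] * 5 + ['2020-11-10'] * 5),
    'date': pd.to_datetime([
        '2020-11-01', '2020-11-02', '2020-11-03', '2020-11-04', '2020-11-05',
        '2020-11-04', '2020-11-05', '2020-11-06', '2020-11-07', '2020-11-08',
        '2020-11-05', '2020-11-06', '2020-11-07', '2020-11-08', '2020-11-09',
    ]),
    'value': [4, 5, 1, 3, 1, 3, 3, 5, 1, 1, 3, 5, 1, 3, 2],
})


def changes_between_collections(frame):
    """Compare each collection date with the one before it (hypothetical helper)."""
    days = frame['date_collected'].drop_duplicates().sort_values().tolist()
    pieces = []
    # Pair each collection date with its predecessor, whatever the gap in days.
    for prior_day, current_day in zip(days, days[1:]):
        prior = frame.loc[frame['date_collected'] == prior_day].set_index('date')['value']
        current = frame.loc[frame['date_collected'] == current_day].set_index('date')['value']
        # Align the prior snapshot to the current dates; dates that are new in
        # the current snapshot get 0, so their full value counts as the change.
        prior = prior.reindex_like(current).fillna(0)
        diff = (current - prior).reset_index()
        diff.insert(0, 'date_collected', current_day)
        # Keep only rows whose value actually changed.
        pieces.append(diff[diff['value'] != 0])
    return pd.concat(pieces, ignore_index=True)


change = changes_between_collections(df)
print(change)
```

Because consecutive collection dates are paired by position rather than by calendar day, a weekend gap between 2020-11-06 and 2020-11-09 needs no special handling. One design difference from a full reindex over every date: this version only reports dates present in the current collection, as in the two-date example, so dates that dropped out of the window are not listed.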
