Python - Pandas - Groupby - Difference in value (not days) between two dates

Question

I have been looking around for an answer, but most posts deal with the difference in days between two dates, not the difference in value.

Assume the following code:

import pandas as pd
import numpy as np
import datetime
np.random.seed(15)
day = datetime.date.today()
day_1 = datetime.date.today() - datetime.timedelta(1)
day_2 = datetime.date.today() - datetime.timedelta(2)
day_3 = datetime.date.today() - datetime.timedelta(3)
ticker_date = [('fi', day), ('fi', day_1), ('fi', day_2), ('fi', day_3),
               ('di', day), ('di', day_1), ('di', day_2), ('di', day_3)]
index_df = pd.MultiIndex.from_tuples(ticker_date, names=['lvl_1', 'lvl_2'])
df = pd.DataFrame(np.random.rand(8), index_df, ['value'])

Output:

                    value
lvl_1    lvl_2               
fi    2018-02-15  0.848818
      2018-02-14  0.178896
      2018-02-13  0.054363
      2018-02-12  0.361538
di    2018-02-15  0.275401
      2018-02-14  0.530000
      2018-02-13  0.305919
      2018-02-12  0.304474

I am looking for a way to group by 'lvl_1' and then get the difference in value between two given dates.

For instance, the difference between February 14th and February 12th would be -0.182643 for 'fi' and 0.225526 for 'di'.

I was working on the following lines of code:

group_by = df.groupby(level='lvl_1')
nd = group_by.get_loc(day_3, method='nearest')
st = group_by.get_loc(day_1, method='nearest')
out = group_by.iloc[nd] - group_by.iloc[st]

But it looks like this is not a valid method:

AttributeError: 'DataFrameGroupBy' object has no attribute 'get_loc'

Anyone?

Answer 1

This is a bit different from your approach in spirit, but it should give what you want (although if your dataset is very large it might waste memory):

expanded = df.reset_index().pivot_table(index='lvl_1',columns='lvl_2',values='value')
expanded[day_3] - expanded[day_1]

This returns a Series with the difference:

lvl_1
di   -0.225526
fi    0.182643
dtype: float64
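For comparison, here is a minimal sketch that avoids building the full pivoted table: take a cross-section for each date with `Series.xs` and subtract the two. The fixed dates, the hard-coded values (copied from the question's printed output) and the helper name `value_diff` are assumptions for illustration only. The index holds `pd.Timestamp` values rather than the question's `datetime.date` objects.

```python
import pandas as pd

# Fixed dates stand in for the question's date.today()-based values (assumption).
days = pd.to_datetime(["2018-02-15", "2018-02-14", "2018-02-13", "2018-02-12"])
index = pd.MultiIndex.from_product([["fi", "di"], days], names=["lvl_1", "lvl_2"])
# Values copied from the printed output in the question.
values = [0.848818, 0.178896, 0.054363, 0.361538,
          0.275401, 0.530000, 0.305919, 0.304474]
df = pd.DataFrame({"value": values}, index=index)


def value_diff(frame, first, second):
    """Return value(first) - value(second) for every lvl_1 group (hypothetical helper)."""
    a = frame["value"].xs(pd.Timestamp(first), level="lvl_2")
    b = frame["value"].xs(pd.Timestamp(second), level="lvl_2")
    return a - b  # both Series are indexed by lvl_1, so they align


diff = value_diff(df, "2018-02-14", "2018-02-12")
print(diff)
```

With the sample data this gives roughly -0.182642 for 'fi' and 0.225526 for 'di' (February 14th minus February 12th). Swapping the two date arguments reproduces the sign of the pivot-based result above.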

Answer 2

ANSWER:

I found a way to answer my own question. Assume I am looking for the position of a single given day only (the result can then be extended to my specific question):

group_by = df.groupby(level='lvl_1')
ans = group_by.nth(df.index.get_level_values('lvl_2').unique().get_loc(day_2, method='nearest'))
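Note that the `method` argument of `Index.get_loc` was deprecated in pandas 1.4 and removed in 2.0, so the snippet above no longer runs on recent versions. Below is a sketch of the same idea using `Index.get_indexer`, which still accepts `method="nearest"`. The fixed dates and hard-coded values are assumptions standing in for the question's data, and the dates are stored as `pd.Timestamp` rather than `datetime.date`.

```python
import pandas as pd

# Sample data: fixed dates and the values printed in the question (assumption).
days = pd.to_datetime(["2018-02-15", "2018-02-14", "2018-02-13", "2018-02-12"])
index = pd.MultiIndex.from_product([["fi", "di"], days], names=["lvl_1", "lvl_2"])
values = [0.848818, 0.178896, 0.054363, 0.361538,
          0.275401, 0.530000, 0.305919, 0.304474]
df = pd.DataFrame({"value": values}, index=index)

target = pd.Timestamp("2018-02-13")  # plays the role of day_2

# Unique dates in index order; a nearest lookup needs them to be monotonic
# (increasing or decreasing), which holds for this sample.
dates = df.index.get_level_values("lvl_2").unique()

# get_indexer returns an array of positions; cast to a plain int for nth().
pos = int(dates.get_indexer([target], method="nearest")[0])

# One row per group at that position. The index of the result differs
# between pandas versions, but the selected values are the same.
ans = df.groupby(level="lvl_1").nth(pos)
print(ans)
```

As in the original snippet, this only works when every group lists the same dates in the same order, because a single position is applied to all groups.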

Ideally, I would work with the position within each group, since the datetime vector could differ from one group to another. However, I am having a hard time figuring out the last step:

group_by = df.groupby(level='lvl_1')
loc = group_by.apply(lambda x: x.index.get_level_values('lvl_2').unique().get_loc(day_2, method='nearest'))
ans = group_by.nth(loc.groupby(level='lvl_1'))

But it gives me an error on the last line:

TypeError: n needs to be an int or a list/set/tuple of ints

If someone finds a way to solve this small issue, fire away! In the meantime, my temporary answer does the job. Thanks.
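One possible way around the TypeError is to skip `nth` altogether and pick the nearest row inside each group, since `nth` expects the same integer position(s) for every group rather than one position per group. The sketch below is only an illustration under assumptions: the helper names `nearest_value` and `value_at`, the fixed dates, and the sample data (in which 'di' deliberately lacks 2018-02-12) are not from the original post.

```python
import numpy as np
import pandas as pd

# Sample data with differing date vectors: 'di' has no row for 2018-02-12.
rows = [("fi", "2018-02-15", 0.848818), ("fi", "2018-02-14", 0.178896),
        ("fi", "2018-02-13", 0.054363), ("fi", "2018-02-12", 0.361538),
        ("di", "2018-02-15", 0.275401), ("di", "2018-02-14", 0.530000),
        ("di", "2018-02-13", 0.305919)]
flat = pd.DataFrame(rows, columns=["lvl_1", "lvl_2", "value"])
flat["lvl_2"] = pd.to_datetime(flat["lvl_2"])
df = flat.set_index(["lvl_1", "lvl_2"])


def nearest_value(group, target):
    """Value of the row whose lvl_2 date is closest to target (hypothetical helper)."""
    dates = group.index.get_level_values("lvl_2")
    deltas = np.abs((dates - target).to_numpy())  # absolute time distances
    return group["value"].iloc[int(deltas.argmin())]


def value_at(frame, day):
    """Per-group value at the date nearest to day (hypothetical helper)."""
    target = pd.Timestamp(day)
    return frame.groupby(level="lvl_1").apply(lambda g: nearest_value(g, target))


# Difference between the two dates, resolved separately inside each group.
diff = value_at(df, "2018-02-12") - value_at(df, "2018-02-14")
print(diff)
```

Because the lookup is done per group, the dates do not need to be sorted or identical across groups. When two dates are equally close, `argmin` keeps the first one in index order.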

Answer 1
1 2018-02-15 11:00:01

Answer 2
0 Accepted 2018-02-15 14:37:37
