Maintaining a default True/False value for each date

Question

During the day, new investment possibilities are registered, but the results (the lay column) are only registered at midnight each day.

So let's assume this CSV:

clock_now,competition,market_name,lay
2022/12/30,A,B,-1
2022/12/31,A,B,1.28
2023/01/01,A,B,-1
2023/01/02,A,B,1
2023/01/03,A,B,1
2023/01/04,A,B,
2023/01/04,A,B,
2023/01/04,A,B,

Until yesterday, 2023/01/03, the sum of the lay values for rows with A in competition and B in market_name was +1.28.

I only invest if that sum is above 0, so during today, every time this combination of values comes up, the answer will be True (invest).
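The rule described above can be sketched for a single day as follows. This is only an illustration: the CSV text is inlined instead of read from a file, and the names `today`, `past`, `total_until_yesterday` and `invest_today` are made up for the example. It assumes the dates are zero-padded `YYYY/MM/DD` strings, so plain string comparison orders them chronologically.

```python
import io

import pandas as pd

# The CSV from the question, inlined so the sketch is self-contained.
csv_text = """clock_now,competition,market_name,lay
2022/12/30,A,B,-1
2022/12/31,A,B,1.28
2023/01/01,A,B,-1
2023/01/02,A,B,1
2023/01/03,A,B,1
2023/01/04,A,B,
2023/01/04,A,B,
2023/01/04,A,B,
"""
df = pd.read_csv(io.StringIO(csv_text))

today = "2023/01/04"  # hypothetical "current day" for the example

# Only rows of this combination registered strictly before today count.
past = df[
    (df["competition"] == "A")
    & (df["market_name"] == "B")
    & (df["clock_now"] < today)
]

# Rounded to two decimals to hide floating-point noise.
total_until_yesterday = round(past["lay"].sum(), 2)
invest_today = total_until_yesterday > 0

print(total_until_yesterday, invest_today)
```

Every row that arrives during `today` gets the same answer, because none of today's lay values enter the sum.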

At the end of the day, when the lay values are registered, I look at the total result:

clock_now,competition,market_name,lay
2022/12/30,A,B,-1
2022/12/31,A,B,1.28
2023/01/01,A,B,-1
2023/01/02,A,B,1
2023/01/03,A,B,1
2023/01/04,A,B,-1
2023/01/04,A,B,-1
2023/01/04,A,B,-1

End of the day: -1.72

This means that tomorrow, if that same combination of values appears in the columns, I will not invest at all: the total stays negative for the whole day, because it only includes values registered up to the previous day.

I'm trying to create a column to show where it was True and where it was False:

import pandas as pd

df = pd.read_csv('example.csv')
combinations = [['market_name', 'competition']]
for cbnt in combinations:
    df['invest'] = (df.groupby(cbnt)['lay']
                      .apply(lambda s: s.cumsum().shift())
                      .gt(df['lay'])
                   )

    df['cumulative'] = (df.groupby(cbnt)['lay']
                      .apply(lambda s: s.cumsum().shift())
                   )

    print(df[['clock_now','invest','cumulative']])

But the result is this:

    clock_now  invest  cumulative
0  2022/12/30   False         NaN
1  2022/12/31   False       -1.00
2  2023/01/01    True        0.28
3  2023/01/02   False       -0.72
4  2023/01/03   False        0.28
5  2023/01/04    True        1.28
6  2023/01/04    True        0.28
7  2023/01/04    True       -0.72

The expected result would be this:

    clock_now  invest  cumulative
0  2022/12/30   False         NaN
1  2022/12/31   False       -1.00
2  2023/01/01    True        0.28
3  2023/01/02   False       -0.72
4  2023/01/03    True        0.28
5  2023/01/04    True        1.28
6  2023/01/04    True        0.28
7  2023/01/04    True       -1.72

How should I proceed so that the cumsum follows a daily pattern, where every row of a given day is decided only by the results of previous days?

Example Two:

clock_now,competition,market_name,lay
2022/08/09,A,B,-1.0
2022/08/12,A,B,1.28
2022/09/07,A,B,-1.0
2022/10/15,A,B,1.0
2022/10/15,A,B,-1.0
2022/11/20,A,B,1.0

Note that on 2022/10/15 it delivers one False and one True, so it is not tracking by date, which is how I want it to work:

    clock_now  invest  cumulative
0  2022/08/09   False         NaN
1  2022/08/12   False       -1.00
2  2022/09/07    True        0.28
3  2022/10/15   False       -0.72
4  2022/10/15    True        0.28
5  2022/11/20   False       -0.72

The correct result would be either all False or all True for rows with the same date. Like this:

    clock_now  invest  cumulative
0  2022/08/09   False         NaN
1  2022/08/12   False       -1.00
2  2022/09/07    True        0.28
3  2022/10/15   False       -0.72
4  2022/10/15   False        0.28
5  2022/11/20   False       -0.72
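The requirement that all rows of one date share the same flag can be checked mechanically. The sketch below is only an illustration: it rebuilds the unwanted output of Example Two by hand (the `observed` frame and the name `mixed_days` are made up for the example) and lists the dates that carry more than one distinct `invest` value.

```python
import pandas as pd

# The 'invest' column from the unwanted Example Two output, typed in by hand.
observed = pd.DataFrame({
    "clock_now": ["2022/08/09", "2022/08/12", "2022/09/07",
                  "2022/10/15", "2022/10/15", "2022/11/20"],
    "invest": [False, False, True, False, True, False],
})

# Count the distinct flags per date; more than one means the day is inconsistent.
flags_per_day = observed.groupby("clock_now")["invest"].nunique()
mixed_days = flags_per_day[flags_per_day > 1].index.tolist()

print(mixed_days)
```

An empty list would mean every date has a single, consistent decision.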

Answer 1

(df.join(
    # Count market&competition specific cumsum for each row
    # and join back with df
    df.groupby(['market_name', 'competition']).lay.cumsum().rename('lay_cumsum') > 0
)
# Group by market&comp&date to get last cumsum within each day
.groupby(['market_name', 'competition', 'clock_now']) 
# Get cumsum Series for each group
.lay_cumsum
# Getting last cumsum within group
.last()
# Group by market&comp
.groupby(['market_name', 'competition'])
# Shift by one to assign to each date prev date's cumsum
.shift(1)
.rename('lay_cumsum')
.reset_index()
# Merge back with original df
.merge(df, on=['clock_now', 'market_name', 'competition']))

This will output:

  market_name competition   clock_now lay_cumsum   lay
0           B           A  2022/12/30        NaN -1.00
1           B           A  2022/12/31      False  1.28
2           B           A  2023/01/01       True -1.00
3           B           A  2023/01/02      False  1.00
4           B           A  2023/01/03       True  1.00
5           B           A  2023/01/04       True -1.00
6           B           A  2023/01/04       True -1.00
7           B           A  2023/01/04       True -1.00
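A variant of the same idea, shown here only as a sketch and not as the answerer's code: sum lay per day first, take the running total per combination, shift it by one day, and merge it back onto the rows. The CSV is inlined, the column names `cumulative` and `invest` follow the question, and `keys`, `daily`, `prev_total` and `out` are names made up for the example. It assumes zero-padded `YYYY/MM/DD` date strings, so that the sorted group keys are in chronological order.

```python
import io

import pandas as pd

# End-of-day CSV from the question, inlined so the sketch is self-contained.
csv_text = """clock_now,competition,market_name,lay
2022/12/30,A,B,-1
2022/12/31,A,B,1.28
2023/01/01,A,B,-1
2023/01/02,A,B,1
2023/01/03,A,B,1
2023/01/04,A,B,-1
2023/01/04,A,B,-1
2023/01/04,A,B,-1
"""
df = pd.read_csv(io.StringIO(csv_text))
keys = ["market_name", "competition"]

# 1) One total per (combination, day); groupby sorts the dates within each combination.
daily = df.groupby(keys + ["clock_now"])["lay"].sum()

# 2) Running total per combination, shifted so each day only sees earlier days.
prev_total = (
    daily.groupby(level=keys).cumsum()
         .groupby(level=keys).shift()
         .rename("cumulative")
         .reset_index()
)

# 3) Broadcast the per-day value back to every row of that day.
out = df.merge(prev_total, on=keys + ["clock_now"], how="left")

# NaN (no history yet) compares as False, so the first day is "do not invest".
out["invest"] = out["cumulative"].gt(0)

print(out[["clock_now", "invest", "cumulative"]])
```

Unlike the output above, the first date ends up as False rather than NaN, which matches the invest column the question expects. Whether "no history" should mean False is a design choice.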

Answer 1 (accepted), 2023-01-08 03:10:33
