Maintain a default of True/False for each date

During the day, new investment opportunities are registered, but the results (the lay column) are only registered at midnight each day.

So let's assume this CSV:

clock_now,competition,market_name,lay
2022/12/30,A,B,-1
2022/12/31,A,B,1.28
2023/01/01,A,B,-1
2023/01/02,A,B,1
2023/01/03,A,B,1
2023/01/04,A,B,
2023/01/04,A,B,
2023/01/04,A,B,

Until yesterday, 2023/01/03, the sum of the rows that have the value A in competition and B in market_name was +1.28.

I only invest if this sum is above 0, so during today, every time this combination of values comes up, the answer will be True (invest).
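To make the rule concrete, here is a minimal sketch of the "sum until yesterday" check for a single combination. It embeds the CSV above as a string instead of reading a file, and the variable names (`today`, `total_until_yesterday`, `invest_today`) are illustrative assumptions, not part of the original code.

```python
import io
import pandas as pd

CSV = """clock_now,competition,market_name,lay
2022/12/30,A,B,-1
2022/12/31,A,B,1.28
2023/01/01,A,B,-1
2023/01/02,A,B,1
2023/01/03,A,B,1
2023/01/04,A,B,
2023/01/04,A,B,
2023/01/04,A,B,
"""

df = pd.read_csv(io.StringIO(CSV))
today = "2023/01/04"  # assumed "today" for this example

# Rows for the A/B combination registered before today.
# Assumption: dates are zero-padded YYYY/MM/DD strings, so plain string
# comparison orders them chronologically.
mask = (
    (df["competition"] == "A")
    & (df["market_name"] == "B")
    & (df["clock_now"] < today)
)

# Round to two decimals to hide floating-point noise in the sum.
total_until_yesterday = round(float(df.loc[mask, "lay"].sum()), 2)
invest_today = total_until_yesterday > 0

print(total_until_yesterday, invest_today)
```

Because today's lay values are still empty, only the five earlier rows contribute to the sum, and the decision stays the same for every row registered today.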

At the end of the day, when the lay values are registered, I look at the total result:

clock_now,competition,market_name,lay
2022/12/30,A,B,-1
2022/12/31,A,B,1.28
2023/01/01,A,B,-1
2023/01/02,A,B,1
2023/01/03,A,B,1
2023/01/04,A,B,-1
2023/01/04,A,B,-1
2023/01/04,A,B,-1

End of the day: -1.72

This means that tomorrow, if that same combination of values appears in the columns, I will not invest at all. The decision will be False for the whole day, because it only takes into account the values available up to the previous day.
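The day-by-day reasoning can be sketched as a small table: one total per day, a running total at each day's close, and the previous day's close as the value known when the next day starts. This is only an illustration for the single A/B combination in the example; the column names `end_of_day`, `known_at_start`, and `invest` are made up for this sketch.

```python
import io
import pandas as pd

CSV = """clock_now,competition,market_name,lay
2022/12/30,A,B,-1
2022/12/31,A,B,1.28
2023/01/01,A,B,-1
2023/01/02,A,B,1
2023/01/03,A,B,1
2023/01/04,A,B,-1
2023/01/04,A,B,-1
2023/01/04,A,B,-1
"""

df = pd.read_csv(io.StringIO(CSV))

# One total per day (the data holds a single A/B combination, so grouping
# by date alone is enough here). groupby sorts the date strings, which are
# assumed to be zero-padded YYYY/MM/DD and therefore chronological.
daily = df.groupby("clock_now")["lay"].sum()

# Running total at the close of each day, rounded to hide float noise.
end_of_day = daily.cumsum().round(2)

# What is known when a day starts: the previous day's closing total.
known_at_start = end_of_day.shift()

summary = pd.DataFrame({
    "end_of_day": end_of_day,
    "known_at_start": known_at_start,
    "invest": known_at_start > 0,  # NaN on the first day compares as False
})
print(summary)

# Decision for the day after the last one in the data.
invest_tomorrow = bool(end_of_day.iloc[-1] > 0)
print(invest_tomorrow)
```

The closing total for 2023/01/04 is negative, so the flag for the following day is False no matter how many times the combination appears.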

I'm trying to create a column that shows where it was True and where it was False:

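import pandas as pd

df = pd.read_csv('example.csv')
combinations = [['market_name', 'competition']]
for cbnt in combinations:
    # Running total of lay within each combination, shifted down one row.
    # group_keys=False keeps the original row index so the result aligns with df.
    shifted = (df.groupby(cbnt, group_keys=False)['lay']
                 .apply(lambda s: s.cumsum().shift()))

    # Flag rows where the shifted running total is greater than that row's lay
    df['invest'] = shifted.gt(df['lay'])
    df['cumulative'] = shifted

    print(df[['clock_now','invest','cumulative']])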

But the result is this:

    clock_now  invest  cumulative
0  2022/12/30   False         NaN
1  2022/12/31   False       -1.00
2  2023/01/01    True        0.28
3  2023/01/02   False       -0.72
4  2023/01/03   False        0.28
5  2023/01/04    True        1.28
6  2023/01/04    True        0.28
7  2023/01/04    True       -0.72

The expected result would be this:

    clock_now  invest  cumulative
0  2022/12/30   False         NaN
1  2022/12/31   False       -1.00
2  2023/01/01    True        0.28
3  2023/01/02   False       -0.72
4  2023/01/03    True        0.28
5  2023/01/04    True        1.28
6  2023/01/04    True        0.28
7  2023/01/04    True       -1.72

How should I proceed so that the cumsum takes the date into account, keeping the same decision for every row of a given day based on the results of the previous days?

Example Two:

clock_now,competition,market_name,lay
2022/08/09,A,B,-1.0
2022/08/12,A,B,1.28
2022/09/07,A,B,-1.0
2022/10/15,A,B,1.0
2022/10/15,A,B,-1.0
2022/11/20,A,B,1.0

Note that on 2022/10/15 it returns one False and one True, so it is not actually tracking by date, which is what I want:

    clock_now  invest  cumulative
0  2022/08/09   False         NaN
1  2022/08/12   False       -1.00
2  2022/09/07    True        0.28
3  2022/10/15   False       -0.72
4  2022/10/15    True        0.28
5  2022/11/20   False       -0.72

The correct result would be either all False or all True for rows with the same date, like this:

    clock_now  invest  cumulative
0  2022/08/09   False         NaN
1  2022/08/12   False       -1.00
2  2022/09/07    True        0.28
3  2022/10/15   False       -0.72
4  2022/10/15   False        0.28
5  2022/11/20   False       -0.72

You can compute the cumulative sum per combination, take the last value within each day, and shift it to the following day:

(
    df.join(
        # Per-row cumulative sum of lay within each market/competition group,
        # turned into a boolean (is the running total above 0?)
        df.groupby(['market_name', 'competition']).lay.cumsum().rename('lay_cumsum') > 0
    )
    # Group by market, competition and date to get the last value within each day
    .groupby(['market_name', 'competition', 'clock_now'])
    .lay_cumsum
    .last()
    # Within each market/competition group, shift by one so that
    # every date carries the previous date's result
    .groupby(['market_name', 'competition'])
    .shift(1)
    .rename('lay_cumsum')
    .reset_index()
    # Merge back with the original df
    .merge(df, on=['clock_now', 'market_name', 'competition'])
)

This will output:

  market_name competition   clock_now lay_cumsum   lay
0           B           A  2022/12/30        NaN -1.00
1           B           A  2022/12/31      False  1.28
2           B           A  2023/01/01       True -1.00
3           B           A  2023/01/02      False  1.00
4           B           A  2023/01/03       True  1.00
5           B           A  2023/01/04       True -1.00
6           B           A  2023/01/04       True -1.00
7           B           A  2023/01/04       True -1.00
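
As an alternative sketch of the same idea, the variant below keeps the previous day's closing total as a number and derives the boolean from it, so both columns are available. The helper name `add_invest_flag` and its parameters are hypothetical, and it assumes `clock_now` holds zero-padded YYYY/MM/DD strings that sort chronologically. It is run here on Example Two from the question.

```python
import io
import pandas as pd


def add_invest_flag(df, keys=("competition", "market_name"),
                    date_col="clock_now", value_col="lay"):
    """Hypothetical helper: add the previous day's closing total and an invest flag."""
    keys = list(keys)
    # 1) Total per (combination, day); groupby sorts the keys, so days
    #    come out in order within each combination.
    daily = df.groupby(keys + [date_col])[value_col].sum()
    # 2) Running total at each day's close, per combination.
    running = daily.groupby(level=keys).cumsum()
    # 3) Shift by one day so each date sees the previous date's close.
    prev_close = (running.groupby(level=keys).shift()
                         .rename("cumulative")
                         .reset_index())
    # 4) Map the day-level value back onto every row of that day.
    out = df.merge(prev_close, on=keys + [date_col], how="left")
    out["cumulative"] = out["cumulative"].round(2)  # hide float noise
    out["invest"] = out["cumulative"] > 0           # NaN compares as False
    return out


EXAMPLE_TWO = """clock_now,competition,market_name,lay
2022/08/09,A,B,-1.0
2022/08/12,A,B,1.28
2022/09/07,A,B,-1.0
2022/10/15,A,B,1.0
2022/10/15,A,B,-1.0
2022/11/20,A,B,1.0
"""

result = add_invest_flag(pd.read_csv(io.StringIO(EXAMPLE_TWO)))
print(result[["clock_now", "invest", "cumulative"]])
```

With this approach both 2022/10/15 rows get the same flag, because the value is computed once per day and then merged back. Note that `cumulative` here holds the previous day's closing total, so it is constant within a day rather than a row-by-row running sum.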
