简体   繁体   中英

How to count the number of consecutive days a value grows

I have the following df:

Date
2015-11-27    105.449997
2015-11-30    104.239998
2015-12-01    107.120003
2015-12-02    106.070000
2015-12-03    104.379997
                 ...    
2020-11-18    271.970001
2020-11-19    272.940002
2020-11-20    269.700012
2020-11-23    268.429993
2020-11-24    276.920013
Name: Close, Length: 1258, dtype: float64

What I am trying to do is find for how many days in a row has the Close closed lower than the previous day. Could you please advise how I can go about this?

For example: 在此处输入图像描述

At this point it would say Close was lower for the last 3 days on Date

    Date          Close     Days
    10/30/2020  263.109985  0.0
    11/2/2020   261.359985  1.0
    11/3/2020   265.299988  0.0
    11/4/2020   287.380005  1.0
    11/5/2020   294.679993  0.0
    11/6/2020   293.410004  1.0
    11/9/2020   278.769989  0.0
    11/10/2020  272.429993  1.0
    11/11/2020  276.480011  0.0
    11/12/2020  275.079987  1.0
    11/13/2020  276.950012  0.0
    11/16/2020  278.959991  0.0
    11/17/2020  275.000000  1.0
    11/18/2020  271.970001  2.0
    11/19/2020  272.940002  0.0
    11/20/2020  278.000000  1.0
    11/23/2020  277.000000  2.0
    11/24/2020  276.920013  3.0

How can achieve this?

Use the following code:

# Is the current "Close" increasing or same (compared with previous row)
df['incr'] = df.Close >= df.Close.shift(fill_value=0)
# Generate the result column
df['DaysDecr'] = df.groupby(df.incr.cumsum()).apply(
    lambda grp: (~grp.incr).cumsum()).reset_index(level=0, drop=True)
# Drop "incr" column
df.drop(columns='incr', inplace=True)

For your data sample the result is:

         Date       Close  Days  DaysDecr
0  2020-10-30  263.109985   0.0         0
1  2020-11-02  261.359985   1.0         1
2  2020-11-03  265.299988   0.0         0
3  2020-11-04  287.380005   1.0         0
4  2020-11-05  294.679993   0.0         0
5  2020-11-06  293.410004   1.0         1
6  2020-11-09  278.769989   0.0         2
7  2020-11-10  272.429993   1.0         3
8  2020-11-11  276.480011   0.0         0
9  2020-11-12  275.079987   1.0         1
10 2020-11-13  276.950012   0.0         0
11 2020-11-16  278.959991   0.0         0
12 2020-11-17  275.000000   1.0         1
13 2020-11-18  271.970001   2.0         2
14 2020-11-19  272.940002   0.0         0
15 2020-11-20  278.000000   1.0         0
16 2020-11-23  277.000000   2.0         1
17 2020-11-24  276.920013   3.0         2

To trace how this code works, generate the grouping:

df['incr'] = df.Close >= df.Close.shift(fill_value=0)
gr = df.groupby(df.incr.cumsum())

and print each group:

for key, grp in gr:
    print(f'\nGroup: {key}\n{grp}')

Note that incr column in each group starts with a single True (the date when Close either increased or stayed the same) and then there is a sequence of False (days when Close was lower that on the previous date).

So the result can be generated by negation of incr and cumsum() of it.

And the only thing to do is to drop incr column (not needed any more).

Here a simple quick and dirty solution for your problem. I put your sample data in a csv file and imported it using pandas . The rest should be described in the comments of the code:

    import pandas as pd
    #import your data from a csv that contains two columns with the headers "Date" and "Close"
    df = pd.read_csv("/your_path/sample_data.txt")
    
    #choose the column "Close" and save it into the variable "close"
    close = df["Close"]
    
    #go through dataframe and count the days where they increase
    counter = 0
    for i in range(1, len(close)):
        if close[i] > close[i-1]:
            counter = counter + 1
        else:
            counter = 0
    
    #print the amount of days that the closes increased
    print("The closes have increased over the last " + str(counter) + " days")

Here a solution by using the function apply to calculate this value for a given row.

import pandas as pd

columns = ["Date", "Close", "Days"]
data = [
    ["10/30/2020", 263.109985, 0.0],
    ["11/02/2020", 261.359985, 1.0],
    ["11/03/2020", 265.299988, 0.0],
    ["11/04/2020", 287.380005, 1.0],
    ["11/05/2020", 294.679993, 0.0],
    ["11/06/2020", 293.410004, 1.0],
    ["11/09/2020", 278.769989, 0.0],
    ["11/10/2020", 272.429993, 1.0],
    ["11/11/2020", 276.480011, 0.0],
    ["11/12/2020", 275.079987, 1.0],
    ["11/13/2020", 276.950012, 0.0],
    ["11/16/2020", 278.959991, 0.0],
    ["11/17/2020", 275.000000, 1.0],
    ["11/18/2020", 271.970001, 2.0],
    ["11/19/2020", 272.940002, 0.0],
    ["11/20/2020", 278.000000, 1.0],
    ["11/23/2020", 277.000000, 2.0],
    ["11/24/2020", 276.920013, 3.0]
]
# Load from data
df = pd.DataFrame(data, columns=columns)
# Convert to datetime
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
# Order by date
df.sort_values(["Date"], ascending=True, inplace=True)

def find_last(df, my_date, my_close, count):
    previous_index = df[df["Date"]<my_date]["Date"].max()
    previous_close = df[df["Date"]==previous_index]["Close"].max()
    if df[df["Date"] < my_date]["Date"].count()>0 and my_close < previous_close:
        return find_last(df[df["Date"]<my_date], previous_index, previous_close, count + 1)
    else:
        return count

df["days_calc"] = df.apply(
    lambda row: find_last(df.copy(), row["Date"], row["Close"], 0), axis=1)

Output:

    Date        Close       Days    days_calc
0   2020-10-30  263.109985  0.0     0
1   2020-11-02  261.359985  1.0     1
2   2020-11-03  265.299988  0.0     0
3   2020-11-04  287.380005  1.0     0
4   2020-11-05  294.679993  0.0     0
5   2020-11-06  293.410004  1.0     1
6   2020-11-09  278.769989  0.0     2
7   2020-11-10  272.429993  1.0     3
8   2020-11-11  276.480011  0.0     0
9   2020-11-12  275.079987  1.0     1
10  2020-11-13  276.950012  0.0     0
11  2020-11-16  278.959991  0.0     0
12  2020-11-17  275.000000  1.0     1
13  2020-11-18  271.970001  2.0     2
14  2020-11-19  272.940002  0.0     0
15  2020-11-20  278.000000  1.0     0
16  2020-11-23  277.000000  2.0     1
17  2020-11-24  276.920013  3.0     2

I just made this and it should work for you.

import pandas as pd

columns = ["Date", "Close"]
data = [
    ["10/30/2020", 263.109985],
    ["11/02/2020", 261.359985],
    ["11/03/2020", 265.299988],
    ["11/04/2020", 287.380005],
    ["11/05/2020", 294.679993],
    ["11/06/2020", 293.410004],
    ["11/09/2020", 278.769989],
    ["11/10/2020", 272.429993],
    ["11/11/2020", 276.480011],
    ["11/12/2020", 275.079987],
    ["11/13/2020", 276.950012],
    ["11/16/2020", 278.959991],
    ["11/17/2020", 275.000000],
    ["11/18/2020", 271.970001],
    ["11/19/2020", 272.940002],
    ["11/20/2020", 278.000000],
    ["11/23/2020", 277.000000],
    ["11/24/2020", 276.920013],
]

df = pd.DataFrame(data, columns=columns)

NewList = []
counter = 0

for i in range(len(df["Close"])):
    print(i)
    if i == 0:
        NewList.append(0)
        continue
    previous = int(i)-1
    if df["Close"][i] > df["Close"][previous]:
             counter += 1
             NewList.append(counter)
    else:
             counter = 0
             NewList.append(counter)

df["Days"] = NewList

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM