
How to iterate over previous rows to compare values in a Pandas DataFrame

I have a pandas DataFrame like this:

import pandas as pd
raw_data = [{'Date': '1-10-19', 'Price':7, 'Check': 0}, 
            {'Date': '2-10-19','Price':8.5, 'Check': 0}, 
            {'Date': '3-10-19','Price':9, 'Check': 1}, 
            {'Date': '4-10-19','Price':50, 'Check': 1}, 
            {'Date': '5-10-19','Price':80, 'Check': 1}, 
            {'Date': '6-10-19','Price':100, 'Check': 1}]
df = pd.DataFrame(raw_data)
df = df.set_index('Date')  # set_index returns a new DataFrame, so assign it back

This is what it looks like:

           Price  Check
Date        
1-10-19     7.0      0
2-10-19     8.5      0 
3-10-19     9.0      1
4-10-19     50.0     1 
5-10-19     80.0     1
6-10-19     100.0    1

What I'm trying to do is this: for each row where 'Check' is 1, I want to count how many rows back the nearest earlier row is whose price was less than 10% of that row's price. For example, for the 6th row, where the price is 100, I want to iterate over the previous rows and count them until the price is less than 10 (10% of 100), which in this case is 3 rows prior, where the price is 9. I then want to save the results in a new column.

The final result would look like this:

           Price  Check  Rows_till_small
Date        
1-10-19     7.0      0    NaN
2-10-19     8.5      0    NaN
3-10-19     9.0      1    NaN
4-10-19     50.0     1    NaN
5-10-19     80.0     1    4
6-10-19     100.0    1    3

I've thought a lot about how I could do this using some kind of rolling function, but I don't think it's possible. I've also thought about iterating through the entire DataFrame using iterrows or itertuples, but I can't think of a way to do it that isn't extremely inefficient.

You can solve the issue the following way:

import pandas as pd

raw_data = [{'Date': '1-10-19', 'Price': 7, 'Check': 0},
            {'Date': '2-10-19', 'Price': 8.5, 'Check': 0},
            {'Date': '3-10-19', 'Price': 9, 'Check': 1},
            {'Date': '4-10-19', 'Price': 50, 'Check': 1},
            {'Date': '5-10-19', 'Price': 80, 'Check': 1},
            {'Date': '6-10-19', 'Price': 100, 'Check': 1}]
df = pd.DataFrame(raw_data)

new_column = [None] * len(df)  # one slot per row; stays None when no match is found

for i in range(len(df)):
    if df['Check'].iloc[i] == 1:
        percent_10 = df['Price'].iloc[i] * 0.1  # threshold: 10% of this row's price
        # walk backwards through the earlier rows, nearest first
        for j in range(i - 1, -1, -1):
            if df['Price'].iloc[j] < percent_10:
                new_column[i] = i - j  # number of rows back to the match
                break

df["Rows_till_small"] = new_column  # add new column

print(df)
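
The same backward search can also be written with NumPy doing the inner scan, which may read more clearly on larger frames. This is only a sketch: the function name rows_till_small and the ratio parameter are made up for illustration, and it assumes the column names 'Price' and 'Check' from the question.

```python
import numpy as np
import pandas as pd


def rows_till_small(df, ratio=0.1):
    """For each row with Check == 1, count the rows back to the nearest
    earlier row whose Price is below ratio * current Price (NaN if none)."""
    price = df["Price"].to_numpy(dtype=float)
    check = df["Check"].to_numpy()
    result = np.full(len(df), np.nan)
    for i in np.flatnonzero(check == 1):
        # positions of all earlier rows priced below the threshold
        below = np.flatnonzero(price[:i] < ratio * price[i])
        if below.size:
            result[i] = i - below[-1]  # distance to the nearest such row
    return pd.Series(result, index=df.index, name="Rows_till_small")


df = pd.DataFrame({
    "Date": ["1-10-19", "2-10-19", "3-10-19", "4-10-19", "5-10-19", "6-10-19"],
    "Price": [7, 8.5, 9, 50, 80, 100],
    "Check": [0, 0, 1, 1, 1, 1],
}).set_index("Date")
df["Rows_till_small"] = rows_till_small(df)
print(df)
```

Because it works on positions rather than index labels, this version should behave the same whether or not 'Date' has been set as the index.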

Hope the answer is useful for you, feel free to ask questions.

Check this out

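import numpy as np

# diff is a DataFrame: diff[k][r] is True where Price[r] > 10% of Price[k]
diff = df['Price'].apply(lambda x: x > (df['Price'] * .1))
RTS = []
for i in range(len(df)):
    check = diff[i]
    # first row whose price exceeds 10% of row i's price; the row before it
    # is the match, which only holds if prices are sorted in ascending order
    # (note that the 'Check' column is not consulted here)
    ind = check.idxmax()
    if ind != 0:
        val = (i - ind) + 1
    else:
        val = np.nan
    RTS.append(val)
df['Rows_till_small'] = RTS
print(df)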

Output

       Date     Price   Check   Rows_till_small
0   1-10-19     7.0     0       NaN
1   2-10-19     8.5     0       NaN
2   3-10-19     9.0     1       NaN
3   4-10-19     50.0    1       NaN
4   5-10-19     80.0    1       4.0
5   6-10-19     100.0   1       3.0
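
Both loops in this thread rescan earlier rows for every row, which is the inefficiency the question worries about. One possible way around that, sketched here under assumptions, is to keep a stack of earlier prices that are still candidates and binary-search it. The function name rows_till_small_fast is hypothetical, and the inputs are assumed to be plain sequences in row order; it does not require the prices to be sorted.

```python
from bisect import bisect_left

import numpy as np


def rows_till_small_fast(prices, checks, ratio=0.1):
    """Nearest earlier row priced below ratio * current price, per checked row."""
    prices = np.asarray(prices, dtype=float)
    checks = np.asarray(checks)
    result = np.full(len(prices), np.nan)
    stack_idx, stack_val = [], []  # prices strictly increasing, bottom to top
    for i, p in enumerate(prices):
        if checks[i] == 1:
            # k = how many stacked prices are strictly below the threshold;
            # the last of them is the nearest qualifying earlier row
            k = bisect_left(stack_val, ratio * p)
            if k:
                result[i] = i - stack_idx[k - 1]
        # an earlier row priced >= p can never again be the nearest match,
        # because row i is closer and at least as cheap
        while stack_val and stack_val[-1] >= p:
            stack_val.pop()
            stack_idx.pop()
        stack_val.append(p)
        stack_idx.append(i)
    return result


prices = [7, 8.5, 9, 50, 80, 100]
checks = [0, 0, 1, 1, 1, 1]
print(rows_till_small_fast(prices, checks))
```

On the data from the question this gives NaN for the first four rows, then 4 and 3. Each row is pushed and popped at most once, so the total work is roughly O(n log n) instead of a full backward scan per row.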

