
Using Python loop for next i rows in dataframe

I'm a new Python user (making the shift from VBA) and am having trouble figuring out Python's loop function. I have a dataframe df, and I want to create a column of variables based on some condition being met in another column, based on a loop. Something like the below:

cycle = 5
dummy = 1

for i = 1 to cycle
    if df["high"].iloc[i] >= df["exit"].iloc[i] and
    df["low"].iloc[i] <= df["exit"].iloc[i] then
        df["signal"] = dummy
        break
    elif i = cycle
        df["signal"] = cycle + 1
        break
    else:
        dummy = dummy + 1
        next i

Basically, I'm trying to find which of the next rows, up to the cycle variable, is the first to meet the conditions in the if statement; if they're never met, assign cycle + 1. So df["signal"] will be a column of numbers ranging 1 -> (cycle + 1). Also, there are some NaN values in df["exit"], and I'm not sure how that affects the loop.

I've found fairly extensive documentation on row iterations on the site, and I feel like this is close to where I need to get to, but I can't figure out how to adapt it. Thanks for any advice!

EDIT: INCLUDED DATA SAMPLE FROM EXCEL CELLS BELOW:

high low EXIT test   signal/(OUTPUT COLUMN)
4     3    4    1      1
2     2    2    1      1
2     3    5    0      6
4     3    1    0      5
2     5    2    0      4
5     5    1    0      3
3     1    5    0      2
5     1    5    1      1
1     1    4    0      0

EDIT 2: FURTHER CLARIFICATION AROUND SCRIPT

Once the condition

df["high"].iloc[i] >= df["exit"].iloc[i] and
    df["low"].iloc[i] <= df["exit"].iloc[i]

is met in the loop, it should terminate for that particular instance/row.

EDIT 3: EXPECTED OUTPUT

The expected output is the df["signal"] column - it is the first instance in the loop where the condition

 df["high"].iloc[i] >= df["exit"].iloc[i] and
    df["low"].iloc[i] <= df["exit"].iloc[i]

is met in any given row. The output in df["signal"] is effectively i from the loop, or the given iteration.
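
For reference, the pseudocode loop above maps onto Python roughly as follows. This is only a minimal sketch: it rebuilds the sample data from the first edit as a DataFrame, and it assumes that rows past the end of the frame count as "not met", so the last row gets cycle + 1 rather than the 0 shown in the sample. The names hit, start, step and result are made up for illustration.

```python
import pandas as pd

cycle = 5

# Sample data copied from the question's first edit
df = pd.DataFrame({
    "high": [4, 2, 2, 4, 2, 5, 3, 5, 1],
    "low":  [3, 2, 3, 3, 5, 5, 1, 1, 1],
    "exit": [4, 2, 5, 1, 2, 1, 5, 5, 4],
})

# True where exit lies between low and high (inclusive).
# A NaN in "exit" makes both comparisons False, so such rows never count as met.
hit = ((df["high"] >= df["exit"]) & (df["low"] <= df["exit"])).to_numpy()

n = len(df)
signal = []
for start in range(n):
    # Default when the condition is never met within the window
    result = cycle + 1
    # range(cycle) yields 0 .. cycle - 1, the Python analogue of "for i = 1 to cycle"
    for step in range(cycle):
        row = start + step
        if row >= n:
            # Assumption: running off the end of the data counts as "not met"
            break
        if hit[row]:
            result = step + 1
            break
    signal.append(result)

df["signal"] = signal
print(df)
```

This walks the rows one at a time, so it will be slow on a large frame, but it keeps the same shape as the VBA-style loop and makes the break logic easy to check.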

Here is how I would solve the problem. The column 'gr' must not exist before doing this:

# first check all the rows meeting the conditions and add 1 in a temporary column gr
df.loc[(df["high"] >= df["exit"]) & (df["low"] <= df["exit"]), 'gr'] = 1
# manipulate column gr to use groupby after
df['gr'] = df['gr'].cumsum().bfill()
# use cumcount after groupby to recalculate signal
df.loc[:,'signal'] = df.groupby('gr').cumcount(ascending=False).add(1)
# cut the value in signal to the value cycle + 1
df.loc[df['signal'] > cycle, 'signal'] = cycle + 1
# drop column gr
df = df.drop(columns='gr')

and you get

   high  low  exit  signal
0     4    3     4       1
1     2    2     2       1
2     2    3     5       6
3     4    3     1       5
4     2    5     2       4
5     5    5     1       3
6     3    1     5       2
7     5    1     5       1
8     1    1     4       1

Note: The last row is not handled properly, because no later row meets the condition. I'm not sure how this will look in the full data or how you want to handle it. You may consider adding df = df.dropna(subset=['gr']) after the line starting with df['gr'] = ... to drop these trailing rows; it's up to you.
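
If you would rather see the trailing rows marked explicitly, one possible variant (not the groupby approach above, just a sketch of the same idea) is to look up the position of the next matching row and subtract. It rebuilds the sample data from the question; hit, pos and next_hit are hypothetical helper names. Rows with no later match come out as NaN, and what to do with them is left open.

```python
import numpy as np
import pandas as pd

cycle = 5

# Sample data copied from the question
df = pd.DataFrame({
    "high": [4, 2, 2, 4, 2, 5, 3, 5, 1],
    "low":  [3, 2, 3, 3, 5, 5, 1, 1, 1],
    "exit": [4, 2, 5, 1, 2, 1, 5, 5, 4],
})

# True where exit lies between low and high (inclusive); NaN in exit gives False
hit = (df["high"] >= df["exit"]) & (df["low"] <= df["exit"])

# Integer position of every row, independent of the index labels
pos = pd.Series(np.arange(len(df)), index=df.index)

# Keep the position only on matching rows, then back-fill so that each row
# carries the position of the next matching row (or its own, if it matches)
next_hit = pos.where(hit).bfill()

# Distance to the next match, counting the current row as 1, capped at cycle + 1.
# Rows with no later match stay NaN, so the column is float.
df["signal"] = (next_hit - pos + 1).clip(upper=cycle + 1)
print(df)
```

If you decide the unmatched trailing rows should also get cycle + 1, something like df["signal"].fillna(cycle + 1).astype(int) would do that, but that is a choice about your data rather than something the question settles.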
