
How to iterate over previous rows to compare values in a Pandas DataFrame

I have a pandas DataFrame like this:

import pandas as pd
raw_data = [{'Date': '1-10-19', 'Price':7, 'Check': 0}, 
            {'Date': '2-10-19','Price':8.5, 'Check': 0}, 
            {'Date': '3-10-19','Price':9, 'Check': 1}, 
            {'Date': '4-10-19','Price':50, 'Check': 1}, 
            {'Date': '5-10-19','Price':80, 'Check': 1}, 
            {'Date': '6-10-19','Price':100, 'Check': 1}]
df = pd.DataFrame(raw_data)
df = df.set_index('Date')  # set_index returns a new DataFrame, so assign it back

This is what it looks like:

           Price  Check
Date        
1-10-19     7.0      0
2-10-19     8.5      0 
3-10-19     9.0      1
4-10-19     50.0     1 
5-10-19     80.0     1
6-10-19     100.0    1

Now, for each row where 'Check' is 1, I want to count how many rows back I have to go to reach a row whose price is less than 10% of that row's price. For example, for the 6th row, where the price is 100, I want to iterate over the previous rows until the price is less than 10 (10% of 100); in this case that is 3 rows prior, where the price is 9. I then want to save the results in a new column.

The final result would look like this:

           Price  Check  Rows_till_small
Date        
1-10-19     7.0      0    NaN
2-10-19     8.5      0    NaN
3-10-19     9.0      1    NaN
4-10-19     50.0     1    NaN
5-10-19     80.0     1    4
6-10-19     100.0    1    3

I've thought a lot about how I could do this using some kind of rolling function, but I don't think it's possible. I've also thought about iterating through the entire DataFrame using iterrows or itertuples, but I can't think of a way to do it that isn't extremely inefficient.

You can solve the issue the following way:

import pandas as pd
raw_data = [{'Date': '1-10-19', 'Price': 7, 'Check': 0},
            {'Date': '2-10-19', 'Price': 8.5, 'Check': 0},
            {'Date': '3-10-19', 'Price': 9, 'Check': 1},
            {'Date': '4-10-19', 'Price': 50, 'Check': 1},
            {'Date': '5-10-19', 'Price': 80, 'Check': 1},
            {'Date': '6-10-19', 'Price': 100, 'Check': 1}]
df = pd.DataFrame(raw_data)

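new_column = [None] * len(df)  # one slot per row; stays None when nothing is found

for i in range(len(df)):
    if df['Check'][i] == 1:
        percent_10 = df['Price'][i] * 0.1  # threshold: 10% of this row's price
        # walk backwards over the earlier rows until a price falls below the threshold
        for j in range(i - 1, -1, -1):
            if df['Price'][j] < percent_10:
                new_column[i] = i - j  # number of rows back
                break


df["New"] = new_column  # add new column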

print(df)
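
The nested loop above can also be written with NumPy broadcasting, which avoids the Python-level loops at the cost of building an n-by-n boolean matrix. This is only a sketch under the assumption that the frame is small enough for that matrix to fit in memory; the names `below`, `last_below` and `found` are introduced here for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['1-10-19', '2-10-19', '3-10-19', '4-10-19', '5-10-19', '6-10-19'],
    'Price': [7, 8.5, 9, 50, 80, 100],
    'Check': [0, 0, 1, 1, 1, 1],
})

prices = df['Price'].to_numpy(dtype=float)
pos = np.arange(len(prices))

# below[i, j] is True when row j comes before row i and its price
# is less than 10% of row i's price.
below = (prices[None, :] < 0.1 * prices[:, None]) & (pos[None, :] < pos[:, None])

# Position of the nearest such earlier row; -1 marks "no earlier row qualifies".
last_below = np.where(below, pos[None, :], -1).max(axis=1)

# Only rows with Check == 1 and at least one qualifying earlier row get a value.
found = (last_below >= 0) & (df['Check'].to_numpy() == 1)
df['Rows_till_small'] = np.where(found, pos - last_below, np.nan)
print(df)
```

For long frames the quadratic memory use becomes the limiting factor, so the explicit loop (which stops at the first match) may still be the better choice there.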

Hope the answer is useful for you, feel free to ask questions.

Check this out:

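import numpy as np

# diff[i][k] is True when row k's price exceeds 10% of row i's price
diff = df['Price'].apply(lambda x: x > (df['Price'] * .1))
RTS = []
for i in range(len(df)):
    check = diff[i]
    ind = check.idxmax()  # first row whose price is above the threshold
    if ind != 0:
        val = (i - ind) + 1  # distance back to the row just before it
    else:
        val = np.nan  # no earlier row is below the threshold
    RTS.append(val)
df['Rows_till_small'] = RTS
print(df)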

Output:

       Date     Price   Check   Rows_till_small
0   1-10-19     7.0     0       NaN
1   2-10-19     8.5     0       NaN
2   3-10-19     9.0     1       NaN
3   4-10-19     50.0    1       NaN
4   5-10-19     80.0    1       4.0
5   6-10-19     100.0   1       3.0
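
The snippet above looks for the first row above the threshold and steps one row back, which appears to rely on the prices being in non-decreasing order, as they are in the sample data. Under that assumption (and assuming all prices are positive), the same result can be sketched with `np.searchsorted`, which does a binary search per row instead of scanning. Unlike the snippet above, this sketch also leaves rows with 'Check' equal to 0 as NaN; the assertion is there to fail loudly if the ordering assumption does not hold.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Price': [7, 8.5, 9, 50, 80, 100],
    'Check': [0, 0, 1, 1, 1, 1],
})

prices = df['Price'].to_numpy(dtype=float)

# Assumption: prices are positive and sorted in non-decreasing order.
assert np.all(prices > 0) and np.all(np.diff(prices) >= 0)

# k[i] = number of prices strictly below 10% of row i's price.
# In a sorted array those prices occupy positions 0 .. k[i] - 1.
k = np.searchsorted(prices, 0.1 * prices, side='left')

pos = np.arange(len(prices))
found = (k > 0) & (df['Check'].to_numpy() == 1)

# The nearest earlier row below the threshold sits at position k[i] - 1.
df['Rows_till_small'] = np.where(found, pos - (k - 1), np.nan)
print(df)
```

If the prices can go down as well as up, this shortcut no longer applies and a backwards scan per row is needed instead.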
