How to iterate over previous rows to compare values in a Pandas DataFrame
I have a pandas DataFrame like this:
import pandas as pd

raw_data = [{'Date': '1-10-19', 'Price': 7, 'Check': 0},
            {'Date': '2-10-19', 'Price': 8.5, 'Check': 0},
            {'Date': '3-10-19', 'Price': 9, 'Check': 1},
            {'Date': '4-10-19', 'Price': 50, 'Check': 1},
            {'Date': '5-10-19', 'Price': 80, 'Check': 1},
            {'Date': '6-10-19', 'Price': 100, 'Check': 1}]
df = pd.DataFrame(raw_data)
df = df.set_index('Date')  # set_index returns a new DataFrame, so assign it back
This is what it looks like:
         Price  Check
Date
1-10-19    7.0      0
2-10-19    8.5      0
3-10-19    9.0      1
4-10-19   50.0      1
5-10-19   80.0      1
6-10-19  100.0      1
What I'm trying to do is this: for each row where 'Check' is 1, I want to count how many rows back I have to go to reach a row whose price is less than 10% of that row's price. For example, for the 6th row, where the price is 100, I want to iterate over the previous rows and count them until the price is less than 10 (10% of 100), which in this case is 3 rows prior, where the price is 9. I then want to save the results in a new column.
The final result would look like this:
         Price  Check  Rows_till_small
Date
1-10-19    7.0      0              NaN
2-10-19    8.5      0              NaN
3-10-19    9.0      1              NaN
4-10-19   50.0      1              NaN
5-10-19   80.0      1                4
6-10-19  100.0      1                3
I've thought a lot about how I could do this using some kind of rolling function, but I don't think it's possible. I've also thought about iterating through the entire DataFrame using iterrows or itertuples, but I can't think of a way to do it that isn't extremely inefficient.
You can solve the issue the following way:
import pandas as pd

raw_data = [{'Date': '1-10-19', 'Price': 7, 'Check': 0},
            {'Date': '2-10-19', 'Price': 8.5, 'Check': 0},
            {'Date': '3-10-19', 'Price': 9, 'Check': 1},
            {'Date': '4-10-19', 'Price': 50, 'Check': 1},
            {'Date': '5-10-19', 'Price': 80, 'Check': 1},
            {'Date': '6-10-19', 'Price': 100, 'Check': 1}]
df = pd.DataFrame(raw_data)

new_column = [None] * len(df["Price"])  # create new column
for i in range(len(df["Price"])):
    if df['Check'][i] == 1:
        percent_10 = df['Price'][i] * 0.1
        # walk backwards from row i until a price below the threshold is found
        for j in range(i, -1, -1):
            if df['Price'][j] < percent_10:
                new_column[i] = i - j  # number of rows between i and j
                break
df["New"] = new_column  # add new column
print(df)
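The same backward scan can also be wrapped in a function that works on NumPy arrays, so it does not depend on the index labels. This is only a sketch under assumptions: the function name `rows_till_small` and its parameters `price_col`, `check_col` and `ratio` are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def rows_till_small(df, price_col="Price", check_col="Check", ratio=0.1):
    """Hypothetical helper: for each checked row, count the rows back to the
    nearest earlier row whose price is below ratio * current price."""
    prices = df[price_col].to_numpy(dtype=float)
    checks = df[check_col].to_numpy()
    result = np.full(len(df), np.nan)  # NaN where nothing qualifies
    for i in range(len(df)):
        if checks[i] != 1:
            continue
        threshold = prices[i] * ratio
        # Positions of earlier rows whose price is below the threshold.
        hits = np.flatnonzero(prices[:i] < threshold)
        if hits.size:
            result[i] = i - hits[-1]  # distance to the nearest such row
    return pd.Series(result, index=df.index, name="Rows_till_small")


raw_data = [{'Date': '1-10-19', 'Price': 7, 'Check': 0},
            {'Date': '2-10-19', 'Price': 8.5, 'Check': 0},
            {'Date': '3-10-19', 'Price': 9, 'Check': 1},
            {'Date': '4-10-19', 'Price': 50, 'Check': 1},
            {'Date': '5-10-19', 'Price': 80, 'Check': 1},
            {'Date': '6-10-19', 'Price': 100, 'Check': 1}]
df = pd.DataFrame(raw_data).set_index('Date')
df["Rows_till_small"] = rows_till_small(df)
print(df)
```

Because the function uses positions rather than labels, it should behave the same whether or not 'Date' has been set as the index.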
Hope this answer is useful to you; feel free to ask questions.
Check this out:
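import numpy as np

# Assumes df was built from raw_data as in the question and still has the default 0..n-1 index.
# diff[i][k] is True when row k's price exceeds 10% of row i's price.
diff = df['Price'].apply(lambda x: x > (df['Price'] * .1))
RTS = []
for i in range(len(df)):
    check = diff[i]
    ind = check.idxmax()  # first row whose price exceeds 10% of row i's price
    if ind != 0:
        # The row just before ind is taken as the last one below the threshold,
        # which holds for this data because prices only increase.
        val = (i - ind) + 1
    else:
        val = np.nan
    RTS.append(val)
df['Rows_till_small'] = RTS
print(df)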
Output:
      Date  Price  Check  Rows_till_small
0  1-10-19    7.0      0              NaN
1  2-10-19    8.5      0              NaN
2  3-10-19    9.0      1              NaN
3  4-10-19   50.0      1              NaN
4  5-10-19   80.0      1              4.0
5  6-10-19  100.0      1              3.0
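If the prices are not guaranteed to increase, the pairwise comparison can be done with NumPy broadcasting instead, looking directly for the nearest earlier row below the threshold. The following is a rough sketch rather than a definitive solution: the variable names (`below`, `last_hit`, `valid`) are illustrative, it also applies the 'Check' filter, and it builds an n-by-n boolean matrix, so it is only reasonable for moderately sized frames.

```python
import numpy as np
import pandas as pd

raw_data = [{'Date': '1-10-19', 'Price': 7, 'Check': 0},
            {'Date': '2-10-19', 'Price': 8.5, 'Check': 0},
            {'Date': '3-10-19', 'Price': 9, 'Check': 1},
            {'Date': '4-10-19', 'Price': 50, 'Check': 1},
            {'Date': '5-10-19', 'Price': 80, 'Check': 1},
            {'Date': '6-10-19', 'Price': 100, 'Check': 1}]
df = pd.DataFrame(raw_data)

prices = df['Price'].to_numpy(dtype=float)
pos = np.arange(len(prices))

# below[i, j] is True when row j comes before row i
# and price[j] is less than 10% of price[i].
below = (prices[None, :] < 0.1 * prices[:, None]) & (pos[None, :] < pos[:, None])

# Position of the nearest earlier qualifying row, or -1 when none qualifies.
last_hit = np.where(below, pos[None, :], -1).max(axis=1)

# Only rows with Check == 1 and at least one qualifying earlier row get a value.
valid = (last_hit >= 0) & (df['Check'].to_numpy() == 1)
df['Rows_till_small'] = np.where(valid, pos - last_hit, np.nan)
print(df)
```

The trade-off is memory for speed: there is no Python-level loop, but the intermediate matrix grows quadratically with the number of rows.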