How to iterate over a Pandas DataFrame and update rows based on previous rows
I have some code which I got to work, but it's rather slow. I need to update a table of trades and quotes. The base table looks like this:
+--------+-----------+----------+----------+--------+----------+
| Symbol | Timestamp | BidPrice | AskPrice | Price | Quantity |
+--------+-----------+----------+----------+--------+----------+
| MSFT | 9:00 | | | 46.98 | 140 |
| MSFT | 9:01 | | | 46.99 | 100 |
| MSFT | 9:02 | | | 47 | 400 |
| MSFT | 9:03 | | | 47 | 100 |
| MSFT | 9:04 | 46.87 | 46.99 | | |
| MSFT | 9:05 | | | 46.89 | 100 |
| MSFT | 9:06 | | | 46.95 | 600 |
| MSFT | 9:07 | 46.91 | 46.99 | | |
| MSFT | 9:08 | 46.91 | 46.97 | | |
| MSFT | 9:09 | | | 46.935 | 100 |
| MSFT | 9:10 | 46.89 | 46.96 | | |
| MSFT | 9:11 | | | 46.93 | 100 |
| MSFT | 9:12 | | | 46.91 | 100 |
+--------+-----------+----------+----------+--------+----------+
I need to set the bid and ask for each trade (a trade row has a Price but no bid/ask). So, starting with bid = 46.8 and ask = 47, set those values on each trade row, and whenever a quote row changes them, use the new values from then on. Like this:
+--------+-----------+----------+----------+--------+----------+
| Symbol | Timestamp | BidPrice | AskPrice | Price | Quantity |
+--------+-----------+----------+----------+--------+----------+
| MSFT | 9:00 | 46.8 | 47 | 46.98 | 140 |
| MSFT | 9:01 | 46.8 | 47 | 46.99 | 100 |
| MSFT | 9:02 | 46.8 | 47 | 47 | 400 |
| MSFT | 9:03 | 46.8 | 47 | 47 | 100 |
| MSFT | 9:04 | 46.87 | 46.99 | | |
| MSFT | 9:05 | 46.87 | 46.99 | 46.89 | 100 |
| MSFT | 9:06 | 46.87 | 46.99 | 46.95 | 600 |
| MSFT | 9:07 | 46.91 | 46.99 | | |
| MSFT | 9:08 | 46.91 | 46.97 | | |
| MSFT | 9:09 | 46.91 | 46.97 | 46.935 | 100 |
| MSFT | 9:10 | 46.89 | 46.96 | | |
| MSFT | 9:11 | 46.89 | 46.96 | 46.93 | 100 |
| MSFT | 9:12 | 46.89 | 46.96 | 46.91 | 100 |
+--------+-----------+----------+----------+--------+----------+
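I worked this out by iterating over the rows, but for 112k rows it takes 35 seconds.
import numpy as np

# Starting quote, used until the first quote row appears.
bid, ask = 46.8, 47
for i, row in qts_trd.iterrows():
    # Quote rows have no Price: remember their bid/ask.
    if np.isnan(row['Price']):
        bid = row['BidPrice']
        ask = row['AskPrice']
    # Trade rows have no bid/ask: stamp the last quote seen.
    if np.isnan(row['BidPrice']):
        qts_trd.at[i,'BidPrice'] = bid
        qts_trd.at[i,'AskPrice'] = ask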
I know the basics of lambda functions, which apply the same function to every row. I think that would be quicker, but as you can see, the values to set change as the rows go on. Is there a more efficient or quicker way to do it?
This is Python 3.7 in Spyder.
Try the pandas fillna() function with method='ffill' (forward fill), which copies the last non-missing value down into the missing rows that follow it.
So:
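# Assign the result back instead of using inplace=True on a column,
# which does not reliably update the parent frame.
qts_trd['BidPrice'] = qts_trd['BidPrice'].fillna(method='ffill')
qts_trd['AskPrice'] = qts_trd['AskPrice'].fillna(method='ffill')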
In my experience it's very quick.
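To see what the forward fill does before relying on it, here is a minimal sketch on the first six rows of the question's table. The frame name `sample` is made up for illustration, and `ffill()` is used as the shorthand for `fillna(method='ffill')`, which recent pandas versions prefer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first six rows of the question's table.
sample = pd.DataFrame({
    "Symbol": ["MSFT"] * 6,
    "Timestamp": ["9:00", "9:01", "9:02", "9:03", "9:04", "9:05"],
    "BidPrice": [np.nan, np.nan, np.nan, np.nan, 46.87, np.nan],
    "AskPrice": [np.nan, np.nan, np.nan, np.nan, 46.99, np.nan],
    "Price": [46.98, 46.99, 47.0, 47.0, np.nan, 46.89],
    "Quantity": [140, 100, 400, 100, np.nan, 100],
})

# Forward fill carries the last seen quote down to the rows after it.
filled = sample[["BidPrice", "AskPrice"]].ffill()
print(filled)
```

The 9:05 trade picks up the 9:04 quote, while the rows before 9:04 stay missing because there is no earlier quote to carry forward.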
Edit:
I just realised this won't fill your first values, because there is nothing above them to fill from. The code below inserts a row at the top to fill from, and then deletes it.
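# Insert a temporary seed row at the top holding the starting bid/ask.
# NaN (not '') keeps the numeric Price and Quantity columns numeric.
qts_trd.loc[-1] = ['', '', 46.8, 47, np.nan, np.nan]
qts_trd.index += 1
qts_trd.sort_index(inplace=True)
# Forward-fill from the seed row, then drop it again.
qts_trd['BidPrice'] = qts_trd['BidPrice'].fillna(method='ffill')
qts_trd['AskPrice'] = qts_trd['AskPrice'].fillna(method='ffill')
qts_trd.drop(index=0, inplace=True)
qts_trd.reset_index(drop=True, inplace=True)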
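Edit 2.0, thanks to @no_body's comment:
# Forward-fill first, then give the still-empty leading rows the starting values.
# (inplace=True returns None, so it cannot be chained; assign the result instead.)
qts_trd['BidPrice'] = qts_trd['BidPrice'].fillna(method='ffill').fillna(46.8)
qts_trd['AskPrice'] = qts_trd['AskPrice'].fillna(method='ffill').fillna(47)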
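As a rough check that the two-step fill gives the same result as the original loop, the sketch below rebuilds the question's table and compares both approaches. The helper names `fill_quotes` and `fill_quotes_loop` are invented for this example, `ffill()` stands in for `fillna(method='ffill')`, and the loop assumes every row is either a quote (no Price) or a trade (no bid/ask), as in the sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's 13 rows (9:00 to 9:12).
bids = [np.nan] * 4 + [46.87, np.nan, np.nan, 46.91, 46.91,
                       np.nan, 46.89, np.nan, np.nan]
asks = [np.nan] * 4 + [46.99, np.nan, np.nan, 46.99, 46.97,
                       np.nan, 46.96, np.nan, np.nan]
prices = [46.98, 46.99, 47.0, 47.0, np.nan, 46.89, 46.95,
          np.nan, np.nan, 46.935, np.nan, 46.93, 46.91]
qts_trd = pd.DataFrame({"BidPrice": bids, "AskPrice": asks, "Price": prices})


def fill_quotes(df, first_bid=46.8, first_ask=47.0):
    """Vectorised fill: carry quotes forward, then seed the leading gap."""
    out = df.copy()
    out["BidPrice"] = out["BidPrice"].ffill().fillna(first_bid)
    out["AskPrice"] = out["AskPrice"].ffill().fillna(first_ask)
    return out


def fill_quotes_loop(df, first_bid=46.8, first_ask=47.0):
    """Row-by-row version in the spirit of the question's loop, for reference."""
    out = df.copy()
    bid, ask = first_bid, first_ask
    for i, row in out.iterrows():
        if np.isnan(row["Price"]):   # quote row: remember its bid/ask
            bid, ask = row["BidPrice"], row["AskPrice"]
        else:                        # trade row: stamp the last quote seen
            out.at[i, "BidPrice"] = bid
            out.at[i, "AskPrice"] = ask
    return out


fast = fill_quotes(qts_trd)
slow = fill_quotes_loop(qts_trd)
pd.testing.assert_frame_equal(fast, slow)  # raises if the results differ
print(fast)
```

Passing the starting bid and ask as parameters keeps the seed values out of the data itself, so no temporary row has to be inserted and removed.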