Suboptimal for loop on large-ish dataset
So I have a DataFrame with several thousand rows containing artificial forex trading data. The first ten rows look like this:

I want to iterate over this set and, for each row, calculate the CommonCurrency, which in this case would be USD. So for each row, I go over the CurrencyPair, DeskRate and OrderQty columns and calculate a CommonCurrency:

for i in range(len(order_data)):
    if order_data['CurrencyPair'][i] == 'GBP/USD':
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif order_data['CurrencyPair'][i] == 'AUD/USD':
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif order_data['CurrencyPair'][i] == 'EUR/USD':
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif order_data['CurrencyPair'][i] == 'USD/CHF':
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] / order_data['OrderQty'][i]
    elif order_data['CurrencyPair'][i] == 'EUR/GBP':
        order_data['CommonCurrency'][i] = ...  # different calculation

This does not seem like the right way of doing it, especially if there is a large number of different currency pairs. Another problem arises when I get to EUR/GBP, because now I need the DeskRate from both GBP/USD and EUR/USD, and I can't see how to do that with this method.

Any hints?

One interesting feature in pandas is the concept of indexing. There are even more Pythonic ways of doing this, but using loc, you can assign values to a section of the DataFrame using Series (columns):

# Write the results into 'CommonCurrency'; assigning to 'CurrencyPair' would overwrite the pair labels.
order_data.loc[order_data['CurrencyPair'].isin(('GBP/USD', 'AUD/USD', 'EUR/USD')), 'CommonCurrency'] = order_data['DeskRate'] * order_data['OrderQty']
order_data.loc[order_data['CurrencyPair'] == 'USD/CHF', 'CommonCurrency'] = order_data['DeskRate'] / order_data['OrderQty']
# some_func is a placeholder for whatever calculation EUR/GBP needs.
order_data.loc[order_data['CurrencyPair'] == 'EUR/GBP', 'CommonCurrency'] = some_func(order_data['DeskRate'], order_data['OrderQty'])

Thus avoiding any for loops.
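
For the EUR/GBP case, here is a rough, self-contained sketch of how the whole thing could look without a loop. It uses numpy.select instead of repeated loc assignments, which is a swapped-in technique and not what the answer above does. The sample values are invented. The cross-rate rule for EUR/GBP (convert to GBP with the row's own DeskRate, then to USD with the mean GBP/USD DeskRate found in the data) is purely an assumption about what "different calculation" might mean. The USD/CHF formula is copied from the question as is.

```python
import numpy as np
import pandas as pd

# Invented sample data; only the column names come from the question.
order_data = pd.DataFrame({
    "CurrencyPair": ["GBP/USD", "AUD/USD", "EUR/USD", "USD/CHF", "EUR/GBP"],
    "DeskRate": [1.25, 0.75, 1.10, 0.90, 0.88],
    "OrderQty": [1000, 2000, 3000, 4000, 5000],
})

pair = order_data["CurrencyPair"]
rate = order_data["DeskRate"]
qty = order_data["OrderQty"]

# Assumption: one reference rate per pair, taken as the mean DeskRate
# of all rows with that pair. Swap in whatever rule fits your data.
ref_rate = order_data.groupby("CurrencyPair")["DeskRate"].mean()
gbp_usd = ref_rate["GBP/USD"]

# One boolean mask per rule; numpy.select uses the first mask that matches.
conditions = [
    pair.isin(["GBP/USD", "AUD/USD", "EUR/USD"]),
    pair == "USD/CHF",
    pair == "EUR/GBP",
]
choices = [
    rate * qty,            # pairs quoted against USD
    rate / qty,            # same formula as in the question
    rate * qty * gbp_usd,  # assumed cross rate: EUR -> GBP -> USD
]

# Rows whose pair matches no rule get NaN, so gaps are easy to spot.
order_data["CommonCurrency"] = np.select(conditions, choices, default=np.nan)
print(order_data)
```

Adding a new currency pair then means adding one mask and one expression, and every rule is evaluated on whole columns at once.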