
Suboptimal for loop on large-ish dataset

I have a DataFrame with several thousand rows of artificial forex trading data. The first ten rows look like this:

[Image: the first ten rows of the DataFrame; not reproduced here]

I want to iterate over this set and, for each row, calculate the CommonCurrency value, which in this case is in USD. So for each row, I go over the CurrencyPair, DeskRate and OrderQty columns and calculate a CommonCurrency:

for i in range(len(order_data)):
    if (order_data['CurrencyPair'][i] == 'GBP/USD'):
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif (order_data['CurrencyPair'][i] == 'AUD/USD'):
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif (order_data['CurrencyPair'][i] == 'EUR/USD'):
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif (order_data['CurrencyPair'][i] == 'USD/CHF'):
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] / order_data['OrderQty'][i]
    elif (order_data['CurrencyPair'][i] == 'EUR/GBP'):
        pass  # different calculation

This does not seem like the right way of doing it, especially if there is a large number of different currency pairs. Another problem comes up when I get to EUR/GBP: now I need the DeskRate from both GBP/USD and EUR/USD, and I can't see how to do that with this method.

Any hints?

One useful feature of pandas is indexing. There may be more Pythonic ways of doing this, but with loc and a boolean mask you can assign values to a subset of the rows of the DataFrame using whole Series (columns):

# Write the result to 'CommonCurrency'; assigning to 'CurrencyPair' would overwrite the pairs the masks rely on.
order_data.loc[order_data['CurrencyPair'].isin(('GBP/USD', 'AUD/USD', 'EUR/USD')), 'CommonCurrency'] = order_data['DeskRate'] * order_data['OrderQty']
order_data.loc[order_data['CurrencyPair'] == 'USD/CHF', 'CommonCurrency'] = order_data['DeskRate'] / order_data['OrderQty']
# some_func is a placeholder for whatever calculation EUR/GBP needs.
order_data.loc[order_data['CurrencyPair'] == 'EUR/GBP', 'CommonCurrency'] = some_func(order_data['DeskRate'], order_data['OrderQty'])

This avoids any for loops.
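To make the idea concrete, here is a minimal, self-contained sketch of the same vectorized approach on made-up data, using numpy.select to pick a formula per row. The sample values and the usd_per_unit lookup table are assumptions for illustration only, not part of the question's data. The EUR/GBP branch is one possible reading of the "different calculation": it converts the order into GBP with the row's DeskRate and then into USD with a reference GBP/USD rate. The USD/CHF formula is copied from the question as-is.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; only the column names come from the question.
order_data = pd.DataFrame({
    "CurrencyPair": ["GBP/USD", "AUD/USD", "EUR/USD", "USD/CHF", "EUR/GBP"],
    "DeskRate": [1.25, 0.70, 1.10, 0.90, 0.88],
    "OrderQty": [1000.0, 2000.0, 1500.0, 900.0, 1000.0],
})

# Hypothetical reference table: USD value of one unit of each currency.
# In real data these rates would probably be looked up per row or per timestamp.
usd_per_unit = {"GBP": 1.25, "AUD": 0.70, "EUR": 1.10}

pair = order_data["CurrencyPair"]
rate = order_data["DeskRate"]
qty = order_data["OrderQty"]

# Quote currency of each pair ("GBP" for "EUR/GBP"), mapped to its USD rate.
# Currencies missing from the table (here USD and CHF) become NaN.
quote_to_usd = pair.str.split("/").str[1].map(usd_per_unit)

conditions = [
    pair.str.endswith("/USD"),  # GBP/USD, AUD/USD, EUR/USD, ...
    pair == "USD/CHF",
    pair == "EUR/GBP",
]
choices = [
    rate * qty,                 # as in the question
    rate / qty,                 # as in the question
    rate * qty * quote_to_usd,  # assumed cross-rate handling: EUR -> GBP -> USD
]

# Rows matching no condition get NaN instead of a silently wrong value.
order_data["CommonCurrency"] = np.select(conditions, choices, default=np.nan)
print(order_data)
```

Each entry in choices is computed for the whole column at once, and np.select only decides which result to keep per row, so adding a new currency pair means adding one condition and one formula rather than another elif branch.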
