Faster alternative to .apply() in pandas

Question

I'm trying to speed up applying a custom function to a column of a DataFrame. I found that this:

b = b.apply(lambda x: 'not_ticker' if x is None else x)
b = b.apply(lambda x: x if x=='not_ticker' else x if isTICKER(x) else 'not_ticker')

is much faster than this:

b = b.apply(lambda x: x if isTICKER(x) else 'not_ticker')

where:

b
Out[25]: 
0     None
1     None
2     None
3     None
4     None
5     None
6     None
7     None
8     SOLD
9     None
10    NVAX
11      GM
12    None
13    None
Name: tickers_body_3, dtype: object

My function isTICKER() returns True if the string passed to it is a valid ticker:

def isTICKER(ticker_in):
    """
    isTICKER(ticker)
    Arguments:
       ticker_in(str): string to be verified as ticker

    Returns:
        isticker(boolean): positive if item is a valid ticker in yfinance database
    """
    import yfinance as yf
    if ticker_in is not None:
        if len(ticker_in)>2:
            ticker = yf.Ticker(str(ticker_in))
            info = None
            if ticker.info['regularMarketPrice'] is None:
                return False
            else:
                return True
        else:
            return False
    else:
        return False

Unfortunately, this is still very slow, and it needs to run on a much larger dataset than the example shown. The final output should look like this:

b = b.apply(lambda x: 'not_ticker' if x is None else x)
b = b.apply(lambda x: x if x=='not_ticker' else x if isTICKER(x) else 'not_ticker')
print(b)
0     not_ticker
1     not_ticker
2     not_ticker
3     not_ticker
4     not_ticker
5     not_ticker
6     not_ticker
7     not_ticker
8     not_ticker
9     not_ticker
10          NVAX
11    not_ticker
12    not_ticker
13    not_ticker
Name: tickers_body_3, dtype: object

Answer 1

Instead of calling isTICKER separately for every value, you can call it once per unique value and store the results in a dictionary:

dict_res = {x: isTICKER(x) for x in b.unique()}

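This line builds a mapping from each unique value to the function's result. After that, you only need to replace the values according to the dictionary, which is easy to do with the .map function: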

b = b.map(dict_res)
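
Note that b.map(dict_res) yields the True/False results rather than the ticker / 'not_ticker' values the question asks for. One possible way to get that output while still checking each distinct value only once is sketched below. It is only a sketch under stated assumptions: is_ticker_stub is a hypothetical stand-in for isTICKER() (it pretends only "NVAX" is valid) so the example runs without a yfinance lookup, and Series.isin plus Series.where are swapped in for the plain .map call.

```python
import pandas as pd

# Hypothetical stand-in for isTICKER() so the sketch runs offline;
# it pretends that "NVAX" is the only valid ticker.
def is_ticker_stub(value):
    return value == "NVAX"

# Same values as the example Series in the question.
b = pd.Series(
    [None] * 8 + ["SOLD", None, "NVAX", "GM", None, None],
    name="tickers_body_3",
)

# Run the (potentially slow) check once per distinct non-null value.
valid = {x for x in b.dropna().unique() if is_ticker_stub(x)}

# Keep values that passed the check; replace everything else, including None.
result = b.where(b.isin(valid), "not_ticker")
print(result)
```

With the real isTICKER() in place of the stub, the number of yfinance lookups would equal the number of distinct non-null values rather than the length of the Series.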

Compared with your code, there are two "hacks" here that improve performance:

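Calling your (potentially slow) custom function only on the unique values rather than on the whole Series. This helps when values in your Series repeat often.
Using .map instead of .apply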
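
To make the first point concrete, the sketch below counts how often the check runs under each approach. The Series here is a small made-up example with repeated values, and counting_check is a hypothetical stand-in for isTICKER() that just records its calls; neither comes from the original post.

```python
import pandas as pd

calls = {"count": 0}

# Hypothetical stand-in for isTICKER() that records how often it is called.
def counting_check(value):
    calls["count"] += 1
    return value == "NVAX"

# Made-up Series with repeated values (not the data from the question).
b = pd.Series(["NVAX", "GM", "NVAX", "SOLD", "GM", "NVAX"])

# Approach 1: call the check for every element.
calls["count"] = 0
flags_apply = b.apply(counting_check)
calls_apply = calls["count"]

# Approach 2: call the check once per unique value, then look results up.
calls["count"] = 0
dict_res = {x: counting_check(x) for x in b.unique()}
flags_map = b.map(dict_res)
calls_unique = calls["count"]

print("calls with apply: ", calls_apply)
print("calls with unique:", calls_unique)
```

Both approaches produce the same flags; the saving grows with the amount of repetition in the data, and disappears if every value is distinct.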

Solution 1
0 2021-12-25 18:08:55
