How to apply a function to specific columns of a pandas dataframe?
I want to apply a function to specific columns of a pandas DataFrame. Here is an example:
# import modules
from pandas_datareader import data as pdr
# import parameters
start = "2020-01-01"
end = "2021-01-01"
symbols = ["AAPL"]
# get the data
data = pdr.get_data_yahoo(symbols, start, end)
def mult(row):
    return row['Close']*2, row['Open']/3
data[['Close', 'Open']].apply(mult, axis = 1)
print(data.head())
Result:
Attributes Adj Close Close High Low Open Volume
Symbols AAPL AAPL AAPL AAPL AAPL AAPL
Date
2020-01-02 73.894333 75.087502 75.150002 73.797501 74.059998 135480400.0
2020-01-03 73.175926 74.357498 75.144997 74.125000 74.287498 146322800.0
2020-01-06 73.759003 74.949997 74.989998 73.187500 73.447502 118387200.0
2020-01-07 73.412109 74.597504 75.224998 74.370003 74.959999 108872000.0
2020-01-08 74.593048 75.797501 76.110001 74.290001 74.290001 132079200.0
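Any ideas as to why this doesn't work?
I think the problem is that you never assign the return value of the mult function to any variable. apply returns a new object and leaves data unchanged, so the result is discarded.
One way to achieve what you want is: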
# import modules
from pandas_datareader import data as pdr
# import parameters
start = "2020-01-01"
end = "2021-01-01"
symbols = ["AAPL"]
# get the data
data = pdr.get_data_yahoo(symbols, start, end)
def mult(df):
    df['Close'] = 2 * df['Close']
    df['Open'] = df['Open'] / 3
    return df
mult(data)
print(data.head())
Attributes Adj Close Close High Low Open \
Symbols AAPL AAPL AAPL AAPL AAPL
Date
2020-01-02 73.894325 150.175003 75.150002 73.797501 24.686666
2020-01-03 73.175926 148.714996 75.144997 74.125000 24.762499
2020-01-06 73.759010 149.899994 74.989998 73.187500 24.482501
2020-01-07 73.412117 149.195007 75.224998 74.370003 24.986666
2020-01-08 74.593048 151.595001 76.110001 74.290001 24.763334
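If you would rather keep the original row-wise mult function, a minimal sketch of assigning its result back might look like the following. It uses a small made-up DataFrame with flat column names in place of the downloaded data (an assumption, so it runs without network access), and relies on the result_type="expand" option of DataFrame.apply to turn each returned tuple into columns.

```python
import pandas as pd

# Synthetic stand-in for the downloaded prices (values are made up).
data = pd.DataFrame(
    {"Close": [10.0, 20.0], "Open": [3.0, 6.0], "High": [11.0, 21.0]}
)

def mult(row):
    # Returns one tuple per row: (doubled Close, Open divided by 3).
    return row["Close"] * 2, row["Open"] / 3

# apply builds a new object; data itself is not modified by this call.
result = data[["Close", "Open"]].apply(mult, axis=1, result_type="expand")

# Give the expanded columns the target names, then assign them back.
result.columns = ["Close", "Open"]
data[["Close", "Open"]] = result

print(data)
```

Row-wise apply calls the Python function once per row, so this is mainly useful when the logic genuinely needs several values from the same row.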
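Two things:
(i) You never assign the result back to the original DataFrame, so it is never updated.
(ii) If your function is no more complex than this, a vectorized operation is the better choice for simple multiplication, so operate on the columns directly instead of going through a function: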
data['Close'] *= 2
data['Open'] /= 3
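When several columns each need their own transformation, the same vectorized idea can be written as a mapping from column name to function. The sketch below is only an illustration: the transforms dictionary and the sample values are hypothetical, not part of the original question.

```python
import pandas as pd

# Synthetic stand-in for the price data (values are made up).
data = pd.DataFrame(
    {"Close": [10.0, 20.0], "Open": [3.0, 6.0], "Volume": [100, 200]}
)

# Hypothetical mapping: column name -> vectorized function applied to that column.
transforms = {
    "Close": lambda s: s * 2,
    "Open": lambda s: s / 3,
}

for col, func in transforms.items():
    # Each function receives the whole column (a Series) and the result is
    # assigned back, so the DataFrame is actually updated.
    data[col] = func(data[col])

print(data)
```

Columns that are not listed in the mapping, such as Volume here, are left untouched.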