How to speed up nested loop in python DataFrame?
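I have a pandas.DataFrame containing information on different securities. It has the columns "date", "security_id", "country", "factor_name", and "factor_value", where "factor_name" indicates whether "factor_value" is "debt" or "equity". I've been asked to calculate the debt-to-equity ratio for each security in each country on each date. All I could think of was nested loops over the unique values of each column, but it seems to take forever to run. Is there any way to speed up my code?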
def get_DEratio(date, sec, country):
    # Each lookup scans the whole DataFrame, which is why the loops are slow
    TE_lst = data[(data["date"] == date) & (data["security_id"] == sec)
                  & (data["country"] == country) & (data["factor_name"] == "TE")]["factor_value"].tolist()
    TD_lst = data[(data["date"] == date) & (data["security_id"] == sec)
                  & (data["country"] == country) & (data["factor_name"] == "TD")]["factor_value"].tolist()
    if not TD_lst or not TE_lst:
        return 0
    TD, TE = TD_lst[0], TE_lst[0]
    if TD == 0 or TE == 0:
        return 0
    return TD / TE

dates = data["date"].unique()
securities = data["security_id"].unique()
countries = data["country"].unique()
for date in dates:
    for sec in securities:
        for country in countries:
            ratio = get_DEratio(date, sec, country)
Assume that your source DataFrame contains:
         date security_id country factor_name  factor_value
0  2020-06-01          S1      C1          TE          10.0
1  2020-06-01          S1      C1          TD          20.0
2  2020-06-01          S2      C1          TE          12.0
3  2020-06-01          S2      C1          TD          20.0
4  2020-06-01          S1      C2          TE          12.0
5  2020-06-01          S1      C2          TD          20.0
6  2020-06-01          S2      C2          TE          14.0
7  2020-06-01          S2      C2          TD          20.0
8  2020-06-01          S3      C2          TE          14.0
9  2020-06-01          S4      C2          TD          20.0
Start by computing an auxiliary DataFrame:
wrk = df.set_index(['date', 'security_id', 'country', 'factor_name'])\
.factor_value.unstack()
The result is:
factor_name                       TD    TE
date       security_id country
2020-06-01 S1          C1       20.0  10.0
                       C2       20.0  12.0
           S2          C1       20.0  12.0
                       C2       20.0  14.0
           S3          C2        NaN  14.0
           S4          C2       20.0   NaN
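Note that unstack raises a ValueError if the same (date, security_id, country, factor_name) combination occurs more than once. The following is a minimal sketch, assuming such duplicates can exist in your data and that keeping the first value is acceptable (both are assumptions, not something stated in the question). It uses pivot_table in place of set_index plus unstack; the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample containing a duplicated (date, security, country, factor) row
df = pd.DataFrame({
    "date": ["2020-06-01"] * 5,
    "security_id": ["S1", "S1", "S1", "S3", "S4"],
    "country": ["C1", "C1", "C1", "C2", "C2"],
    "factor_name": ["TE", "TD", "TD", "TE", "TD"],
    "factor_value": [10.0, 20.0, 20.0, 14.0, 20.0],
})

keys = ["date", "security_id", "country"]

# aggfunc="first" keeps the first value found in each group; this is an
# assumed way of resolving duplicates, pick another aggregation if needed.
wrk = df.pivot_table(index=keys, columns="factor_name",
                     values="factor_value", aggfunc="first")
print(wrk)
```

The resulting wrk has the same shape as the auxiliary DataFrame above (one TD and one TE column), so the next step works unchanged.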
Then, to get the final result, run:
result = wrk.TD.div(wrk.TE).fillna(0)
You will get:
date        security_id  country
2020-06-01  S1           C1         2.000000
                         C2         1.666667
            S2           C1         1.666667
                         C2         1.428571
            S3           C2         0.000000
            S4           C2         0.000000
dtype: float64
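For reference, the two steps can be combined into one self-contained script. One behaviour differs from the original loop: get_DEratio returns 0 when TE is 0, whereas div produces inf there (or NaN for 0/0). The sketch below assumes you want to keep the "return 0" convention; the helper name de_ratio is hypothetical.

```python
import numpy as np
import pandas as pd

def de_ratio(df: pd.DataFrame) -> pd.Series:
    """Debt-to-equity ratio per (date, security_id, country); 0 where undefined."""
    keys = ["date", "security_id", "country"]
    wrk = df.set_index(keys + ["factor_name"])["factor_value"].unstack()
    # Guarantee both columns exist even if one factor never appears
    wrk = wrk.reindex(columns=["TD", "TE"])
    ratio = wrk["TD"].div(wrk["TE"])
    # Division by zero yields +/-inf; treat it like a missing value and use 0
    return ratio.replace([np.inf, -np.inf], np.nan).fillna(0)

# Sample data taken from the table above
df = pd.DataFrame({
    "date": ["2020-06-01"] * 10,
    "security_id": ["S1", "S1", "S2", "S2", "S1", "S1", "S2", "S2", "S3", "S4"],
    "country": ["C1", "C1", "C1", "C1", "C2", "C2", "C2", "C2", "C2", "C2"],
    "factor_name": ["TE", "TD", "TE", "TD", "TE", "TD", "TE", "TD", "TE", "TD"],
    "factor_value": [10.0, 20.0, 12.0, 20.0, 12.0, 20.0, 14.0, 20.0, 14.0, 20.0],
})

result = de_ratio(df)
print(result)
```

Because the reshape and the division each run once over the whole DataFrame, no per-combination filtering is needed, which is where the nested loops spend their time.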