How to speed up nested loops over a Python DataFrame?

Question

I have a pandas.DataFrame containing information on different securities. It has the columns "date", "security_id", "country", "factor_name" and "factor_value", where "factor_name" indicates whether "factor_value" is debt ("TD") or equity ("TE"). I need to calculate the debt-to-equity ratio for each security in each country on each date. The only approach I can think of is a nested loop over the unique values of each column, but it seems to take forever to run. Is there any way I can speed up my code?

def get_DEratio(date, sec, country):
    # Select the equity ("TE") and debt ("TD") values for this date/security/country
    TE_lst = data[(data["date"] == date) & (data["security_id"] == sec)
              & (data["country"] == country) & (data["factor_name"] == "TE")]["factor_value"].tolist()
    TD_lst = data[(data["date"] == date) & (data["security_id"] == sec)
              & (data["country"] == country) & (data["factor_name"] == "TD")]["factor_value"].tolist()

    # No ratio if either factor is missing
    if not TD_lst or not TE_lst:
        return 0

    TD, TE = TD_lst[0], TE_lst[0]
    # No ratio if either value is zero
    if TD == 0 or TE == 0:
        return 0
    return TD / TE

dates = data["date"].unique()
securities = data["security_id"].unique()
countries = data["country"].unique()
for date in dates:
    for sec in securities:
        for country in countries:
            ratio = get_DEratio(date, sec, country)

Answer 1

Assume that your source DataFrame contains:

        date security_id country factor_name  factor_value
0 2020-06-01          S1      C1          TE          10.0
1 2020-06-01          S1      C1          TD          20.0
2 2020-06-01          S2      C1          TE          12.0
3 2020-06-01          S2      C1          TD          20.0
4 2020-06-01          S1      C2          TE          12.0
5 2020-06-01          S1      C2          TD          20.0
6 2020-06-01          S2      C2          TE          14.0
7 2020-06-01          S2      C2          TD          20.0
8 2020-06-01          S3      C2          TE          14.0
9 2020-06-01          S4      C2          TD          20.0

First compute an auxiliary DataFrame:

wrk = df.set_index(['date', 'security_id', 'country', 'factor_name'])\
    .factor_value.unstack()

The result is:

factor_name                       TD    TE
date       security_id country            
2020-06-01 S1          C1       20.0  10.0
                       C2       20.0  12.0
           S2          C1       20.0  12.0
                       C2       20.0  14.0
           S3          C2        NaN  14.0
           S4          C2       20.0   NaN
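To reproduce this step end to end, here is a minimal, self-contained sketch. The way `df` is built is my own reconstruction of the sample table above (the post does not show how it was created), so treat that part as an assumption.

```python
import pandas as pd

# Assumed reconstruction of the sample data shown above.
rows = [
    ("S1", "C1", "TE", 10.0), ("S1", "C1", "TD", 20.0),
    ("S2", "C1", "TE", 12.0), ("S2", "C1", "TD", 20.0),
    ("S1", "C2", "TE", 12.0), ("S1", "C2", "TD", 20.0),
    ("S2", "C2", "TE", 14.0), ("S2", "C2", "TD", 20.0),
    ("S3", "C2", "TE", 14.0), ("S4", "C2", "TD", 20.0),
]
df = pd.DataFrame(
    rows, columns=["security_id", "country", "factor_name", "factor_value"]
)
df.insert(0, "date", pd.Timestamp("2020-06-01"))

# Move the key columns into the index, then pivot factor_name into columns.
wrk = (
    df.set_index(["date", "security_id", "country", "factor_name"])
      .factor_value.unstack()
)
print(wrk)
```

Note that `unstack` raises a ValueError if the same (date, security_id, country, factor_name) combination occurs more than once, so this approach relies on each combination being unique.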

Then, to get the final result, run:

result = wrk.TD.div(wrk.TE).fillna(0)

and you will get:

date        security_id  country
2020-06-01  S1           C1         2.000000
                         C2         1.666667
            S2           C1         1.666667
                         C2         1.428571
            S3           C2         0.000000
            S4           C2         0.000000
dtype: float64
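One difference from the function in the question is worth checking: get_DEratio returns 0 whenever TD or TE is zero, whereas `div` yields `inf` when a non-zero debt is divided by zero equity. If the original behaviour is wanted, one possible sketch is to turn infinities into NaN before filling. The data below is hypothetical and only serves to exercise the zero-equity and missing-debt cases.

```python
import numpy as np
import pandas as pd

# Hypothetical data: S2 has zero equity, S3 has no debt row.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-01"] * 5),
    "security_id": ["S1", "S1", "S2", "S2", "S3"],
    "country": ["C1"] * 5,
    "factor_name": ["TE", "TD", "TE", "TD", "TE"],
    "factor_value": [10.0, 20.0, 0.0, 20.0, 14.0],
})

wrk = (
    df.set_index(["date", "security_id", "country", "factor_name"])
      .factor_value.unstack()
)

# 20 / 0 gives inf; convert it to NaN so that fillna(0) covers it as well.
result = wrk.TD.div(wrk.TE).replace([np.inf, -np.inf], np.nan).fillna(0)
print(result)
```

Zero debt over non-zero equity already evaluates to 0, and 0 / 0 evaluates to NaN, which `fillna(0)` handles, so only the infinite case needs the extra `replace`.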

Answer 1
0 Accepted 2021-01-08 10:19:36
