如何加速 Pandas Dataframe 上的 `for` 循环

Question

I have a Pandas Dataframe as shown below:我有一个 Pandas Dataframe 如下所示：

import pandas as pd
dist_temp = {'ID': {0: '110000', 1: '120000', 2: '130100', 3: '130200', 4: '130300'},
 '110000': {0: 0.0,
  1: 26359.35546663972,
  2: 7041.465835419961,
  3: 6748.88597016984,
  4: 4499.648549689056},
 '120000': {0: 25359.55546817345,
  1: 0.0,
  2: 2684.5583355637195,
  3: 16072.930316000879,
  4: 2573.60624992548},
 '130100': {0: 7203.651673447513,
  1: 2933.7008133762006,
  2: 0.0,
  3: 2278.794996954,
  4: 1022.7509126175601},
 '130200': {0: 7532.764214042125,
  1: 17598.85692679548,
  2: 2303.3769962313604,
  3: 0.0,
  4: 8356.33354580892},
 '130300': {0: 4123.111378129952,
  1: 2726.5461773558404,
  2: 1047.53770945992,
  3: 7946.337752637479,
  4: 0.0}}
df = pd.DataFrame(dist_temp)

Is a there a way to speed up the following routine:有没有办法加快以下例程：

df_1 = pd.DataFrame(columns = ['IDo', 'IdD', 'flux'])
for i in range(0,len(df.columns)-1):
    j =0
    for x in df.itertuples():
        df_1 = df_1.append({'IDo': df.columns[i+1], 'IdD': x[j+1],'flux': x[i+2]}, ignore_index = True)
    j = j+1
df_1['flux'] = df_1['flux'].fillna(0)
print (df_1)

Desired output is as shown above, where the title of the second column in this case (110000) comes in the output Dataframe as IDo in the rows (0-4) and the the first five rows (0-4) in the input Dataframe with value of 110000, 12000, 130100, 130200, 130300 comes in the output Dataframe as IdD (which is the second column in the output dataframe).所需的 output 如上所示，其中第二列的标题在这种情况下 (110000) 来自 output Dataframe 作为 IDo 在输入 Dataframe 的行 (0-4) 和前五行 (0-4)值为 110000、12000、130100、130200、130300 作为 IdD 出现在 output Dataframe（这是 output 数据帧中的第二列）。 And the third column in output Dataframe (flux) contributes to value corresponding to the intersecting points in the input Dataframe such as 0.000000 is the value on intersection when 110000 (title of column 1 input dataframe) and 110000 (number on second row), 26359.355467 is the value when 110000 (title of column 1 dataframe) and 120000 (number on second row) and so on. output Dataframe 中的第三列（通量）贡献对应于输入 Dataframe 中的交点的值，例如 0.000000 是 110000（第 1 列输入数据框的标题）和 110000（第二行的数字）2653467.35 的交点处的值是 110000（第 1 列数据框的标题）和 120000（第二行的数字）等时的值。

For small Dataframe this method is not a problem.对于小Dataframe这个方法没有问题。 But for a 500 rows x 500 columns Dataframe it is taking enormous time.但是对于500 rows x 500 columns Dataframe 来说，它需要花费大量时间。

Sorry for this naive question, I am new to Pandas.很抱歉这个幼稚的问题，我是 Pandas 的新手。

Thankyou谢谢

Answer 1

Pandas has lots of great options for reshaping a dataframe. Pandas 有很多重塑 dataframe 的好选择。

In this case, DataFrame.melt is handy:在这种情况下， DataFrame.melt很方便：

df.melt(id_vars="ID", value_name="Flux", var_name="IDo")

And for the first 10 rows I get:对于前 10 行，我得到：


       ID     IDo          Flux
0  110000  110000      0.000000
1  120000  110000  26359.355467
2  130100  110000   7041.465835
3  130200  110000   6748.885970
4  130300  110000   4499.648550
5  110000  120000  25359.555468
6  120000  120000      0.000000
7  130100  120000   2684.558336
8  130200  120000  16072.930316
9  130300  120000   2573.606250

如何加速 Pandas Dataframe 上的 `for` 循环

问题描述

1 个解决方案

解决方案1
1 已采纳 2022-03-09 19:33:10

如何加速 Pandas Dataframe 上的 `for` 循环

问题描述

1 个解决方案

解决方案1 1 已采纳 2022-03-09 19:33:10

解决方案1
1 已采纳 2022-03-09 19:33:10