繁体   English   中英

从多个列的另一个数据帧列中减去一个数据帧列

[英]Subtracting one dataframe column from another dataframe column for multiple columns

我试图从另一个数据框的列中减去一个数据框中的列,我想为n个列做这个(目前我正在处理的数据帧有1000列)。

以下是两个数据帧的外观:

数据帧1:

               branch A (pkg XYZ) | branch A (pkg ABC)| branch B (pkg XYZ)
               -------------------------------------------------------------
5/21 - 5/27           20          |        30         |        50
5/28 - 6/02           10          |        30         |        50
6/03 - 6/09           30          |        40         |        50
6/10 - 6/16           20          |        30         |        50
6/17 - 6/23           50          |        10         |        50

数据帧2:

                branch A (pkg XYZ)| branch A (pkg ABC) | branch B (pkg XYZ)
                -----------------------------------------------------------
5/21 - 5/27           3           |        5           |        50
5/28 - 6/02           2           |        6           |        50
6/03 - 6/09           3           |        7           |        50
6/10 - 6/16           1           |        2           |        50
6/17 - 6/23           4           |        0           |        50

如果我想从Dataframe 1中减去Dataframe 2的所有列,基于它们的列标题(“ 分支A(pkg XYZ) ”)是否匹配,那么最有效的方法是什么?

我尝试迭代一个数据帧列的列表,然后从另一个数据帧中减去它,但这似乎效率低下。

  i = 0
  df1_cols = list(df1)
  while i < len(df1.columns):
       col_name = df1_cols[i]

       # df3 is an empty dataframe
       df3[col_name] = df1[col_name] - df2[col_name]

       i += 1

您可以使用Dataframe.subtract来减去两个数据帧中的列。 我们在df2循环遍历列,如果在df1找到该列,我们在该列中执行减法。 最后,我们将结果保存在一个单独的列中,该列的名称以“Result”结尾。

In [1]: import pandas as pd

In [2]: df1 = pd.DataFrame({"branch A(pkg XYZ)":[20,10,30,20,50], "branch A(pkg ABC)":[30,30,40,30,10], "branch B(pkg X
   ...: YZ)":[50, 50, 50, 50, 50]})

In [3]: df1
Out[3]:
   branch A(pkg XYZ)  branch A(pkg ABC)  branch B(pkg XYZ)
0                 20                 30                 50
1                 10                 30                 50
2                 30                 40                 50
3                 20                 30                 50
4                 50                 10                 50

In [4]: df2 = pd.DataFrame({"branch A(pkg XYZ)":[3,2,3,1,4], "branch A(pkg ABC)":[5,6,7,2,0], "branch B(pkg XYZ)":[50,5
   ...: 0,50,50,50]})

In [5]: df2
Out[5]:
   branch A(pkg XYZ)  branch A(pkg ABC)  branch B(pkg XYZ)
0                  3                  5                 50
1                  2                  6                 50
2                  3                  7                 50
3                  1                  2                 50
4                  4                  0                 50

In [25]: for i in df2.columns:
    ...:     if i in df1.columns:
    ...:         df2[i+"Result"] = df2[i].subtract(df1[i], fill_value=0)

In [29]: df2
Out[29]:
   branch A(pkg XYZ)  branch A(pkg ABC)  branch B(pkg XYZ)  \
0                  3                  5                 50
1                  2                  6                 50
2                  3                  7                 50
3                  1                  2                 50
4                  4                  0                 50

   branch A(pkg XYZ)Result  branch A(pkg ABC)Result  branch B(pkg XYZ)Result
0                      -17                      -25                        0
1                       -8                      -24                        0
2                      -27                      -33                        0
3                      -19                      -28                        0
4                      -46                      -10                        0

1000列和100行的尝试也非常有效:

In [40]: import numpy as np
In [41]: df1 = pd.DataFrame(np.random.random((100, 1000)))
In [42]: df2 = pd.DataFrame(np.random.random((100, 1000)))
In [45]: %%timeit
    ...: for i in df2.columns:
    ...:     if i in df1.columns:
    ...:         df2[str(i)+"Result"] = df2[i].subtract(df1[i], fill_value=0)
    ...:
    ...:
367 ms ± 97.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [49]: df2.head(5)
Out[49]:
          0         1         2         3         4         5         6  \
0  0.327470  0.272503  0.549897  0.119997  0.985847  0.445402  0.582878
1  0.752375  0.606053  0.223085  0.062001  0.025440  0.638872  0.188112
2  0.174401  0.944870  0.630128  0.715326  0.298661  0.285740  0.360253
3  0.095649  0.355365  0.523830  0.114555  0.342535  0.393107  0.246344
4  0.250579  0.105054  0.761075  0.574047  0.733976  0.199406  0.658025

          7         8         9  ...  990Result  991Result  992Result  \
0  0.335388  0.613710  0.104878  ...  -0.728738   0.147162  -0.841872
1  0.796243  0.709898  0.133040  ...  -0.151361  -0.400989   0.012670
2  0.009304  0.472587  0.108229  ...  -0.131590  -0.540945  -0.097455
3  0.798668  0.628953  0.701703  ...  -0.461036   0.217387  -0.363704
4  0.387475  0.152143  0.825989  ...  -0.021844   0.103296  -0.272207

   993Result  994Result  995Result  996Result  997Result  998Result  999Result
0   0.389068   0.470042   0.556146   0.705036  -0.021659   0.250586  -0.662487
1  -0.456462  -0.206587   0.691951  -0.507585  -0.430838  -0.126303  -0.001411
2  -0.018339   0.226750   0.483076  -0.581611  -0.362906   0.796857  -0.367914
3   0.323971  -0.779884  -0.306404  -0.825982  -0.065974  -0.109321  -0.023654
4   0.178328   0.600110   0.222539   0.064416  -0.110039  -0.615137  -0.261765

[5 rows x 2000 columns]

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM