Performing an index match with Python pandas

I'm struggling to write some code to obtain the following functionality:

df1

Date          A      B
01/01/2021    39     100
01/02/2021    58     188
01/03/2021    220    300
01/04/2021    0      11

df2

Date          A      A      A      B     B     B
              0      50     100    0     100   200
01/01/2021    0.1    0.2    0.3    0.3   0.3   0.6
01/02/2021    0.1    0.2    0.3    0.3   0.3   0.6
01/03/2021    0.3    0.3    0.6    0.5   0.4   0.8
01/04/2021    0.3    0.3    0.6    0.5   0.8   0.8

df3 (desired output)

Date          A           B
01/01/2021    (39*0.1)    (100*0.3)
01/02/2021    (58*0.2)    (188*0.3)
01/03/2021    (220*0.6)   (300*0.8)
01/04/2021    (0*0.3)     (11*0.5)

Effectively, I need to look up the values for A and B in df1 and multiply each by the corresponding value in df2, chosen by date and by the bracket the value falls in: between 0 and 50, between 50 and 100, or above 100 (in the case of A).

In reality, df1 and df2 extend far beyond the two items 'A' and 'B', and I intend to iterate over each column of df1 in a for loop, so I am looking for a general solution.

Thanks

Here is a way:

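import numpy as np

# Assumes df1 and df2 are both indexed by Date, and that df2 has two-level
# columns (item, threshold) with three thresholds per item, in df1's item order.
def fun(x):
    col_name = x.name
    col_idx = df1.columns.get_loc(col_name)

    # The second-level labels under this item are the bracket thresholds.
    lower, middle, upper = df2.columns.get_loc_level(col_name)[1]
    # Test the highest threshold first; np.select keeps the first match.
    cond_list = [upper <= x, middle <= x, lower <= x]
    # Positions in df2 of this item's three columns, highest bracket first.
    choice_list = np.arange(3)[::-1] + 3 * col_idx
    selections = np.select(cond_list, choice_list)

    cols_in_df2 = df2.columns[selections]
    rows_in_df2 = x.index
    # loc returns every row against every selected column; the diagonal
    # pairs each date with its own column.
    multipliers_in_df2 = np.diag(df2.loc[rows_in_df2, cols_in_df2])

    result = x * multipliers_in_df2
    return result

df1.apply(fun)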

to get

                A      B
Date
2021-01-01    3.9   30.0
2021-01-02   11.6   56.4
2021-01-03  132.0  240.0
2021-01-04    0.0    5.5

We have a cond_list of conditions whose limits come from the threshold labels under the matching item in df2's columns, and a choice_list holding the positions of the corresponding columns in df2; e.g., for A, choice_list is [2, 1, 0] and for B it is [5, 4, 3]. We get the offset for each column through 3 * col_idx, where 3 is the number of conditions.
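
To make the selection step concrete, here is a minimal sketch for column A alone, using the values from the question. The thresholds and col_idx are typed in by hand here rather than read from df2, which is a simplification for illustration only.

```python
import numpy as np
import pandas as pd

# Column A of df1 from the question, and the thresholds df2 lists under A.
x = pd.Series([39, 58, 220, 0], name="A").to_numpy()
lower, middle, upper = 0, 50, 100

col_idx = 0  # A is the first column of df1, so its offset is 3 * 0

# Conditions are checked in list order, so the highest bracket comes first.
cond_list = [upper <= x, middle <= x, lower <= x]
choice_list = np.arange(3)[::-1] + 3 * col_idx  # array([2, 1, 0])

# For each element, np.select returns the choice paired with the first
# condition that is True.
selections = np.select(cond_list, choice_list)
print(selections)  # [0 1 2 0]
```

Note that np.select falls back to its default of 0 when no condition holds, so a value below the lowest threshold would point at the first column of df2 regardless of the item.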

Then we perform a selection with np.select based on these, which gives the columns we should look up (cols_in_df2). The rows to look up are the index of the series, so we choose multipliers_in_df2 via loc with these, keeping only the diagonal of the result so that each date is paired with its own column.
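
The lookup step can be sketched on its own as below, assuming df2 is indexed by date and carries (item, threshold) pairs as two-level columns. The selections array is the one np.select yields for column A. The positional variant at the end is not part of the answer above; it is shown only as a possible way to avoid building the square intermediate frame.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"])
cols = pd.MultiIndex.from_tuples(
    [("A", 0), ("A", 50), ("A", 100), ("B", 0), ("B", 100), ("B", 200)]
)
df2 = pd.DataFrame(
    [[0.1, 0.2, 0.3, 0.3, 0.3, 0.6],
     [0.1, 0.2, 0.3, 0.3, 0.3, 0.6],
     [0.3, 0.3, 0.6, 0.5, 0.4, 0.8],
     [0.3, 0.3, 0.6, 0.5, 0.8, 0.8]],
    index=dates, columns=cols,
)

# Column positions chosen for A, one per date.
selections = np.array([0, 1, 2, 0])

# loc pairs every date with every selected column, giving a 4 x 4 frame;
# entry (i, i) is the multiplier that belongs to date i.
square = df2.loc[dates, df2.columns[selections]]
via_diag = np.diag(square)

# Positional indexing picks the same (row i, column selections[i]) pairs
# directly; this relies on df2's rows being in the same order as `dates`.
via_positions = df2.to_numpy()[np.arange(len(dates)), selections]
```

Both arrays hold one multiplier per date; the diagonal version costs memory proportional to the square of the number of rows, which may matter for long frames.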

Lastly, we multiply the series at hand by those multipliers and return the result.

This process happens for each column with apply.
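
Since the question asks for a general solution, here is a hedged alternative sketch that does not hard-code three brackets per item. It swaps np.select for np.searchsorted, so it is a different technique from the one above. The name lookup_multiply is made up for illustration, and the sketch assumes that the thresholds under each item are sorted in ascending order and that values below the lowest threshold should use the first bracket.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"])
df1 = pd.DataFrame({"A": [39, 58, 220, 0], "B": [100, 188, 300, 11]}, index=dates)
df2 = pd.DataFrame(
    [[0.1, 0.2, 0.3, 0.3, 0.3, 0.6],
     [0.1, 0.2, 0.3, 0.3, 0.3, 0.6],
     [0.3, 0.3, 0.6, 0.5, 0.4, 0.8],
     [0.3, 0.3, 0.6, 0.5, 0.8, 0.8]],
    index=dates,
    columns=pd.MultiIndex.from_tuples(
        [("A", 0), ("A", 50), ("A", 100), ("B", 0), ("B", 100), ("B", 200)]
    ),
)

def lookup_multiply(col, table):
    """Multiply each value of `col` by the rate of the bracket it falls into."""
    # Sub-frame for this item; its columns are the thresholds. Reindexing
    # aligns its rows with the dates of `col` (missing dates become NaN).
    rates = table[col.name].reindex(col.index)
    thresholds = rates.columns.to_numpy()  # assumed sorted ascending

    # Position of the last threshold <= value. Values below the first
    # threshold are clipped into the first bracket (an assumption).
    pos = np.searchsorted(thresholds, col.to_numpy(), side="right") - 1
    pos = np.clip(pos, 0, len(thresholds) - 1)

    return col * rates.to_numpy()[np.arange(len(col)), pos]

df3 = df1.apply(lookup_multiply, table=df2)
print(df3)
```

Because the thresholds are read from df2 per item, each item may have a different number of brackets, and the order of items in df2 does not need to match the column order of df1.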
