Performing an index match with Python pandas
I'm struggling to write some code to obtain the following functionality:
df1
Date A B
01/01/2021 39 100
01/02/2021 58 188
01/03/2021 220 300
01/04/2021 0 11
df2
Date A A A B B B
0 50 100 0 100 200
01/01/2021 0.1 0.2 0.3 0.3 0.3 0.6
01/02/2021 0.1 0.2 0.3 0.3 0.3 0.6
01/03/2021 0.3 0.3 0.6 0.5 0.4 0.8
01/04/2021 0.3 0.3 0.6 0.5 0.8 0.8
df3 (desired output)
Date A B
01/01/2021 (39*0.1) (100*0.3)
01/02/2021 (58*0.2) (188*0.3)
01/03/2021 (220*0.6) (300*0.8)
01/04/2021 (0*0.3) (11*0.5)
Effectively, I need to check the values for A and B in df1 and multiply each by the corresponding value in df2, chosen by date and by whether the value is between 0 and 50, between 50 and 100, or above 100 (in the case of A).
In reality, df1 and df2 extend far beyond the 2 items 'A' and 'B', and I intend to iterate over each column of df1 in a for loop, so I am looking for a general solution.
Thanks
Here is a way:
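import numpy as np

# Assumes df1 and df2 are indexed by Date and df2 has two-level columns (item, lower bound)
def fun(x):
    col_name = x.name
    col_idx = df1.columns.get_loc(col_name)
    # Lower bounds of the three bins for this column, e.g. 0, 50, 100 for A
    lower, middle, upper = df2.columns.get_loc_level(col_name)[1]
    # Conditions are tested from the highest bound down; the first match wins
    cond_list = [upper <= x, middle <= x, lower <= x]
    # Positions of the matching columns in df2, e.g. [2, 1, 0] for A
    choice_list = np.arange(3)[::-1] + 3 * col_idx
    selections = np.select(cond_list, choice_list)
    cols_in_df2 = df2.columns[selections]
    rows_in_df2 = x.index
    # The diagonal pairs each row with its own selected column
    multipliers_in_df2 = np.diag(df2.loc[rows_in_df2, cols_in_df2])
    result = x * multipliers_in_df2
    return result

df1.apply(fun)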
to get
A B
Date
2021-01-01 3.9 30.0
2021-01-02 11.6 56.4
2021-01-03 132.0 240.0
2021-01-04 0.0 5.5
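The function relies on a layout that the printed tables do not show directly: both frames indexed by Date, and df2 carrying two-level columns (the item name, then the lower bound of each bin). A minimal sketch of how the sample data could be built under that assumption follows; the construction itself is illustrative and not taken from the original post.

```python
import pandas as pd

# Assumed layout: Date is the index of both frames.
dates = pd.date_range("2021-01-01", periods=4, freq="D", name="Date")

df1 = pd.DataFrame({"A": [39, 58, 220, 0], "B": [100, 188, 300, 11]}, index=dates)

# Assumed layout: df2 columns are (item, lower bound of the bin).
cols = pd.MultiIndex.from_tuples(
    [("A", 0), ("A", 50), ("A", 100), ("B", 0), ("B", 100), ("B", 200)]
)
df2 = pd.DataFrame(
    [
        [0.1, 0.2, 0.3, 0.3, 0.3, 0.6],
        [0.1, 0.2, 0.3, 0.3, 0.3, 0.6],
        [0.3, 0.3, 0.6, 0.5, 0.4, 0.8],
        [0.3, 0.3, 0.6, 0.5, 0.8, 0.8],
    ],
    index=dates,
    columns=cols,
)

# The second level under "A" holds the bin bounds that the function unpacks.
print(list(df2.columns.get_loc_level("A")[1]))  # [0, 50, 100]
```

With the frames in this shape, get_loc_level returns exactly three bounds per item, which is what the three-way unpacking in the function expects.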
We have a cond_list of conditions whose limits are obtained from the corresponding values in df2's columns, and a choice_list which holds the positions of the corresponding columns in df2; e.g., for A, choice_list is [2, 1, 0] and for B, it is [5, 4, 3]. We get the offset for each column through 3 * col_idx, where 3 is the number of conditions.
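To make the selection step concrete, here is a small sketch for column A alone. The bounds and col_idx are written out by hand here as an assumption, rather than looked up from df1 and df2.

```python
import numpy as np
import pandas as pd

# Column A of df1, with its bin bounds hard-coded for illustration.
x = pd.Series([39, 58, 220, 0], name="A")
lower, middle, upper = 0, 50, 100
col_idx = 0  # position of "A" among df1's columns

# Highest bound first: np.select takes the first condition that is True.
cond_list = [upper <= x, middle <= x, lower <= x]
choice_list = np.arange(3)[::-1] + 3 * col_idx  # [2, 1, 0]

selections = np.select(cond_list, choice_list)
print(selections)  # [0 1 2 0]
```

One thing to keep in mind: np.select falls back to its default of 0 when no condition matches, so a value below the lowest bound would point at the first column of df2 regardless of which item is being processed.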
Then we perform a selection depending on these, which gives the columns we should look at (cols_in_df2). The rows to look at are the index of the series, so we choose the multipliers_in_df2 via loc with these. Since loc returns every selected column for every row, np.diag keeps only the entry where each row meets its own column.
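The lookup can be seen in isolation in the sketch below, which uses only the A part of df2 and the selections computed for A. The last line is an alternative I am adding for comparison, not part of the answer: plain NumPy positional indexing, which avoids building the intermediate square frame.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2021-01-01", periods=4, freq="D", name="Date")
cols = pd.MultiIndex.from_tuples([("A", 0), ("A", 50), ("A", 100)])
df2 = pd.DataFrame(
    [[0.1, 0.2, 0.3], [0.1, 0.2, 0.3], [0.3, 0.3, 0.6], [0.3, 0.3, 0.6]],
    index=dates,
    columns=cols,
)

selections = np.array([0, 1, 2, 0])    # one column position per row
cols_in_df2 = df2.columns[selections]  # repeats are allowed

square = df2.loc[dates, cols_in_df2]   # 4 rows x 4 selected columns
multipliers = np.diag(square)          # row i paired with column i

# Alternative (assumes rows of df2 are in the same order as the series):
direct = df2.to_numpy()[np.arange(len(dates)), selections]

print(multipliers)  # values 0.1, 0.2, 0.6, 0.3
```

For n rows the loc step materialises an n x n block, so the positional variant may be worth considering on long frames.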
Lastly, we multiply the series at hand by those multipliers and return the result.
This process happens for each column with apply.
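Since the question asks for a general solution, here is a hedged variant that does not hard-code three bins. It is not the method above: it swaps np.select for np.searchsorted, and it assumes the same layout (Date index, two-level columns in df2) with the bounds of each item sorted in ascending order. The name scale is hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2021-01-01", periods=4, freq="D", name="Date")
df1 = pd.DataFrame({"A": [39, 58, 220, 0], "B": [100, 188, 300, 11]}, index=dates)
cols = pd.MultiIndex.from_tuples(
    [("A", 0), ("A", 50), ("A", 100), ("B", 0), ("B", 100), ("B", 200)]
)
df2 = pd.DataFrame(
    [
        [0.1, 0.2, 0.3, 0.3, 0.3, 0.6],
        [0.1, 0.2, 0.3, 0.3, 0.3, 0.6],
        [0.3, 0.3, 0.6, 0.5, 0.4, 0.8],
        [0.3, 0.3, 0.6, 0.5, 0.8, 0.8],
    ],
    index=dates,
    columns=cols,
)

def scale(x):
    # Multipliers for this item only; columns are the lower bounds.
    sub = df2[x.name].reindex(x.index)
    bounds = sub.columns.to_numpy()
    # Position of the last bound that is <= each value.
    pos = np.searchsorted(bounds, x.to_numpy(), side="right") - 1
    # Values below the lowest bound are clamped into the first bin.
    pos = np.clip(pos, 0, len(bounds) - 1)
    return x * sub.to_numpy()[np.arange(len(x)), pos]

df3 = df1.apply(scale)
print(df3)
```

On the sample data this reproduces the table shown earlier. The clamping of out-of-range values is a design choice of this sketch and may not be what a real dataset needs.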