Pandas: find the closest group from another dataframe

Question

Below, I have two dataframes. I need to update df_mapped using df_original: for each x_time in df_mapped, I need to find the 3 closest rows in df_original (closest meaning the smallest absolute difference from x_price) and add them to the df_mapped dataframe.

import io
import pandas as pd

d = """
x_time    expiration    x_price    p_price
 60          4           10                  20
 60          5           11                  30
 60          6           12                  40
 60          7           13                  50
 60          8           14                  60
 70          5           10                  20
 70          6           11                  30
 70          7           12                  40
 70          8           13                  50
 70          9           14                  60
 80          1           10                  20
 80          2           11                  30
 80          3           12                  40
 80          4           13                  50
 80          5           14                  60
"""

df_original = pd.read_csv(io.StringIO(d), sep=r"\s+")

to_mapped = """
x_time    expiration    x_price
 50          4          15
 60          5          15
 70          6          13
 80          7          20
 90          8          20
"""

df_mapped = pd.read_csv(io.StringIO(to_mapped), sep=r"\s+")

df_mapped = df_mapped.merge(df_original, on='x_time', how='left')
df_mapped['x_price_delta'] = abs(df_mapped['x_price_x'] - df_mapped['x_price_y'])
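Because the merge uses how='left', every x_time in df_mapped survives, and the ones with no counterpart in df_original (50 and 90) get NaN in all of df_original's columns; that is why they appear as near-empty rows in the intermediate output below. A minimal sketch that uses merge's indicator flag to list those rows, rebuilding the same sample data with the DataFrame constructor instead of read_csv for brevity:

```python
import pandas as pd

# Same sample data as in the question
df_original = pd.DataFrame({
    "x_time": [60] * 5 + [70] * 5 + [80] * 5,
    "expiration": [4, 5, 6, 7, 8, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5],
    "x_price": [10, 11, 12, 13, 14] * 3,
    "p_price": [20, 30, 40, 50, 60] * 3,
})
df_mapped = pd.DataFrame({
    "x_time": [50, 60, 70, 80, 90],
    "expiration": [4, 5, 6, 7, 8],
    "x_price": [15, 15, 13, 20, 20],
})

# indicator=True adds a categorical "_merge" column ("both" or "left_only" here)
merged = df_mapped.merge(df_original, on="x_time", how="left", indicator=True)
merged["x_price_delta"] = (merged["x_price_x"] - merged["x_price_y"]).abs()

# x_time values that found no rows in df_original
unmatched = merged.loc[merged["_merge"] == "left_only", "x_time"].tolist()
print(unmatched)  # [50, 90]
```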

**Intermediate output: from this, I need to select the 3 rows with the smallest x_price_delta for each x_time.**

int_out = """    
x_time  expiration_x    x_price_x   expiration_y    x_price_y   p_price x_price_delta
50  4   15              
60  5   15  6   12  40  3
60  5   15  7   13  50  2
60  5   15  8   14  60  1
70  6   13  7   12  40  1
70  6   13  8   13  50  0
70  6   13  9   14  60  1
80  7   20  3   12  40  8
80  7   20  4   13  50  7
80  7   20  5   14  60  6
90  8   20              
"""
df_int_out = pd.read_csv(io.StringIO(int_out), sep=r"\s+")
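One way to get this intermediate frame from the merge, rather than typing it in, is to sort by distance within each x_time and keep the first 3 rows of each group with groupby(...).head(3). A sketch under the assumption that ties in x_price_delta may be broken by merge order; the final sort by expiration_y is only there to reproduce the row order shown above:

```python
import pandas as pd

df_original = pd.DataFrame({
    "x_time": [60] * 5 + [70] * 5 + [80] * 5,
    "expiration": [4, 5, 6, 7, 8, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5],
    "x_price": [10, 11, 12, 13, 14] * 3,
    "p_price": [20, 30, 40, 50, 60] * 3,
})
df_mapped = pd.DataFrame({
    "x_time": [50, 60, 70, 80, 90],
    "expiration": [4, 5, 6, 7, 8],
    "x_price": [15, 15, 13, 20, 20],
})

merged = df_mapped.merge(df_original, on="x_time", how="left")
merged["x_price_delta"] = (merged["x_price_x"] - merged["x_price_y"]).abs()

# Sort by distance inside each x_time (ties keep their merge order),
# keep the 3 closest rows per group, then order them by expiration.
# x_time 50 and 90 have a NaN delta and keep their single placeholder row.
df_int_out = (
    merged.sort_values(["x_time", "x_price_delta"])
    .groupby("x_time", sort=False)
    .head(3)
    .sort_values(["x_time", "expiration_y"])
    .reset_index(drop=True)
)
print(df_int_out)
```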

**Final step: keeping x_time fixed, I need to flatten the dataframe so that the 3 closest rows end up in a single row.**

final_out = """
x_time  expiration_original x_price_original    expiration_1    x_price_1   p_price_1   expiration_2    x_price_2   p_price_2   expiration_3    x_price_3   p_price_3
50  4   15                                  
60  5   15  6   12  40  7   13  50  8   14  60
70  6   13  7   12  40  8   13  50  9   14  60
80  7   20  3   12  40  4   13  50  5   14  60
90  8   20                                  
"""
df_out = pd.read_csv(io.StringIO(final_out), sep=r"\s+")
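Going from the intermediate frame to this wide layout is a long-to-wide reshape: number the matches 1, 2, 3 within each x_time with groupby(...).cumcount(), pivot on that number, then flatten the (column, number) pairs into names like expiration_1. A minimal sketch, assuming the intermediate frame has exactly the columns shown above (rebuilt inline here) and that the matched rows are already in the desired order:

```python
import pandas as pd

nan = float("nan")
# The intermediate output from the question
inter = pd.DataFrame({
    "x_time":       [50, 60, 60, 60, 70, 70, 70, 80, 80, 80, 90],
    "expiration_x": [4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8],
    "x_price_x":    [15, 15, 15, 15, 13, 13, 13, 20, 20, 20, 20],
    "expiration_y": [nan, 6, 7, 8, 7, 8, 9, 3, 4, 5, nan],
    "x_price_y":    [nan, 12, 13, 14, 12, 13, 14, 12, 13, 14, nan],
    "p_price":      [nan, 40, 50, 60, 40, 50, 60, 40, 50, 60, nan],
})

# 1) One row per x_time holding the original (mapped) values
base = (
    inter[["x_time", "expiration_x", "x_price_x"]]
    .drop_duplicates("x_time")
    .rename(columns={"expiration_x": "expiration_original",
                     "x_price_x": "x_price_original"})
)

# 2) Number the matches 1, 2, 3 within each x_time (unmatched rows dropped)
matched = inter.dropna(subset=["expiration_y"]).rename(
    columns={"expiration_y": "expiration", "x_price_y": "x_price"})
matched = matched.assign(rank=matched.groupby("x_time").cumcount() + 1)

# 3) Long -> wide; columns become (value, rank) pairs, flattened to "value_rank"
wide = matched.pivot(index="x_time", columns="rank",
                     values=["expiration", "x_price", "p_price"])
wide.columns = [f"{name}_{rank}" for name, rank in wide.columns]
order = [f"{name}_{k}" for k in range(1, 4)
         for name in ("expiration", "x_price", "p_price")]
wide = wide.reindex(columns=order).reset_index()

# 4) Left join so x_time values without matches (50, 90) keep a row of NaN
df_out = base.merge(wide, on="x_time", how="left")
print(df_out)
```

reindex(columns=...) is used instead of plain column selection so that an x_time with fewer than 3 matches would get NaN columns instead of raising a KeyError.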

I am stuck between the intermediate step and the final step and can't think of a way forward. What could be done to reshape the dataframe?

Answer 1

This is not a complete solution, but it might help you get unstuck.

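At the end we get the right values for each x_time, although the column labels still need tidying (and for x_time 70 the three matches come out in a different order than in the desired output).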

# Sort by x_time, and within each x_time by x_price_delta (largest first)
df = (df_int_out.sort_values(["x_time", "x_price_delta"], ascending=[True, False])
      .set_index(["x_time", "expiration_x"])
      .drop(columns=["x_price_delta", "x_price_x"]))

# Drop the first and last rows: x_time 50 and 90, which have no match
df1 = df.iloc[1:-1]

# Put the 3 matches of each (x_time, expiration_x) side by side in one row
df1.groupby(df1.index).apply(
    lambda x: pd.concat([pd.DataFrame(d) for d in x.values], axis=1).unstack())

Output:
           0
           0     1     2    0     1     2    0     1     2
(60, 5)  6.0  12.0  40.0  7.0  13.0  50.0  8.0  14.0  60.0
(70, 6)  7.0  12.0  40.0  9.0  14.0  60.0  8.0  13.0  50.0
(80, 7)  3.0  12.0  40.0  4.0  13.0  50.0  5.0  14.0  60.0

I am sure there are much better ways of handling this case.
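One such way, sketched as a hypothetical helper closest_k (the name and its key, value and order_by parameters are illustrative, not from the answer) that runs the whole pipeline and lets the number of neighbours vary. It assumes x_time is unique in df_mapped, that both frames share the x_price column name, and that the matches should be laid out in expiration order as in the desired output:

```python
import pandas as pd

df_original = pd.DataFrame({
    "x_time": [60] * 5 + [70] * 5 + [80] * 5,
    "expiration": [4, 5, 6, 7, 8, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5],
    "x_price": [10, 11, 12, 13, 14] * 3,
    "p_price": [20, 30, 40, 50, 60] * 3,
})
df_mapped = pd.DataFrame({
    "x_time": [50, 60, 70, 80, 90],
    "expiration": [4, 5, 6, 7, 8],
    "x_price": [15, 15, 13, 20, 20],
})


def closest_k(mapped, original, k=3, key="x_time", value="x_price",
              order_by="expiration"):
    """Hypothetical helper: attach to each row of `mapped` the k rows of
    `original` with the same `key` whose `value` is closest, flattened into
    columns <name>_1 ... <name>_k. Assumes `key` is unique in `mapped`."""
    # Overlapping left columns get "_original"; original's columns keep their names
    merged = mapped.merge(original, on=key, how="left",
                          suffixes=("_original", None))
    merged["delta"] = (merged[f"{value}_original"] - merged[value]).abs()

    # k smallest deltas per key (ties keep merge order),
    # then lay them out in `order_by` order and number them 1..k
    top = (merged.dropna(subset=["delta"])
           .sort_values([key, "delta"])
           .groupby(key, sort=False).head(k)
           .sort_values([key, order_by]))
    top = top.assign(rank=top.groupby(key).cumcount() + 1)

    # Long -> wide, then flatten (name, rank) into "name_rank"
    names = [c for c in original.columns if c != key]
    wide = top.pivot(index=key, columns="rank", values=names)
    wide.columns = [f"{name}_{r}" for name, r in wide.columns]
    wide = wide.reindex(
        columns=[f"{n}_{r}" for r in range(1, k + 1) for n in names]
    ).reset_index()

    # Left join keeps keys with no match (here 50 and 90) as rows of NaN
    base = mapped.rename(
        columns={c: f"{c}_original" for c in mapped.columns if c != key})
    return base.merge(wide, on=key, how="left")


df_out = closest_k(df_mapped, df_original, k=3)
print(df_out)
```

Selecting with sort_values plus groupby(...).head(k) keeps the logic to a few vectorised calls and avoids positional slicing such as iloc[1:-1], which only works when the unmatched rows happen to be first and last.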

Solution 1
0 2022-12-17 07:34:10