Update numpy array based on nearest value in pandas DataFrame column

How can I update an array based on the nearest value in a pandas DataFrame column? For example, I'd like to update the following array based on the "Time" column in the pandas DataFrame so that the array now contains the "X" values:

Input array:

import numpy as np
import pandas as pd

a = np.array([
    [122.25, 225.00, 201.00],
    [125.00, 151.50, 160.62],
    [99.99, 142.25, 250.01],
])

Input DataFrame:

df = pd.DataFrame({
    'Time': [100, 125, 150, 175, 200, 225],
    'X': [26100, 26200, 26300, 26000, 25900, 25800],
})

Expected output array:

([
    [26200, 25800, 25900],
    [26200, 26300, 26300],
    [26100, 26300, 25800],
])

Use merge_asof with direction='nearest':

# Convert Time to float since your input array is float.
# merge_asof requires both sides to have the same data types
df['Time'] = df['Time'].astype('float')

# merge_asof also requires both data frames to be sorted by the join key (Time)
# So we need to flatten the input array and make note of the original order
# before going into the merge
a_ = np.ravel(a)
o_ = np.arange(len(a_))

tmp = pd.DataFrame({
    'Time': a_,
    'Order': o_
})

# Merge the two data frames and extract X in the original order
result = (
    pd.merge_asof(
        tmp.sort_values('Time'),
        df.sort_values('Time'),
        on='Time',
        direction='nearest',
    )
    .sort_values('Order')['X']
    .to_numpy()
    .reshape(a.shape)
)
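
For comparison, the same nearest-value lookup can be sketched in plain NumPy with broadcasting. This is not part of the answer above, just an alternative under stated assumptions: the names times, values, idx and result_np are made up for illustration, and the inputs are the ones from the question. How exact ties between two equally distant Time values are resolved may differ from merge_asof; the sample data contains no ties.

```python
import numpy as np
import pandas as pd

a = np.array([
    [122.25, 225.00, 201.00],
    [125.00, 151.50, 160.62],
    [99.99, 142.25, 250.01],
])

df = pd.DataFrame({
    'Time': [100, 125, 150, 175, 200, 225],
    'X': [26100, 26200, 26300, 26000, 25900, 25800],
})

# Hypothetical helper names, introduced only for this sketch
times = df['Time'].to_numpy()
values = df['X'].to_numpy()

# Distance from every element of a to every Time value:
# a[..., None] has shape (3, 3, 1), times has shape (6,),
# so the difference broadcasts to shape (3, 3, 6).
idx = np.abs(a[..., None] - times).argmin(axis=-1)

# Look up X at the index of the nearest Time; result keeps a's shape
result_np = values[idx]
print(result_np)
```

This builds an intermediate array of size a.size * len(df), so it is convenient for small inputs like the one here; for large inputs the merge_asof approach above is likely the safer choice.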
