Is there a term for finding the minimal set of N points that approximates a curve?

Question

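I spent some time answering "How do I discretize a continuous function while avoiding noise generation" (see picture), and throughout I felt like I was reinventing the wheel.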

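Essentially, the problem is:

You are given a curve as a function: for any x, you can obtain y.
You want to approximate the curve with a piecewise linear function that has exactly N points, chosen according to some error metric, e.g. the distance to the curve, or the absolute difference in the area under the curve (thanks to @QuangHoang for pointing out that these are different).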
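
To make the difference between the two metrics concrete, here is a minimal sketch. It assumes the approximation interpolates the curve at the chosen knots, and it reads "absolute difference in area" as the integral of |f - g|, which is one possible interpretation. The name `approximation_errors` is hypothetical, not from any library.

```python
import numpy as np


def approximation_errors(f, knots_x, num_samples=10_001):
    """Return (max vertical distance, integral of |f - g|) for the
    piecewise-linear interpolant g of f through the given knots."""
    knots_x = np.asarray(knots_x, dtype=float)
    xs = np.linspace(knots_x[0], knots_x[-1], num_samples)
    exact = f(xs)
    approx = np.interp(xs, knots_x, f(knots_x))  # piecewise-linear g
    abs_diff = np.abs(approx - exact)

    max_vertical = abs_diff.max()
    # Trapezoidal rule on a uniform grid, written out by hand
    dx = xs[1] - xs[0]
    area = np.sum((abs_diff[:-1] + abs_diff[1:]) / 2) * dx
    return max_vertical, area


# Example: sin on [0, pi] approximated with three knots
max_err, area_err = approximation_errors(np.sin, [0.0, np.pi / 2, np.pi])
print("max vertical error:", max_err)
print("integrated absolute error:", area_err)
```

The first number is a worst-case measure and the second an average-case one, so two knot placements can rank differently depending on which metric is used.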

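Here is an example of a curve I approximated using 20 points:

Question: I have coded this up using repeated bisection. Is there a library I could have used? Is there a good term for this type of problem that I failed to find on Google? Does it generalize to a broader set of problems?

Edit: as requested, here is how I did it: Google Colab

Data: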

import numpy as np
from scipy.signal.windows import gaussian  # window functions live in scipy.signal.windows in current SciPy

N_MOCK = 2000

# A nice-ish mock distribution
xs = np.linspace(-10.0, 10.0, num=N_MOCK)
sigmoid = 1 / (1 + np.exp(-xs))
gauss = gaussian(N_MOCK, std=N_MOCK / 10)
ys = gauss - sigmoid + 1
xs += 10
xs /= 20

Plotting:

import matplotlib.pyplot as plt


def plot_graph(cont_time, cont_array, disc_time, disc_array, plot_name):
    """A simplified version of the provided plotting function"""
    
    # Setting Axis properties and titles
    fig, ax = plt.subplots(figsize=(20, 4))
    ax.set_title(plot_name)

    # Plotting stuff
    ax.plot(cont_time, cont_array, label="Continuous", color='#0000ff')
    ax.plot(disc_time, disc_array, label="Discrete",   color='#00ff00')

    fig.legend(loc="upper left", bbox_to_anchor=(0,1), bbox_transform=ax.transAxes)

Here is how I solved it, but I hope there is a more standard way:

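import warnings

try:
    from numpy.exceptions import RankWarning  # NumPy >= 2.0
except ImportError:
    RankWarning = np.RankWarning  # older NumPy releases

# The degree-50 polyfit below is poorly conditioned, so silence the warning
warnings.simplefilter('ignore', RankWarning)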


def line_error(x0, y0, x1, y1, ideal_line, integral_points=100):
    """Assume a straight line between (x0, y0) -> (x1, y1). Then sample the perfect line multiple times and compute the distance."""
    straight_line = np.poly1d(np.polyfit([x0, x1], [y0, y1], 1))
    xs = np.linspace(x0, x1, num=integral_points)
    ys = straight_line(xs)

    perfect_ys = ideal_line(xs)
    
    err = np.abs(ys - perfect_ys).sum() / integral_points * (x1 - x0)  # Remove (x1 - x0) to only look at avg errors
    return err


def discretize_bisect(xs, ys, bin_count):
    """Returns xs and ys of discrete points"""
    # For a large number of datapoints, without loss of generality you can treat xs and ys as bin edges
    # If it gives bad results, you can choose edges in many ways, e.g. with np.polyline or np.histogram_bin_edges
    ideal_line = np.poly1d(np.polyfit(xs, ys, 50))
    
    new_xs = [xs[0], xs[-1]]
    new_ys = [ys[0], ys[-1]]
    
    while len(new_xs) < bin_count:
        
        errors = []
        for i in range(len(new_xs)-1):
            err = line_error(new_xs[i], new_ys[i], new_xs[i+1], new_ys[i+1], ideal_line)
            errors.append(err)

        max_segment_id = np.argmax(errors)
        new_x = (new_xs[max_segment_id] + new_xs[max_segment_id+1]) / 2
        new_y = ideal_line(new_x)
        new_xs.insert(max_segment_id+1, new_x)
        new_ys.insert(max_segment_id+1, new_y)

    return new_xs, new_ys

Run:

BIN_COUNT = 25

new_xs, new_ys = discretize_bisect(xs, ys, BIN_COUNT)

plot_graph(xs, ys, new_xs, new_ys, f"Discretized and Continuous comparison, N(cont) = {N_MOCK}, N(disc) = {BIN_COUNT}")
print("Bin count:", len(new_xs))

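Note: while I prefer numpy, the answer can be a library in any language, or just the name of the mathematical term. Please do not write lots of code, as I have already done that myself :)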

Answer 1

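Is there a good term for this type of problem that I failed to find on Google? Does it generalize to a broader set of problems?

I would call this problem Expected Improvement (EI) or Bayesian optimization (permalink on archive.org). Given an expensive black-box function whose global maximum you want to find, the algorithm yields the next position at which to check for that maximum.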
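
For reference, the expected improvement has a closed form when the model's prediction at x is Gaussian. The block below is a sketch written for maximization, to match the wording above (Jones et al. state the criterion for minimization, with the signs mirrored). The symbols are notation assumed here: \mu(x) and \sigma(x) are the model's predicted mean and standard deviation at x, f_{\max} is the best value sampled so far, and \Phi and \varphi are the standard normal CDF and PDF.

```latex
\mathrm{EI}(x)
  = \mathbb{E}\big[\max\big(Y(x) - f_{\max},\, 0\big)\big]
  = \big(\mu(x) - f_{\max}\big)\,\Phi(z) + \sigma(x)\,\varphi(z),
\qquad
z = \frac{\mu(x) - f_{\max}}{\sigma(x)}, \quad \sigma(x) > 0
```

The first term rewards points whose predicted value is already high, the second rewards points where the model is uncertain. When \sigma(x) = 0 the expression reduces to \max(\mu(x) - f_{\max}, 0).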

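At first glance, this is different from your problem. You are looking for a way to approximate a curve with a small number of samples, whereas EI gives the positions where the function is most likely to attain its maximum. But the two problems are equivalent in that you are minimizing an error function (which changes each time you add another sample to your approximation) with as few points as possible.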
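
The analogy can be sketched for the curve problem directly: treat the per-segment error as the quantity to drive down, always sample where it is currently largest, and recompute it after every new sample. This is a simplified illustration and not EI itself. It reuses the bisection idea from the question but stops on an error tolerance instead of a fixed N. The names `segment_error` and `refine_until` are hypothetical.

```python
import numpy as np


def segment_error(f, x0, x1, samples=101):
    """Approximate integral of |chord - f| over [x0, x1]."""
    xs = np.linspace(x0, x1, samples)
    chord = np.interp(xs, [x0, x1], [f(x0), f(x1)])
    return np.mean(np.abs(chord - f(xs))) * (x1 - x0)


def refine_until(f, a, b, tol, max_points=1000):
    """Bisect the worst segment until every segment error is below tol."""
    knots = [a, b]
    while len(knots) < max_points:
        # The error landscape changes after each insertion, so recompute it
        errors = [segment_error(f, knots[i], knots[i + 1])
                  for i in range(len(knots) - 1)]
        worst = int(np.argmax(errors))
        if errors[worst] < tol:
            break  # stopping rule: no segment is worth refining further
        midpoint = 0.5 * (knots[worst] + knots[worst + 1])
        knots.insert(worst + 1, midpoint)
    return np.array(knots)


knots = refine_until(np.sin, 0.0, np.pi, tol=1e-3)
print("number of points needed:", len(knots))
```

Recomputing every segment on each pass keeps the sketch short. Only the two segments created by the last insertion actually change, so a priority queue would avoid the redundant work.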

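I believe this is the original research paper.

Jones, Donald & Schonlau, Matthias & Welch, William. (1998). Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization. 13. 455-492. 10.1023/A:1008306431147.

From section 1:

[...] the technique often requires the fewest function evaluations of all competing methods. This is possible because, with typical engineering functions, one can often interpolate and extrapolate quite accurately over large distances in the design space. Intuitively, the method is able to "see" obvious trends or patterns in the data and "jump to conclusions" instead of having to move step by step along some trajectory.

As for why it is efficient:

[...] the response surface approach provides a credible stopping rule based on the expected improvement from further searching. Such a stopping rule is possible because the statistical model provides confidence intervals on the function's value at unsampled points, and the "reasonableness" of these confidence intervals can be checked by model validation techniques.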
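
The stopping rule described in the quote can be illustrated with a small search loop. This is a rough sketch under loud assumptions, not the paper's implementation: it uses a hand-rolled zero-mean Gaussian process with a fixed RBF kernel (unit prior variance, `length_scale` and `jitter` chosen arbitrarily), searches over a fixed grid, and stops once the largest expected improvement falls below a hypothetical threshold `ei_tol`. All function names are made up for the example.

```python
import numpy as np
from scipy.stats import norm


def rbf_kernel(a, b, length_scale):
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_seen, y_seen, x_grid, length_scale=0.2, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP on x_grid."""
    k_seen = rbf_kernel(x_seen, x_seen, length_scale)
    k_seen += jitter * np.eye(len(x_seen))  # keeps the solve well-behaved
    k_cross = rbf_kernel(x_seen, x_grid, length_scale)
    weights = np.linalg.solve(k_seen, k_cross)
    mean = weights.T @ y_seen
    var = 1.0 - np.sum(k_cross * weights, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def maximize_with_ei(f, x_grid, ei_tol=1e-3, max_evals=30):
    """Sample f where expected improvement is largest; stop when it is tiny."""
    x_seen = np.array([x_grid[0], x_grid[-1]])
    y_seen = f(x_seen)
    while len(x_seen) < max_evals:
        mean, std = gp_posterior(x_seen, y_seen, x_grid)
        gain = mean - y_seen.max()
        z = gain / np.maximum(std, 1e-12)
        ei = gain * norm.cdf(z) + std * norm.pdf(z)
        candidate = int(np.argmax(ei))
        if ei[candidate] < ei_tol:
            break  # stopping rule: further search is not expected to pay off
        x_seen = np.append(x_seen, x_grid[candidate])
        y_seen = np.append(y_seen, f(x_grid[candidate]))
    return x_seen, y_seen


# Toy objective with its maximum at x = 0.3
grid = np.linspace(0.0, 1.0, 201)
x_seen, y_seen = maximize_with_ei(lambda x: -(x - 0.3) ** 2, grid)
best_x = x_seen[np.argmax(y_seen)]
print("evaluations used:", len(x_seen), "best x found:", best_x)
```

The loop never needs to know the true maximum: it stops because the model's own uncertainty says that no unsampled point is likely to beat the best value by more than `ei_tol`.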


Solution 1
0 2022-01-18 15:33:18
