How to weight the points in a scatter plot?

Question

I looked up the weights parameter of the polyfit function (numpy.polynomial.polynomial.polyfit) in Python, and it seems to be related to the error associated with the individual points (see "How to include measurement errors in numpy.polyfit").

However, what I am trying to do has nothing to do with errors; it is about weights. I have an image in the form of a numpy array which indicates the amount of charge deposited in the detector. I convert that image to a scatter plot and then do a fit. But I want that fit to give more weight to the points which have more charge deposited and less to the ones that have less charge. Is that what the weights parameter is for?
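As a rough illustration of what the weights parameter does: the NumPy documentation states that w[i] is applied to the unsquared residual, so each squared residual ends up multiplied by w[i] squared. Under that reading, passing the square root of the charge should make each point count in proportion to its charge. The data below is made up, and `charge` is a hypothetical name; this is a sketch, not a statement about the detector data in the question.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical data: pixel coordinates and the charge deposited at each one.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.2, 1.9, 3.3, 3.8])
charge = np.array([1.0, 5.0, 2.0, 8.0, 3.0])

# polyfit multiplies each residual by w before squaring, so passing
# sqrt(charge) makes each squared residual count in proportion to charge.
b, m = P.polyfit(x, y, 1, w=np.sqrt(charge))

# Cross-check against the closed-form weighted least-squares line.
xm = np.average(x, weights=charge)
ym = np.average(y, weights=charge)
m_ref = np.sum(charge * (x - xm) * (y - ym)) / np.sum(charge * (x - xm) ** 2)
b_ref = ym - m_ref * xm
```

Passing `charge` directly instead of its square root would still favour high-charge points, only more strongly, since the effective weight on the squared residual would then be the charge squared.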

Here's an example image: [image]

Here's my code:

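import math
import numpy as np
from numpy.polynomial.polynomial import polyfit

def get_best_fit(image_array, fixedX, fixedY):
    weights = np.array(image_array)
    # Coordinates of the pixels with some charge deposited
    x = np.where(weights>0)[1]
    y = np.where(weights>0)[0]
    size = len(image_array) * len(image_array[0])
    # Full-size grid holding the row index at every charged pixel
    y = np.zeros((len(image_array), len(image_array[0])))
    for i in range(len(np.where(weights>0)[0])):
        y[np.where(weights>0)[0][i]][np.where(weights>0)[1][i]] = np.where(weights>0)[0][i]
    y = y.reshape(size)
    # list(...) keeps the list-repetition behaviour under Python 3
    x = np.array(list(range(len(image_array))) * len(image_array[0]))
    weights = weights.reshape((size))
    # This polyfit returns coefficients lowest power first: intercept, slope
    b, m = polyfit(x, y, 1, w=weights)
    angle = math.atan(m) * 180/math.pi
    return b, m, angle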

Let me explain the code:

The first line of the function body assigns the charge deposited to a variable called weights. The next two lines get the points where the charge deposited is > 0, which gives the coordinates for the scatter plot. Then I get the size of the entire image so I can later convert everything to a one-dimensional array for plotting. I then go through the image and try to get the coordinates of the points where there's some charge deposited (remember that the amount of charge is stored in the variable weights). I then reshape the y coordinates to get a one-dimensional array and get the x coordinates for all the corresponding y coordinates from the image, then change the shape of the weights to be one-dimensional too.
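The coordinate extraction described above could perhaps be written more compactly with np.nonzero. This is only a sketch on a made-up 3x4 image, and it keeps just the charged pixels instead of the full flattened grid; since zero-charge pixels would get weight zero anyway, dropping them should not change a weighted fit.

```python
import numpy as np

# Hypothetical 3x4 image; each entry is the charge deposited in that pixel.
image_array = np.array([
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 5.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
])

# np.nonzero returns (row indices, column indices) of the selected pixels.
rows, cols = np.nonzero(image_array > 0)
x = cols                          # column index plays the role of x
y = rows                          # row index plays the role of y
charge = image_array[rows, cols]  # charge at each selected pixel
```

The three arrays `x`, `y` and `charge` line up element by element, so they can be handed straight to a fitting routine that accepts per-point weights.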

Edit: if there's a way of doing this using the np.linalg.lstsq function, that would be ideal, since I'm also trying to get the fit to go through the vertex of the plot. I could just reposition the plot so the vertex is at zero and then use np.linalg.lstsq, but that wouldn't allow me to use the weights.
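One way weights might be combined with np.linalg.lstsq is to scale each row of the system by the square root of its weight, which is the standard reduction of weighted least squares to ordinary least squares. The sketch below also shifts the data so the fixed point sits at the origin and fits only a slope. The function name `fit_through_point` and the parameters `x0`, `y0` are hypothetical, chosen for illustration.

```python
import numpy as np

def fit_through_point(x, y, weights, x0, y0):
    """Weighted least-squares line forced through (x0, y0)."""
    # Shift so the fixed point sits at the origin; the model is then y = m * x.
    dx = np.asarray(x, dtype=float) - x0
    dy = np.asarray(y, dtype=float) - y0
    # Scaling each row by sqrt(weight) weights the squared residuals by weight.
    sw = np.sqrt(np.asarray(weights, dtype=float))
    A = (dx * sw)[:, np.newaxis]
    m = np.linalg.lstsq(A, dy * sw, rcond=None)[0][0]
    b = y0 - m * x0  # intercept in the original coordinates
    return b, m

# Points lying exactly on y = 2x + 1, with the line pinned at (0, 1).
b, m = fit_through_point([1, 2, 3], [3, 5, 7], [1, 4, 2], x0=0, y0=1)
```

With a single unknown the result is the same as the closed form sum(w * dx * dy) / sum(w * dx**2); lstsq is used here only because the question asks for it.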

Answer 1

You can use sklearn.linear_model.LinearRegression. It allows you to not fit the intercept (i.e. the line goes through the origin or, with some finagling, through the point of your choice). It also deals with weighted data.

e.g. (mostly stolen shamelessly from @Hiho's answer):

import numpy as np
import matplotlib.pyplot as plt
import sklearn.linear_model

y = np.array([1.0, 3.3, 2.2, 4.25, 4.8, 5.1, 6.3, 7.5])
x = np.arange(y.shape[0]).reshape((-1,1))
w = np.linspace(1,5,y.shape[0])

model = sklearn.linear_model.LinearRegression(fit_intercept=False)
model.fit(x, y, sample_weight=w)

line_x = np.linspace(min(x), max(x), 100).reshape((-1,1))
pred = model.predict(line_x)

plt.scatter(x, y)
plt.plot(line_x, pred)

plt.show()

Answer 2

So I might be misunderstanding the problem, but I just tried to fit a straight line to a scatter plot and then change the fit to prioritise specific points using the weights parameter.
I tried this with np.polyfit and np.polynomial.polynomial.polyfit. I would have expected them both to behave the same, as they are both minimising squared error (at least that's my understanding).
However, the fits were quite different, see below. Not quite sure what to make of that.

Code

import numpy as np
import matplotlib.pyplot as plt

def func(p1, p2, x):
    return  p1 * x + p2

y = np.array([1.0, 3.3, 2.2, 4.25, 4.8, 5.1, 6.3, 7.5])
x = np.arange(y.shape[0])

plt.scatter(x, y)

w = np.ones(x.shape[0])
w[1] = 12
# p1, p2 = np.polyfit(x, y, 1, w=w)
p1, p2 = np.polynomial.polynomial.polyfit(x, y, 1, w=w)
print(p1, p2, w)

plt.plot(x, func(p1, p2, x))

plt.show()

np.polyfit

With no weights (or all set to 1): [plot]

With the weight of the 2nd point set to 12 and all other weights set to 1: [plot]

np.polynomial.polynomial.polyfit

No weights: [plot]

With the weight of the 2nd point set to 12 and all other weights set to 1: [plot]

So np.polyfit behaves as I would expect. However, I don't really know what's going on with np.polynomial.polynomial.polyfit; even the fit without any weights doesn't make sense to me.
But I think np.polyfit does what you are after? Changing the weight parameter clearly gives more weight to the higher-weighted points.
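A possible explanation for the puzzling np.polynomial.polynomial.polyfit plots, offered as a guess and not something the answer itself states: the two functions return their coefficients in opposite order. np.polyfit gives the highest power first, while numpy.polynomial.polynomial.polyfit gives the lowest power first, so unpacking both into p1, p2 and evaluating p1 * x + p2 would swap slope and intercept for the second one. The sketch below reuses the data from the code above and unpacks each result in its documented order.

```python
import numpy as np
from numpy.polynomial import polynomial as P

y = np.array([1.0, 3.3, 2.2, 4.25, 4.8, 5.1, 6.3, 7.5])
x = np.arange(y.shape[0])

# np.polyfit returns the highest power first: (slope, intercept).
slope_old, intercept_old = np.polyfit(x, y, 1)

# numpy.polynomial.polynomial.polyfit returns the lowest power first:
# (intercept, slope).
intercept_new, slope_new = P.polyfit(x, y, 1)
```

Unpacked this way, the two unweighted fits should agree to within floating-point precision.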


Answer 1
6 accepted 2018-08-07 17:08:52

Answer 2
5 2018-08-04 21:47:21
