Fitting a line through 3D x,y,z scatter plot data

I have a handful of data points that cluster along a line in 3D space. I have the x, y, z data in a CSV file that I want to import. I would like to find an equation that represents that line, or the plane perpendicular to that line, or whatever is mathematically correct. These data are independent of each other. Maybe there are better ways to do this than what I tried, but...

I attempted to replicate an old post here, "Fitting a line in 3D", that seemed to be doing exactly what I'm trying to do,

but it seems that maybe updates over the past decade have left the second part of the code not working? Or maybe I'm just doing something wrong. I've included the entire thing that I frankensteined together from it at the bottom. There are two lines that seem to be giving me a problem.

I've excerpted them here:

import numpy as np

pts = np.add.accumulate(np.random.random((10,3)))
x,y,z = pts.T

# this will find the slope and x-intercept of a plane
# parallel to the y-axis that best fits the data
A_xz = np.vstack((x, np.ones(len(x)))).T
m_xz, c_xz = np.linalg.lstsq(A_xz, z)[0]

# again for a plane parallel to the x-axis
A_yz = np.vstack((y, np.ones(len(y)))).T
m_yz, c_yz = np.linalg.lstsq(A_yz, z)[0]

# the intersection of those two planes and
# the function for the line would be:
# z = m_yz * y + c_yz
# z = m_xz * x + c_xz
# or:
def lin(z):
    x = (z - c_xz)/m_xz
    y = (z - c_yz)/m_yz
    return x,y

#verifying:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

fig = plt.figure()
ax = Axes3D(fig)
zz = np.linspace(0,5)
xx,yy = lin(zz)
ax.scatter(x, y, z)
ax.plot(xx,yy,zz)
plt.savefig('test.png')
plt.show()

They return this, but no values:

FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
  m_xz, c_xz = np.linalg.lstsq(A_xz, z)[0]
FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
  m_yz, c_yz = np.linalg.lstsq(A_yz, z)[0]

I don't know where to go from here. I don't even need the plot; I just need an equation and am ill-equipped to move forward. If anyone knows an easier way to do this, or can point me in the right direction, I'm willing to learn, but I'm very, very lost. Thank you in advance!!

Here is my entire frankensteined code, in case that is what is causing the issue.

import pandas as pd
import numpy as np
mydataset = pd.read_csv('line1.csv')

x = mydataset.iloc[:,0]
y = mydataset.iloc[:,1]
z = mydataset.iloc[:,2]


data = np.concatenate((x[:, np.newaxis], 
                       y[:, np.newaxis], 
                       z[:, np.newaxis]), 
                      axis=1)


# Calculate the mean of the points, i.e. the 'center' of the cloud
datamean = data.mean(axis=0)

# Do an SVD on the mean-centered data.
uu, dd, vv = np.linalg.svd(data - datamean)

# Now vv[0] contains the first principal component, i.e. the direction
# vector of the 'best fit' line in the least squares sense.

# Now generate some points along this best fit line, for plotting.

# we want it to have mean 0 (like the points we did
# the svd on). Also, it's a straight line, so we only need 2 points.
linepts = vv[0] * np.mgrid[-100:100:2j][:, np.newaxis]

# shift by the mean to get the line in the right place
linepts += datamean

# Verify that everything looks right.

import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d as m3d

ax = m3d.Axes3D(plt.figure())
ax.scatter3D(*data.T)
ax.plot3D(*linepts.T)
plt.show()

# this will find the slope and x-intercept of a plane
# parallel to the y-axis that best fits the data
A_xz = np.vstack((x, np.ones(len(x)))).T
m_xz, c_xz = np.linalg.lstsq(A_xz, z)[0]

# again for a plane parallel to the x-axis
A_yz = np.vstack((y, np.ones(len(y)))).T
m_yz, c_yz = np.linalg.lstsq(A_yz, z)[0]

# the intersection of those two planes and
# the function for the line would be:
# z = m_yz * y + c_yz
# z = m_xz * x + c_xz
# or:
def lin(z):
    x = (z - c_xz)/m_xz
    y = (z - c_yz)/m_yz
    return x,y

print(x,y)

#verifying:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

fig = plt.figure()
ax = Axes3D(fig)
zz = np.linspace(0,5)
xx,yy = lin(zz)
ax.scatter(x, y, z)
ax.plot(xx,yy,zz)
plt.savefig('test.png')
plt.show()

You can get rid of the complaint from np.linalg.lstsq by adding rcond=None like this:

m_xz, c_xz = np.linalg.lstsq(A_xz, z, rcond=None)[0]

Is this the right decision for your situation? I have no idea. But there's more about it in the docs.
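As a rough illustration of what that call does, here is a minimal sketch on made-up data (the points are synthetic and lie exactly on z = 2*x + 1; they are not from line1.csv). As I understand the docs, rcond only sets the cutoff below which small singular values of the design matrix are treated as zero, so for a well-conditioned two-column fit like this one the choice should not change the answer.

```python
import numpy as np

# Synthetic points on the exact line z = 2*x + 1 (made up for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0])
z = 2.0 * x + 1.0

# Design matrix with columns [x, 1], so the solution vector is [slope, intercept].
A_xz = np.vstack((x, np.ones(len(x)))).T

# rcond=None opts in to the newer default cutoff and silences the FutureWarning.
m_xz, c_xz = np.linalg.lstsq(A_xz, z, rcond=None)[0]

# Nothing is shown unless the values are printed explicitly.
print(m_xz, c_xz)
```

The slope and intercept should come back as approximately 2 and 1 here, since the data are noise-free.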

When I run your code with your inputs, it seems to run just fine and I get values assigned to m_xz, c_xz, etc. If you don't display them explicitly with print(m_xz) (or whatever), then you won't see them.

m_xz
Out[42]: 5.186132604596112

c_xz
Out[43]: 62.5764694106141

Also, you reference your data in two different ways. You get x, y, and z from your CSV, but also put them into a numpy array. You can get rid of the duplication and of pandas by just using numpy:

data = np.genfromtxt('line1.csv', delimiter=',', skip_header=1)

x = data[:,0]
y = data[:,1]
z = data[:,2] 
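To try this without the actual file, the same call can be run against an in-memory stand-in. This is only a sketch: the CSV text below is invented, and it assumes line1.csv has one header row followed by comma-separated x, y, z columns.

```python
import io
import numpy as np

# Hypothetical stand-in for line1.csv: a header row, then x,y,z rows (made-up values).
csv_text = "x,y,z\n1.0,2.0,3.0\n4.0,5.0,6.0\n7.0,8.0,9.0\n"

# genfromtxt accepts a file-like object as well as a filename.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)

# Transposing gives one row per column, which unpacks into x, y, z.
x, y, z = data.T

print(data.shape)
print(x, y, z)
```

Unpacking data.T is equivalent to the three column slices above and keeps a single array as the source of truth.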

As was proposed in the old post you refer to, you could also make use of principal component analysis instead of a least squares approach. For that I suggest sklearn.decomposition.PCA from the sklearn package.

An example using the CSV file you provided can be found below.

import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

mydataset = pd.read_csv('line1.csv')

x = mydataset.iloc[:,0]
y = mydataset.iloc[:,1]
z = mydataset.iloc[:,2]

coords = np.array((x, y, z)).T

pca = PCA(n_components=1)
pca.fit(coords)
direction_vector = pca.components_
print(direction_vector)


# Create plot
origin = np.mean(coords, axis=0)
euclidian_distance = np.linalg.norm(coords - origin, axis=1)
extent = np.max(euclidian_distance)

line = np.vstack((origin - direction_vector * extent,
                  origin + direction_vector * extent))

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(coords[:, 0], coords[:, 1], coords[:,2])
ax.plot(line[:, 0], line[:, 1], line[:, 2], 'r')
plt.show()
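Since the question asks for an equation rather than a plot, the fit can also be written in parametric form: every point on the line is origin + s * direction, where origin is the centroid and direction is the first principal component. Below is a minimal sketch using np.linalg.svd (as in the question's code) instead of sklearn, on made-up points; the helper name point_on_line is hypothetical.

```python
import numpy as np

# Made-up points lying exactly on a line through (1, 2, 3) with direction (1, 1, 2).
t = np.linspace(-1.0, 1.0, 5)
coords = np.array([1.0, 2.0, 3.0]) + t[:, np.newaxis] * np.array([1.0, 1.0, 2.0])

# The centroid of the cloud is a point on the best-fit line.
origin = coords.mean(axis=0)

# The first right-singular vector of the centered data is the line direction.
# It has unit length, and its sign is arbitrary.
_, _, vt = np.linalg.svd(coords - origin)
direction = vt[0]

def point_on_line(s):
    """Return the point origin + s * direction on the fitted line (hypothetical helper)."""
    return origin + s * direction

print("point on line:", origin)
print("unit direction:", direction)
```

The plane perpendicular to the line through the centroid is then the set of points p satisfying direction · (p - origin) = 0, so the same two vectors answer both parts of the question.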

