
How do I separate a data set on a scatter plot

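I am new to Python, but I am interested in learning a technique that lets me identify the different data points in a scatter plot according to where their markers fall on the plot.

My specific example relates to this: http://www.astroml.org/examples/datasets/plot_sdss_line_ratios.html

I have a BPT diagram and want to split the data along the demarcation line.

I have a data set in the following format: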

data = [[a,b,c],
        [a,b,c],
        [a,b,c]
]

For the demarcation line, I also have the following:

NII   = np.linspace(-3.0, 0.35)

def log_OIII_Hb_NII(log_NII_Ha, eps=0):
    return 1.19 + eps + 0.61 / (log_NII_Ha - eps - 0.47)

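Any help would be great!

There was not enough room in the comment section. This is not too dissimilar from what @DrV wrote, but perhaps more astronomy-oriented: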

import random
import numpy as np
import matplotlib.pyplot as plt

def log_OIII_Hb_NII(log_NII_Ha, eps=0):
    return 1.19 + eps + 0.61 / (log_NII_Ha - eps - 0.47)

# Make some fake measured NII_Ha data
iternum = 100

# Ranged -2.1 to 0.4:
Measured_NII_Ha = np.array([random.random()*2.5-2.1 for i in range(iternum)])
# Ranged -1.5 to 1.5:
Measured_OIII_Hb = np.array([random.random()*3-1.5 for i in range(iternum)])

# For our measured x-value, what is our cut-off value
Measured_Predicted_OIII_Hb = log_OIII_Hb_NII(Measured_NII_Ha)

# Now compare the cut-off line to the measured emission line fluxes
# by using numpy True/False arrays
#
# e.g., x = np.array([1, 2, 3, 4])
# >>> index = x >= 3
# >>> print(index)
# [False False  True  True]
# >>> print(x[index])
# [3 4]

Above_Predicted_Red_Index = Measured_OIII_Hb > Measured_Predicted_OIII_Hb
Below_Predicted_Blue_Index = Measured_OIII_Hb < Measured_Predicted_OIII_Hb
# Alternatively, you can invert Above_Predicted_Red_Index



# Make the cut-off line for a range of values for plotting it as
# a continuous line
Predicted_NII_Ha = np.linspace(-3.0, 0.35)
Predicted_log_OIII_Hb_NII = log_OIII_Hb_NII(Predicted_NII_Ha)

fig = plt.figure(0)
ax = fig.add_subplot(111)

# Plot the modelled cut-off line
ax.plot(Predicted_NII_Ha, Predicted_log_OIII_Hb_NII, color="black", lw=2)

# Plot the data for a given colour
ax.errorbar(Measured_NII_Ha[Above_Predicted_Red_Index], Measured_OIII_Hb[Above_Predicted_Red_Index], fmt="o", color="red")
ax.errorbar(Measured_NII_Ha[Below_Predicted_Blue_Index], Measured_OIII_Hb[Below_Predicted_Blue_Index], fmt="o", color="blue")

# Make it aesthetically pleasing
ax.set_ylabel(r"$\rm \log([OIII]/H\beta)$")
ax.set_xlabel(r"$\rm \log([NII]/H\alpha)$")

plt.show()

[Figure: plot produced by the example code]
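
As a variant of the fake-data step above, the list comprehensions can be replaced by NumPy's own random generator, and the "below" index can be obtained by inverting the "above" index. This is only a sketch: the seed `42` and the lower-case variable names are my own choices, not part of the answer above.

```python
import numpy as np

def log_OIII_Hb_NII(log_NII_Ha, eps=0):
    return 1.19 + eps + 0.61 / (log_NII_Ha - eps - 0.47)

# Seeded generator so the fake data is reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(42)
iternum = 100

# Same ranges as in the answer: -2.1 to 0.4 and -1.5 to 1.5
measured_nii_ha = rng.uniform(-2.1, 0.4, size=iternum)
measured_oiii_hb = rng.uniform(-1.5, 1.5, size=iternum)

# True where a point lies above the cut-off line
above = measured_oiii_hb > log_OIII_Hb_NII(measured_nii_ha)
# ~ flips True and False; a point exactly on the line lands in "not above"
not_above = ~above

print(above.sum(), "points above the line,", not_above.sum(), "on or below it")
```

Inverting the index guarantees that every point ends up in exactly one of the two groups, whereas two separate `>` and `<` comparisons would drop a point that sits exactly on the line.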

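I am assuming that the coordinates of each point in your example are a, b. The column with the c values is then used to calculate whether the point belongs to one of the two groups.

First, make your data an ndarray: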

import numpy as np

data = np.array(data)

Now you can create the two groups by checking which part of the data belongs to which region:

dataselector = log_OIII_Hb_NII(data[:,2]) > 0

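This creates a vector of True and False values that is True wherever the data in the third column (column 2) gives a positive value from the function. The length of the vector equals the number of rows in data.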
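
To make the True/False vector concrete, here is a tiny example. The three rows are made-up numbers, and `selector` uses a stand-in condition (third column positive) chosen only to keep the result easy to check by hand.

```python
import numpy as np

# Made-up rows in the [a, b, c] layout from the question
data = np.array([[0.0, 1.0, -0.5],
                 [1.0, 2.0,  0.3],
                 [2.0, 3.0,  1.2]])

# Stand-in condition: True where the third column (index 2) is positive
selector = data[:, 2] > 0

print(selector)                        # one True/False entry per row
print(len(selector) == data.shape[0])  # the mask is as long as the row count
print(data[selector])                  # only the rows where selector is True
```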

Then you can plot the two data sets:

import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)

# the plotting part
ax.plot(data[dataselector,0], data[dataselector,1], 'ro')
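# ~ inverts a boolean array (unary minus is not supported for boolean arrays in current NumPy)
ax.plot(data[~dataselector,0], data[~dataselector,1], 'bo')

So:

  • create a list of True/False values which tells which rows of data belong to which group
  • plot the two groups (~dataselector means "all the rows where there is a False in dataselector")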
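
If the first two columns of `data` are in fact the two line ratios, log([NII]/Hα) and log([OIII]/Hβ), the split along the demarcation line could be wrapped up as below. This is a sketch under that assumption, not a layout confirmed by the question, and `split_by_line` is a hypothetical helper name. The formula has a vertical asymptote at `log_NII_Ha = eps + 0.47`, so the comparison only makes sense to the left of it; counting points at or beyond the asymptote as "above" is my own choice here.

```python
import numpy as np

def log_OIII_Hb_NII(log_NII_Ha, eps=0):
    return 1.19 + eps + 0.61 / (log_NII_Ha - eps - 0.47)

def split_by_line(data, eps=0):
    """Hypothetical helper: return (rows above the line, remaining rows).

    Assumes column 0 is log([NII]/Ha) and column 1 is log([OIII]/Hb).
    """
    data = np.asarray(data, dtype=float)
    x, y = data[:, 0], data[:, 1]

    # Only compare against the curve to the left of its asymptote
    left = x < eps + 0.47
    # Assumption: anything at or right of the asymptote counts as "above"
    above = np.ones(len(data), dtype=bool)
    above[left] = y[left] > log_OIII_Hb_NII(x[left], eps)

    return data[above], data[~above]

# Made-up example rows
example = [[-1.0, 1.0, 7.0],   # the line is at about 0.775 here, so this is above
           [-1.0, 0.0, 8.0],   # below the line
           [ 0.6, 0.0, 9.0]]   # right of the asymptote
upper, lower = split_by_line(example)
print(upper)
print(lower)
```

The two returned arrays keep all three columns, so they can be passed straight to `ax.plot(upper[:, 0], upper[:, 1], 'ro')` and `ax.plot(lower[:, 0], lower[:, 1], 'bo')`.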
