
Two colour scatter plot in R or in Python

I have a dataset of three columns and n rows. Column 1 contains the name, column 2 value1, and column 3 value2 (rank2).

I want to draw a scatter plot in which the outlier values are labelled with their names.

The R commands I am using are:

tiff('scatterplot.tiff')
data<-read.table("scatterplot_data", header=T)
attach(data)
reg1<-lm(A~B)
plot(A,B,col="red")
abline(reg1)
outliers<-data[which(2^(data[,2]-data[,3]) >= 4 | 2^(data[,2]-data[,3]) <=0.25),]

text(outliers[,2], outliers[,3],labels=outliers[,1],cex=0.50)

dev.off()

and I get a figure like this: [image]

What I want is for the labels in the lower half to be one colour and the labels in the upper half to be another, say green and red respectively.

Any suggestions, or adjustments to the commands?

You already have a logical test that works to your satisfaction. Just use it in the colour spec passed to text():

     ratio <- 2^(outliers[,2]-outliers[,3])
     text(outliers[,2], outliers[,3],labels=outliers[,1],cex=0.50, 
         col=ifelse(ratio >= 4, "blue", "green") )

It's untested, of course, because you offered no test case, but my reasoning is that every row of 'outliers' has a ratio that is either >= 4 or <= 0.25, so ifelse() returns "blue" for the first group and "green" for the second. Computing the ratio from 'outliers' rather than from 'data' keeps the colour vector the same length and in the same order as the labels, which should give you the proper alignment of colour choices with the 'outliers' rows.
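For readers working in Python rather than R, the same idea (reusing the outlier test to pick one colour per label) can be sketched with NumPy. The names and values below are made-up stand-ins for the three-column dataset, not the asker's data, and the colours are arbitrary.

```python
import numpy as np

# Hypothetical stand-in for the dataset: name, value1, value2.
names = np.array(["g1", "g2", "g3", "g4", "g5"])
value1 = np.array([5.0, 2.0, 3.0, 7.5, 4.0])
value2 = np.array([2.5, 4.5, 3.2, 7.0, 6.5])

# Same test as in the question: ratio >= 4 or ratio <= 0.25.
ratio = 2.0 ** (value1 - value2)
is_outlier = (ratio >= 4) | (ratio <= 0.25)

# One colour per outlier, computed on the filtered rows so that the
# colours stay aligned with the labels.
label_colours = np.where(ratio[is_outlier] >= 4, "blue", "green")

for name, colour in zip(names[is_outlier], label_colours):
    print(name, colour)
```

Each colour can then be handed to matplotlib when the label is drawn, for example `plt.text(x, y, name, color=colour)`.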

Using Python, with matplotlib (pylab) to plot and scipy/numpy to fit the data. The trick with numpy is to create an index or mask that selects the results you want.

EDIT: Want to selectively colour the top and bottom outliers? It's a simple combination of both masks that we created:

import numpy as np
import matplotlib.pyplot as plt

# Create some data
N = 1000
X = np.random.normal(5,1,size=N)
Y = X + np.random.normal(0,5.5,size=N)/np.random.normal(5,.1)
NAMES = ["foo"]*N # Customize names here

# Fit a straight line (degree-1 polynomial): Y ~ a*X + b
# (np.polyfit replaces the older scipy.polyfit alias)
(a,b)=np.polyfit(X,Y,1)

# Find all points above the line
idx = (X*a + b) < Y

# Scatter according to that index
plt.scatter(X[idx],Y[idx], color='r')
plt.scatter(X[~idx],Y[~idx], color='g')

# Find top 10 outliers
err = ((X*a+b) - Y) ** 2
idx_L = np.argsort(err)[-10:]
for i in idx_L:
    plt.text(X[i], Y[i], NAMES[i])

# Color the outliers purple or black
top = idx_L[idx[idx_L]]
bot = idx_L[~idx[idx_L]]

plt.scatter(X[top],Y[top], color='purple')
plt.scatter(X[bot],Y[bot], color='black')

XF = np.linspace(0,10,1000)
plt.plot(XF, XF*a + b, 'k--') 
plt.axis('tight')
plt.show()
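The code above colours the outlier points; the question asked for coloured labels. A minimal sketch of that variation follows, assuming "upper half" means above the fitted line and "lower half" means below it. The seven data points and their names are hypothetical, chosen so that the two outliers are obvious.

```python
import numpy as np

# Hypothetical data: five points lying on y = x, plus one far above
# that line and one far below it.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 1.0, 3.0])
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 6.0, -2.0])
names = ["a", "b", "c", "d", "e", "high", "low"]

# Fit a straight line and measure how far each point is from it.
slope, intercept = np.polyfit(x, y, 1)
residual = y - (slope * x + intercept)

# Indices of the two largest squared residuals (the outliers).
worst = np.argsort(residual ** 2)[-2:]

# Red for labels above the line, green for labels below it.
label_colour = {
    names[i]: ("red" if residual[i] > 0 else "green") for i in worst
}
print(label_colour)
```

In the plotting loop, the chosen colour would be passed along with the label, for example `plt.text(X[i], Y[i], NAMES[i], color=...)`.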

