
Coloring the points by category in R

I am creating a scatter plot in R using the following code:

plot(df_prob1$x1, df_prob1$x2, pch = df_prob1$y)

I get the following plot:

[Image: scatter plot of x2 against x1, with one category drawn as squares and the other as circles]

As the plot above shows, there are two categories, one drawn as squares and the other as circles. I want the two categories to have different colors as well.

I tried the following code:

plot(df_prob1$x1, df_prob1$x2, pch = df_prob1$y, col = c("red", "blue"))

And I get the following plot:

[Image: the same scatter plot with the points colored red and blue]

However, the colors seem to be assigned at random rather than according to the categories.

I also tried passing the variable itself as the value of col:

plot(df_prob1$x1, df_prob1$x2, pch = df_prob1$y, col = df_prob1$y)

But this didn't produce a proper plot either.
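For context, here is a small sketch of what base R graphics does with the two attempts, assuming a hypothetical 0/1 vector `y` in place of `df_prob1$y`. A `col` vector shorter than the data is recycled along the points in row order, so colors follow position rather than category. A numeric `col` is read as an index into `palette()`, where 0 refers to the background color, so with `col = df_prob1$y` the category-0 points are drawn in the background color and can disappear. Indexing the color vector by the category avoids both problems.

```r
# Hypothetical 0/1 category vector standing in for df_prob1$y
y <- c(1, 0, 0, 1, 1, 0)

# Attempt 1: col = c("red", "blue") is recycled along the points by position,
# so row 1 is "red", row 2 "blue", row 3 "red", ... regardless of y
by_position <- rep_len(c("red", "blue"), length(y))

# Fix: index the color vector by the category instead, so every point
# with the same y gets the same color (factor level "0" -> "red", "1" -> "blue")
by_category <- c("red", "blue")[factor(y)]

print(data.frame(y, by_position, by_category))
```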

You can use the ggplot2 package for this:

library(ggplot2) # install it with install.packages("ggplot2") if you don't have it

ggplot(df_prob1, aes(x1, x2)) + geom_point(aes(color = factor(y), shape = factor(y)))

The trick is to use df_prob1$y as an index into the color vector, c("red", "blue"). This is easy once the column y is coerced to a factor, because factors are coded internally as consecutive integers starting at 1, so the first level picks "red" and the second picks "blue". The following code uses data derived from the built-in data set iris, prepared as shown at the end of this answer.

# Coerce y to a factor so its levels index the colors: first level -> "red", second -> "blue"
clrs <- c("red", "blue")[factor(df_prob1$y)]
plot(df_prob1$x1, df_prob1$x2, pch = df_prob1$y, col = clrs)

[Image: the scatter plot with each category drawn in its own color]
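To see the factor-as-index trick in isolation, a small sketch with a hypothetical vector `y` of 0/1 labels. `levels()` shows the sorted levels, `as.integer()` exposes the internal codes, and indexing a vector with a factor uses those integer codes rather than the printed labels. Passing `levels` explicitly to `factor()` is one way to choose which category gets which color.

```r
# Hypothetical 0/1 labels, like the y column of df_prob1
y <- c(1, 0, 1, 1, 0)
f <- factor(y)

levels(f)            # "0" "1": levels are sorted, so "0" is level 1
as.integer(f)        # 2 1 2 2 1: the internal integer codes
c("red", "blue")[f]  # "blue" "red" "blue" "blue" "red"

# Reorder the levels to swap the color assignment
f2 <- factor(y, levels = c(1, 0))
c("red", "blue")[f2] # "red" "blue" "red" "red" "blue"
```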

Test data.

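set.seed(1234)
# Keep Sepal.Length, Sepal.Width and Species, dropping the virginica rows
df_prob1 <- subset(iris[c(1, 2, 5)], Species != "virginica")
# Randomly sample 50 of the remaining 100 rows
df_prob1 <- df_prob1[sample(nrow(df_prob1), 50), ]
# Recode Species as a 0/1 label: 1 for setosa, 0 for versicolor
df_prob1[[3]] <- as.numeric(df_prob1[[3]] == "setosa")
names(df_prob1) <- c("x1", "x2", "y")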
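Putting the test data and the base-graphics answer together, a self-contained sketch that also checks every point in a category gets the same color. The `legend()` call is an optional addition that is not part of the original answer, and the exact rows sampled may differ between R versions because of changes to `sample()`. The plot goes to a null PDF device so the sketch runs non-interactively; drop `pdf(NULL)` and `dev.off()` to draw on screen.

```r
# Rebuild the test data as above (iris ships with base R)
set.seed(1234)
df_prob1 <- subset(iris[c(1, 2, 5)], Species != "virginica")
df_prob1 <- df_prob1[sample(nrow(df_prob1), 50), ]
df_prob1[[3]] <- as.numeric(df_prob1[[3]] == "setosa")
names(df_prob1) <- c("x1", "x2", "y")

pal <- c("red", "blue")
grp <- factor(df_prob1$y)  # levels "0" (versicolor) and "1" (setosa)
clrs <- pal[grp]           # one color per point, chosen by category

pdf(NULL)  # null device: nothing is written to disk
plot(df_prob1$x1, df_prob1$x2, pch = df_prob1$y, col = clrs,
     xlab = "x1", ylab = "x2")
# Optional legend matching the symbols (pch 0 = square, 1 = circle) and colors
legend("topright", legend = c("y = 0", "y = 1"), col = pal, pch = c(0, 1))
dev.off()
```

Keeping the palette in its own variable (`pal`) lets the points and the legend share one color definition.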

The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit the site URL or the original address. For questions, contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM