Correlation between a continuous y variable and a 3-category x variable in R

I have the company rating on the x-axis (1 = successful, 2 = not sure yet, 3 = not successful) and a diversity index between 0 and 1 on the y-axis. I want to find out whether the company rating correlates with the diversity index, i.e. answer the question "is a higher diversity index associated with greater company success?". I am not sure how to do this, since the rating is categorical and the diversity index is continuous. Please help. Thank you!

data1 <- structure(list(x = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 
3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
4L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 3L, 
3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1", 
"2", "3", "4"), class = "factor"), y = c(0.66625, 0.66625, 0.66625, 
0.833125, 0.833125, 0.833125, 0.833125, 0.833125, 0.833125, 0.833125, 
0.833125, 0.833125, 0.833125, 0.833125, 0.833125, 0, 0.83375, 
0.83375, 0.83375, 0.166666666666667, 0.166666666666667, 0.166666666666667, 
0.166666666666667, 0.333333333333333, 0.333333333333333, 0, 0, 
0.25, 0.25, 0.25, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.20859375, 
0.20859375, 0.20859375, 0.20859375, 0.20859375, 0.333333333333333, 
0.333333333333333, 0.125, 0.125, 0.125, 0.5, 0.5, 0.5, 0.125, 
0.125, 0.125, 0.5, 0.5, 0.340831629175187, 0.340831629175187, 
0.340831629175187, 0.340831629175187, 0.340831629175187, 0.340831629175187, 
0.125, 0.125, 0.125, 0.33375, 0.33375, 0.33375, 0.125, 0.125, 
0.125, 0.125, 0.125, 0.125, 0.166666666666667, 0.166666666666667, 
0.65, 0.65, 0.65, 0.65, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1, 0.3, 
0.3, 0.3, 0.3, 0.166666666666667, 0.166666666666667, 0.66625, 
0.66625, 0.66625, 0, 0, 0.65, 0.65, 0.65, 0.65, 0.166666666666667, 
0.166666666666667, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7)), class = "data.frame", row.names = c(NA, -151L))

As others have suggested, one option is to first run a one-way ANOVA to test whether the three rating groups (defined by the x variable) differ in their mean y. If the ANOVA is significant, follow it with "post-hoc" tests that compare the three groups in pairs, for example Tukey's HSD or t tests with a multiple-comparison correction such as Holm's. If you want a package that runs and visualizes group comparisons quickly, I'd recommend ggstatsplot. This simple code implements and visualizes the comparison:

library(ggstatsplot)
# Plot y for each rating group, annotated with an omnibus test and pairwise comparisons
ggbetweenstats(data1, x = x, y = y)
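The same ANOVA-then-post-hoc workflow can also be sketched in base R without extra packages. This is a minimal sketch, assuming a data frame shaped like data1 (a factor x and a numeric y); compare_ratings is a hypothetical helper name. In the posted data the factor x has an unused level "1" (the observed codes are 2, 3 and 4), so droplevels() is applied first. Because y is bounded between 0 and 1 and has many ties at 1, Welch's ANOVA and the rank-based Kruskal-Wallis test are included as checks on the classic equal-variance ANOVA.

```r
# Minimal sketch (assumption: df looks like data1, with factor `x` and numeric `y`).
# compare_ratings() is a hypothetical helper, not part of any package.
compare_ratings <- function(df) {
  df$x <- droplevels(df$x)      # drop empty levels (level "1" is unused in data1)
  fit <- aov(y ~ x, data = df)  # classic one-way ANOVA (assumes equal group variances)
  list(
    anova   = summary(fit),                                           # omnibus F test
    welch   = oneway.test(y ~ x, data = df),                          # Welch ANOVA: unequal variances allowed
    tukey   = TukeyHSD(fit),                                          # pairwise mean differences, family-wise error controlled
    holm_t  = pairwise.t.test(df$y, df$x, p.adjust.method = "holm"),  # pairwise t tests, Holm-adjusted p-values
    kruskal = kruskal.test(y ~ x, data = df)                          # rank-based check if normality is doubtful
  )
}
```

Usage: res <- compare_ratings(data1), then inspect res$anova (or res$welch) first, and look at res$tukey or res$holm_t only if the omnibus test is significant.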

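You can also use an ordinal regression model, for example a proportional-odds model with the rating as an ordered outcome and the diversity index as the predictor, which respects the ordering of the ratings. Your data need to meet certain conditions for this to be a valid approach, most importantly the proportional-odds assumption.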
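A minimal sketch of the ordinal-regression option, using polr() from the MASS package (a recommended package distributed with R) to fit a proportional-odds model. fit_ordinal is a hypothetical helper name, and the coding direction is an assumption taken from the question's description (lower code = more successful). Since the posted factor's observed codes are 2, 3 and 4 rather than 1, 2 and 3, check that this mapping is right before interpreting the sign of the coefficient.

```r
library(MASS)  # provides polr(); MASS ships with standard R installations

# Hypothetical helper (assumption: df looks like data1, and a lower rating code
# means a MORE successful company, as described in the question).
fit_ordinal <- function(df) {
  # Reverse the observed levels so the ordered factor runs from least to most successful
  lev <- rev(levels(droplevels(df$x)))
  df$success <- factor(df$x, levels = lev, ordered = TRUE)
  # Proportional-odds logistic regression of success on the diversity index;
  # Hess = TRUE keeps the Hessian so summary() can report standard errors
  polr(success ~ y, data = df, Hess = TRUE)
}
```

Usage: fit <- fit_ordinal(data1); summary(fit). With success ordered from least to most successful, a positive coefficient on y would suggest that a higher diversity index goes with higher success categories, and exp(coef(fit)) expresses it as an odds ratio for a one-unit change in the index (i.e. from 0 to 1). Before relying on the result, the proportional-odds assumption can be checked, for example with a Brant test.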
