Year.Sales.Advertise.Employees
1 1985 1.05 162 32
2 1986 1.26 285 47
3 1987 1.47 540 23
4 1988 2.16 261 68
5 1989 1.95 360 32
6 1990 2.4 690 17
7 1991 2.37 495 58
8 1992 3.15 948 75
9 1993 3.57 720 98
10 1994 4.41 1.14 43
11 1995 4.5 1.395 76
12 1996 5.61 1.56 89
13 1997 5.19 1.38 108
14 1998 5.67 1.26 76
15 1999 5.16 1.71 65
16 2000 6.84 1.86 93
I want to find the Spearman correlation between Sales and Advertise, and I've been stuck for 3 hours, so please help. I think I have to separate the one variable into four variables, but I'm struggling.
We can use strsplit to split the data, i.e.
new_df <- setNames(data.frame(do.call(rbind, strsplit(df2$Year.Sales.Advertise.Employees, ' '))),
strsplit(names(df2), '.', fixed = TRUE)[[1]])
which gives:
Year Sales Advertise Employees
1 1985 1.05 162 32
2 1986 1.26 285 47
3 1987 1.47 540 23
4 1988 2.16 261 68
5 1989 1.95 360 32
6 1990 2.4 690 17
7 1991 2.37 495 58
8 1992 3.15 948 75
9 1993 3.57 720 98
10 1994 4.41 1.14 43
11 1995 4.5 1.395 76
12 1996 5.61 1.56 89
13 1997 5.19 1.38 108
14 1998 5.67 1.26 76
15 1999 5.16 1.71 65
16 2000 6.84 1.86 93
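The new columns are character, so convert them to numeric first (e.g. new_df[] <- lapply(new_df, as.numeric)). You can then use cor (e.g. cor(new_df$Advertise, new_df$Employees)) to find the correlation between any two columns you want.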
NOTE 1: Make sure that your initial column is character, not factor, since strsplit() only accepts character vectors.
NOTE 2: By default, the cor function calculates the Pearson correlation. For Spearman, add the argument method = "spearman", i.e. cor(..., method = "spearman"), as mentioned by @Base_R_Best_R.
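Spearman's rho is the Pearson correlation of the ranks, so the two calls below should agree. This is a minimal sketch: sales and advertise are our own names for vectors holding the question's Sales and Advertise values, retyped so the snippet runs on its own without new_df.

```r
# Sales and Advertise values retyped from the question (names are our own)
sales <- c(1.05, 1.26, 1.47, 2.16, 1.95, 2.4, 2.37, 3.15, 3.57,
           4.41, 4.5, 5.61, 5.19, 5.67, 5.16, 6.84)
advertise <- c(162, 285, 540, 261, 360, 690, 495, 948, 720,
               1.14, 1.395, 1.56, 1.38, 1.26, 1.71, 1.86)

# Spearman's rho computed directly by cor()
rho <- cor(sales, advertise, method = "spearman")

# The same quantity as the Pearson correlation of the ranks
rho_ranks <- cor(rank(sales), rank(advertise))

rho        # -0.5588235
rho_ranks  # -0.5588235
```

There are no ties in either column here, so rank() assigns plain integer ranks; with ties, both approaches use average ranks and still agree.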
DATA
dput(df2)
structure(list(Year.Sales.Advertise.Employees = c("1985 1.05 162 32",
"1986 1.26 285 47", "1987 1.47 540 23", "1988 2.16 261 68", "1989 1.95 360 32",
"1990 2.4 690 17", "1991 2.37 495 58", "1992 3.15 948 75", "1993 3.57 720 98",
"1994 4.41 1.14 43", "1995 4.5 1.395 76", "1996 5.61 1.56 89",
"1997 5.19 1.38 108", "1998 5.67 1.26 76", "1999 5.16 1.71 65",
"2000 6.84 1.86 93")), class = "data.frame", row.names = c(NA,
-16L))
Not sure if this is what you are looking for, but you could try something like the code below.
# split strings into separate columns
df <- `names<-`(data.frame(t(apply(df, 1, function(x) as.numeric(unlist(strsplit(x,split = " ")))))),
unlist(strsplit(names(df),split = "\\.")))
# calculate the (Pearson) correlation coefficient
r <- cor(df$Sales, df$Advertise)
which gives
> r
[1] -0.5624524
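The value above is the Pearson coefficient. For the Spearman coefficient the question asks about, one option is cor.test() with method = "spearman", which also reports a p-value. A minimal sketch, assuming the Sales and Advertise columns are already numeric; here they are retyped as plain vectors (sales and advertise are our own names) so the snippet is self-contained.

```r
# Sales and Advertise values retyped from the question (names are our own)
sales <- c(1.05, 1.26, 1.47, 2.16, 1.95, 2.4, 2.37, 3.15, 3.57,
           4.41, 4.5, 5.61, 5.19, 5.67, 5.16, 6.84)
advertise <- c(162, 285, 540, 261, 360, 690, 495, 948, 720,
               1.14, 1.395, 1.56, 1.38, 1.26, 1.71, 1.86)

# Spearman rank correlation test; the estimate is returned under the name "rho"
res <- cor.test(sales, advertise, method = "spearman")

res$estimate  # rho = -0.5588235
res$p.value   # p-value for the null hypothesis rho = 0
```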
DATA
df <- structure(list(Year.Sales.Advertise.Employees = c("1985 1.05 162 32",
"1986 1.26 285 47", "1987 1.47 540 23", "1988 2.16 261 68", "1989 1.95 360 32",
"1990 2.4 690 17", "1991 2.37 495 58", "1992 3.15 948 75", "1993 3.57 720 98",
"1994 4.41 1.14 43", "1995 4.5 1.395 76", "1996 5.61 1.56 89",
"1997 5.19 1.38 108", "1998 5.67 1.26 76", "1999 5.16 1.71 65",
"2000 6.84 1.86 93")), class = "data.frame", row.names = c(NA,
-16L))
> df
Year.Sales.Advertise.Employees
1 1985 1.05 162 32
2 1986 1.26 285 47
3 1987 1.47 540 23
4 1988 2.16 261 68
5 1989 1.95 360 32
6 1990 2.4 690 17
7 1991 2.37 495 58
8 1992 3.15 948 75
9 1993 3.57 720 98
10 1994 4.41 1.14 43
11 1995 4.5 1.395 76
12 1996 5.61 1.56 89
13 1997 5.19 1.38 108
14 1998 5.67 1.26 76
15 1999 5.16 1.71 65
16 2000 6.84 1.86 93
If you're asking for the data to be split into 4 discrete columns, this should do it.
Your data in the question needed some cleaning. It probably needs more (manual) cleaning, since Advertise falls from 720 in 1993 to 1.14 in 1994. That is likely a change of units, from values in the hundreds of thousands to values in millions.
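If that drop really is a unit change, the rank correlation changes a lot once all years are on one scale. The sketch below rests on an assumption the question does not confirm: that Advertise is in thousands up to 1993 and in millions from 1994 on, so the 1994+ values are multiplied by 1000. The year, sales, advertise and advertise_k names are our own.

```r
year <- 1985:2000
sales <- c(1.05, 1.26, 1.47, 2.16, 1.95, 2.4, 2.37, 3.15, 3.57,
           4.41, 4.5, 5.61, 5.19, 5.67, 5.16, 6.84)
advertise <- c(162, 285, 540, 261, 360, 690, 495, 948, 720,
               1.14, 1.395, 1.56, 1.38, 1.26, 1.71, 1.86)

# ASSUMPTION: from 1994 on, Advertise is recorded in millions instead of
# thousands, so rescale those years to thousands
advertise_k <- ifelse(year >= 1994, advertise * 1000, advertise)

cor(sales, advertise, method = "spearman")    # raw values:  -0.5588235
cor(sales, advertise_k, method = "spearman")  # rescaled:     0.9235294
```

Whether the rescaling is right has to be checked against the original data source; the point is only that the sign of the result depends on it.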
x <- c("1985 1.05 162 32",
"1986 1.26 285 47",
"1987 1.47 540 23",
"1988 2.16 261 68",
"1989 1.95 360 32",
"1990 2.4 690 17",
"1991 2.37 495 58",
"1992 3.15 948 75",
"1993 3.57 720 98",
"1994 4.41 1.14 43",
"1995 4.5 1.395 76",
"1996 5.61 1.56 89",
"1997 5.19 1.38 108",
"1998 5.67 1.26 76",
"1999 5.16 1.71 65",
"2000 6.84 1.86 93")
library(tidyverse)
clean_df <- tibble(raw = x) %>%
  separate(raw,
           into = c('year', 'sales', 'advertise', 'empl'),
           sep = ' ') %>%
  mutate(across(everything(), as.numeric))
cor(clean_df$sales, clean_df$advertise, method = 'spearman')
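For comparison, a base-R sketch that avoids the tidyverse: read.table() can parse a character vector directly through its text argument, splitting on whitespace and converting the fields to numbers. It assumes the rows are the x vector above (repeated here so the snippet runs on its own) and reuses the same column names.

```r
x <- c("1985 1.05 162 32", "1986 1.26 285 47", "1987 1.47 540 23",
       "1988 2.16 261 68", "1989 1.95 360 32", "1990 2.4 690 17",
       "1991 2.37 495 58", "1992 3.15 948 75", "1993 3.57 720 98",
       "1994 4.41 1.14 43", "1995 4.5 1.395 76", "1996 5.61 1.56 89",
       "1997 5.19 1.38 108", "1998 5.67 1.26 76", "1999 5.16 1.71 65",
       "2000 6.84 1.86 93")

# Each string becomes one row; whitespace separates the four fields
clean_df <- read.table(text = x,
                       col.names = c("year", "sales", "advertise", "empl"))

str(clean_df)  # 16 obs. of 4 numeric/integer variables
cor(clean_df$sales, clean_df$advertise, method = "spearman")  # -0.5588235
```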