I am trying to loop over specific numeric columns of a data frame to extract correlations and p-values with the "cor.test" function.
Each correlation measures the linear relationship between one categorical variable (coded as 0 and 1) and one of the numeric columns.
Here's my code so far:
## data ##
names <- c("John", "Greg", "Maria", "Josh", "Emma")
categorical_column <- sample(0:1, 5, replace = TRUE)
numeric_column_1 <- sample(1:30, 5, replace = TRUE)
numeric_column_2 <- sample(1:40, 5, replace = TRUE)
sampled_df <- data.frame(names, categorical_column, numeric_column_1,
                         numeric_column_2)
## specific columns ##
numerical_columns <- c("numeric_column_1", "numeric_column_2")
## for-loop task ##
for(i in seq_along(numerical_columns)){
  correlation_num_df <- structure(list(
    variable <- numerical_columns,
    correlation <- cor.test(sampled_df[numerical_columns[i]][[i]],
                            sampled_df[["categorical_column"]])[["estimate"]][["cor"]],
    p_value <- cor.test(sampled_df[numerical_columns[i]][[i]],
                        sampled_df[["categorical_column"]])[["p.value"]]
  ),
  class = "data.frame",
  nrow = c(NA, -2L))
}
Console output:
Error in .subset2(x, i, exact = exact) : subscript out of bounds
How could I know the subset that is out of bounds? And how could I fix it?
We can use across with summarise:
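library(dplyr)
library(broom)

# Run cor.test on each selected column against categorical_column;
# tidy() turns each test result into a one-row tibble
out <- sampled_df %>%
  summarise(across(all_of(numerical_columns),
                   ~ list(cor.test(., categorical_column) %>%
                            tidy %>%
                            select(estimate, p.value))))

# Stack the per-column results; 'grp' identifies which column each row came from
unclass(out) %>%
  bind_rows(.id = 'grp')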
# A tibble: 2 x 3
#   grp   estimate p.value
#   <chr>    <dbl>   <dbl>
# 1 1        0.408   0.495
# 2 2        0.343   0.572
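As for where the error comes from: sampled_df[numerical_columns[i]] returns a data frame with a single column, so the trailing [[i]] asks for column i of that one-column data frame. That works when i is 1 and fails with "subscript out of bounds" when i is 2. Indexing the original data frame directly with [[ avoids the problem. The loop in the question also overwrites correlation_num_df on every pass and uses <- where list() expects =, so the following is a minimal base R sketch of a corrected loop, not the only way to write it. The data values are made up here (fixed instead of sampled) so the example is reproducible, and the helper names one_col, results, col, and test are illustrative.

```r
## data (fixed, made-up values instead of sample() so the result is reproducible) ##
sampled_df <- data.frame(
  names              = c("John", "Greg", "Maria", "Josh", "Emma"),
  categorical_column = c(0, 1, 0, 1, 1),
  numeric_column_1   = c(12, 25, 7, 30, 18),
  numeric_column_2   = c(33, 5, 21, 14, 40)
)
numerical_columns <- c("numeric_column_1", "numeric_column_2")

## why the original loop fails ##
# Single-bracket indexing keeps the data frame structure: exactly one column
one_col <- sampled_df[numerical_columns[2]]
ncol(one_col)
# one_col[[2]] asks for a second column that does not exist,
# which is the "subscript out of bounds" error from the question

## corrected loop ##
results <- vector("list", length(numerical_columns))
for (i in seq_along(numerical_columns)) {
  col <- numerical_columns[i]
  # [[col]] pulls the column out of the full data frame as a plain vector
  test <- cor.test(sampled_df[[col]], sampled_df[["categorical_column"]])
  # store one row per column instead of overwriting the result each time
  results[[i]] <- data.frame(
    variable    = col,
    correlation = test[["estimate"]][["cor"]],
    p_value     = test[["p.value"]]
  )
}

# combine the per-column rows into a single data frame
correlation_num_df <- do.call(rbind, results)
correlation_num_df
```

To find out which subscript is out of bounds in cases like this, it usually helps to evaluate the pieces of the failing expression one at a time in the console, for example with str(sampled_df[numerical_columns[2]]), or to call traceback() right after the error.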