Correlation matrix and categorical variables

Question

I have a data frame df, whose first few rows are shown here:

age region    graduate salary
19  "North"   "no"     21000
25  "South"   "yes"    24000
23  "Center"  "yes"    23000
30  "South"   "no"     25000

Here region can be "North", "Center" or "South", and graduate can be "yes" or "no". My goal is to run the following analysis:

library("corrplot")

df <- data.frame(age=c(19,25,23,30), region=c("North","South","Center","South"), graduate=c("no","yes","yes","no"), salary=c(21000,24000,23000,25000))
corrplot(cor(df), method='number')

But I get the following error: Error in cor(df) : 'x' must be numeric.

What is the solution here? Do I have to convert the original data frame into the following

age region-North region-Center region-South graduate-yes graduate-no salary
19       1            0             0            0           1       21000
25       0            0             1            1           0       24000
23       0            1             0            1           0       23000
30       0            0             1            0           1       25000

and then rerun the code? Or can I do this directly in the corrplot call? The goal is to understand which variables have the greatest influence on salary.

Answer 1

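Correlation can only be computed between numeric variables, which is why cor(df) fails on the character columns.
Character variables such as region and graduate therefore have to be converted to numbers first, for example to 0/1 indicator columns like the ones in the question, before a correlation can be found.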
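
The code that originally accompanied this answer is not available. As a minimal sketch of one possible approach, the indicator columns from the question can be built with base R's model.matrix() and then passed to cor(). The use of model.matrix() and the contrasts.arg idiom are assumptions made for illustration, not necessarily what the answer proposed; the names X and M are hypothetical.

```r
# Sketch only: model.matrix() is an assumed choice for the dummy encoding.
df <- data.frame(
  age      = c(19, 25, 23, 30),
  region   = c("North", "South", "Center", "South"),
  graduate = c("no", "yes", "yes", "no"),
  salary   = c(21000, 24000, 23000, 25000)
)

# Convert the character columns to factors so they can be expanded.
df$region   <- factor(df$region)
df$graduate <- factor(df$graduate)

# Ask for one 0/1 column per factor level (no reference level dropped),
# and remove the intercept column with "- 1".
is_fac <- sapply(df, is.factor)
X <- model.matrix(
  ~ . - 1,
  data = df,
  contrasts.arg = lapply(df[is_fac], contrasts, contrasts = FALSE)
)

# X is now fully numeric, so cor() no longer raises "'x' must be numeric".
M <- cor(X)

# Correlation of every column with salary, largest first.
print(sort(M[, "salary"], decreasing = TRUE))

# Plot as in the question, only if the corrplot package is installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(M, method = "number")
}
```

Two caveats apply to this sketch. Indicator columns that come from the same variable are correlated with each other by construction (graduateyes and graduateno are perfect opposites), so those cells carry no information. A pairwise correlation also measures linear association only, so it is at best a rough guide to which variables influence salary.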
