Right way to convert levels of column in a dataframe to numeric values in R?
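I have a data frame in which one column contains the levels "Excellent", "Very Good", "Good", "Fair" and "Poor". I want to average these values, and use them in other ways, by assigning 5 to "Excellent", 4 to "Very Good", and so on.
My attempts so far have been confounded by the fact that the default numeric assignment appears to order the levels alphabetically, so "Excellent" is 1, "Fair" is 2, and so on.
Thanks for your help.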
I would use a named vector as a lookup table:
options = c('Excellent' = 5, 'Very Good' = 4, 'Good' = 3, 'Fair' = 2, 'Poor' = 1)
df = data.frame(grade = sample(names(options), 100, replace = TRUE))
head(df)
grade
1 Very Good
2 Good
3 Excellent
4 Very Good
5 Fair
6 Good
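df = within(df, {
  # as.character() forces a lookup by name; a factor index would use its integer codes instead
  grade_numeric = unname(options[as.character(grade)])
})
head(df)
      grade grade_numeric
1 Very Good             4
2      Good             3
3 Excellent             5
4 Very Good             4
5      Fair             2
6      Good             3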
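One pitfall with the lookup-table approach is worth spelling out. If the grade column is stored as a factor (the default for data.frame() before R 4.0), indexing the named vector with it uses the factor's integer codes, which follow alphabetical level order, rather than the labels. The following is a minimal sketch with a small hand-made vector (the data and the names wrong and right are only illustrative) showing the difference and how the mean could then be taken:

```r
# Lookup table, same mapping as in the answer
options <- c('Excellent' = 5, 'Very Good' = 4, 'Good' = 3, 'Fair' = 2, 'Poor' = 1)

# Hypothetical example data, deliberately stored as a factor
grade <- factor(c("Very Good", "Good", "Excellent", "Fair", "Poor"))

# Indexing with a factor is positional: it uses the integer codes
# of the alphabetically sorted levels, so the scores come out wrong
wrong <- unname(options[grade])

# Converting to character first makes the lookup go by name
right <- unname(options[as.character(grade)])

print(wrong)        # 1 3 5 4 2
print(right)        # 4 3 5 2 1
print(mean(right))  # 3
```

Converting with as.character() is harmless when the column is already a character vector, so it should be safe to apply in either case.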
Do you need this as an ordered factor? If so, using factor is probably the best option.
Sample data
column <- c("Excellent", "Very Good", "Good", "Fair", "Poor",
"Good", "Fair", "Poor")
col.f <- factor(column,
levels = c("Poor","Fair" , "Good" , "Very Good", "Excellent"),
labels = c("Poor","Fair" , "Good" , "Very Good", "Excellent"),
ordered = TRUE)
col.f
[1] Excellent Very Good Good Fair Poor Good Fair Poor
Levels: Poor < Fair < Good < Very Good < Excellent
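You can then call as.numeric(col.f) to get the numeric values. The codes follow the order given in levels, so "Poor" becomes 1 and "Excellent" becomes 5.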
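As a rough sketch of how this could be used for the averaging the question asks about, the snippet below rebuilds the same sample factor and takes the mean of its codes. The variable name scores is only illustrative, and the redundant labels argument is left out:

```r
# Same sample data as above
column <- c("Excellent", "Very Good", "Good", "Fair", "Poor",
            "Good", "Fair", "Poor")

# Levels listed from worst to best, so the integer codes run 1..5
col.f <- factor(column,
                levels = c("Poor", "Fair", "Good", "Very Good", "Excellent"),
                ordered = TRUE)

# Integer codes in level order: Poor = 1, ..., Excellent = 5
scores <- as.numeric(col.f)

print(scores)        # 5 4 3 2 1 3 2 1
print(mean(scores))  # 2.625
```

Any value that does not appear in levels becomes NA in the factor, so on real data mean(scores, na.rm = TRUE) may be the safer call.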