How to make a cumulative index based on 3 factor levels

Question

As the title already suggests, I want to calculate a (negative) cumulative index based on the following 3 levels:

head(data$sentiment)
Levels:  negative neutral positive
sentiment             : Factor w/ 4 levels "","negative",..: 3 3 3 3 3

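Say negative equals 3, neutral 2, and positive 1. The higher the score, the more negative the sentiment. I intend to build an index from 0 to 100, with 100 being the most negative. The levels carry equal weight, and the index is a cumulation of several sentiments on a given date. What is the best way to do this?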

Answer 1

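One option, assuming the sentiment has already been converted to numeric scores from 1 to 3 (the arithmetic below does not work on a factor directly), could be: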

(mean(data$sentiment, na.rm = TRUE) - 1) * 50

Or, if you don't want to aggregate all the values and only want to rescale the ones you currently have:

(data$sentiment - 1) * 50

This ensures that your new score ranges from 0 to 100.
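Since the question asks for an index per date, the aggregation step can be sketched as follows. This is only a minimal sketch under assumptions: the example data frame and its `date` column are hypothetical (the question does not show them), and the name `score_map` is introduced here purely for illustration. The scores follow the question's mapping (negative = 3, neutral = 2, positive = 1).

```r
# Hypothetical example data; the column name `date` is an assumption.
data <- data.frame(
  date = as.Date(c("2022-01-01", "2022-01-01",
                   "2022-01-02", "2022-01-02", "2022-01-02")),
  sentiment = factor(c("negative", "neutral", "positive", "positive", ""),
                     levels = c("", "negative", "neutral", "positive"))
)

# Map level labels to scores: negative = 3, neutral = 2, positive = 1.
# The empty level "" has no entry in the map, so it becomes NA.
score_map <- c(negative = 3, neutral = 2, positive = 1)
data$score <- unname(score_map[as.character(data$sentiment)])

# Mean score per date, rescaled from [1, 3] to [0, 100].
daily_mean <- tapply(data$score, data$date, mean, na.rm = TRUE)
daily_index <- (daily_mean - 1) * 50
daily_index
```

In this made-up data, the first day averages 2.5 and so gets an index of 75, while the second day averages 1 (the empty sentiment is ignored) and gets 0.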

More generally, you are probably looking for min-max normalization:

https://en.m.wikipedia.org/wiki/Feature_scaling#Rescaling_(min-max_normalization)

So in your case, you can start from any aggregation, such as the mean I suggested, or again take the raw values.

x <- data$sentiment
new_x <- 0 + (x - min(x)) * (100 - 0) / (max(x) - min(x))

Example:

set.seed(1)
x <- sample(1:3, 20, replace = TRUE)
new_x <- 0 + (x - min(x)) * (100 - 0) / (max(x) - min(x))

x
 [1] 1 3 1 2 1 3 3 2 2 3 3 1 1 1 2 2 2 2 3 1

new_x
 [1]   0 100   0  50   0 100 100  50  50 100 100   0   0   0  50  50  50  50 100
[20]   0
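One caveat with the formula above: min(x) and max(x) come from the observed data, so a sample that happens to lack one of the extremes is still stretched to fill 0 to 100. A minimal sketch of a variant with fixed bounds follows. The helper name `rescale_fixed` and its parameters are hypothetical and not taken from any package.

```r
# Hypothetical helper: min-max rescaling with fixed bounds instead of
# the observed minimum and maximum.
rescale_fixed <- function(x, lo = 1, hi = 3, new_lo = 0, new_hi = 100) {
  new_lo + (x - lo) * (new_hi - new_lo) / (hi - lo)
}

x <- c(1, 2, 2, NA)  # no score of 3 was observed

# Observed-range version: 2 is treated as the maximum of the scale.
observed <- (x - min(x, na.rm = TRUE)) * 100 /
  (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))

# Fixed-range version: 2 stays in the middle of the scale.
fixed <- rescale_fixed(x)

observed
fixed
```

With the observed range, a neutral score of 2 is mapped to 100; with the fixed bounds 1 and 3 it stays at 50, which matches the scoring described in the question.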

Answer 2

Base R has the function scale for this. If x is a numeric vector of scores ranging from a to b, and you want scores ranging from 0 to M, then you would do:

scale(x, center = a, scale = (b - a) / M)
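The reason this works is that scale subtracts center and then divides by scale, so the result is (x - a) / ((b - a) / M), which equals M * (x - a) / (b - a), the min-max formula with fixed bounds a and b. A quick check, using the illustrative values a = 1, b = 3 and M = 100 (the variable names `via_scale` and `via_minmax` are made up for this sketch):

```r
a <- 1; b <- 3; M <- 100
x <- c(1, 2, 3)

# scale() returns a 1-column matrix, so coerce it back to a plain vector.
via_scale  <- as.double(scale(x, center = a, scale = (b - a) / M))

# The same rescaling written out by hand.
via_minmax <- M * (x - a) / (b - a)

all.equal(via_scale, via_minmax)
```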

Before you can use scale, you need to coerce your factor sentiment to a numeric vector listing the equivalent scores, like so:

set.seed(1L)
sentiment <- gl(4L, 1L, labels = c("", "negative", "neutral", "positive"))[sample(4L, size = 12L, replace = TRUE)]
sentiment
##  [1]          positive neutral           negative         
##  [7] neutral  neutral  negative negative neutral  neutral 
## Levels:  negative neutral positive
str(sentiment)
## Factor w/ 4 levels "","negative",..: 1 4 3 1 2 1 3 3 2 2 ...

scores <- c(NA, 3, 2, 1)[as.integer(sentiment)]
scores
## [1] NA  1  2 NA  3 NA  2  2  3  3  2  2

Note that we have assigned the missing value NA to the sentiment "" that appears among your factor's levels. Now you can do:

as.double(scale(scores, center = 1, scale = (3 - 1) / 100))
## [1]  NA   0  50  NA 100  NA  50  50 100 100  50  50

Here, as.double serves only to coerce the result of scale (a 1-column matrix) to a vector.
