Standardise variable values differently depending on another categorical variable in R (using base R)

Question

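I have a large dataset with a continuous variable, "Cholesterol", measured at two visits for each participant (each participant has two rows: first visit = Before, second visit = After). I want to standardise cholesterol, but if I pool the Before and After visits the standardisation will not be accurate, because it is computed from the mean and SD of all values combined.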
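Written out, the per-visit standardisation being asked for is the usual z-score computed separately within each visit. This assumes the mean/SD definition; R's sd() uses the n - 1 denominator shown here:

```latex
z_{iv} = \frac{x_{iv} - \bar{x}_v}{s_v}, \qquad
\bar{x}_v = \frac{1}{n_v}\sum_{i=1}^{n_v} x_{iv}, \qquad
s_v = \sqrt{\frac{1}{n_v - 1}\sum_{i=1}^{n_v}\left(x_{iv} - \bar{x}_v\right)^2}
```

Here x_{iv} is participant i's cholesterol at visit v (Before or After) and n_v is the number of rows for that visit. Pooling both visits would replace the visit-specific mean and SD with a single mean and SD for all 2n rows.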

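Using base R, how can I create a new cholesterol variable in the same dataset that is standardised by Visit? The standardisation should happen twice, once for Before and once for After, but the output (the standardised values) should end up in a single variable that follows the same structure as this data frame.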
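One base R way to do exactly this, offered as a sketch rather than either answer's method, is ave(). It applies a function within each group and returns a vector in the original row order, so the result can go straight into a new column of the long-format data frame. The column name Cholesterol_std is only an illustrative choice, and the sketch assumes a mean-centred z-score is what is wanted:

```r
# Cholesterol values and visit labels from the question
DF <- data.frame(
  Cholesterol = c(0.9861551, 2.9154158, 3.9302373, 2.9453085, 4.2248018,
                  2.4789901, 0.9972635, 0.3879830, 1.1782336, 1.4065341,
                  1.0495609, 1.2750138, 2.8515144, 0.4369885, 2.2410429,
                  0.7566147, 3.0395565, 1.7335131, 1.9242212, 2.4539439,
                  2.8528908, 0.8432039, 1.7002653, 2.3952744, 2.6522959,
                  1.2178764, 2.3426695, 1.9030782, 1.1708246, 2.7267124),
  Visit = rep(c("Before", "After"), 15)
)

# z-score: centre on the mean, scale by the SD, ignoring NAs
standardise <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# ave() runs standardise() separately within each level of Visit and returns
# a vector aligned with the original rows, so DF stays in long format
DF$Cholesterol_std <- ave(DF$Cholesterol, DF$Visit, FUN = standardise)

head(DF)
```

Because the grouping is done by ave(), no reshaping is needed and each participant keeps their two rows.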

DF <- data.frame(
  Cholesterol = c(0.9861551, 2.9154158, 3.9302373, 2.9453085, 4.2248018,
                  2.4789901, 0.9972635, 0.3879830, 1.1782336, 1.4065341,
                  1.0495609, 1.2750138, 2.8515144, 0.4369885, 2.2410429,
                  0.7566147, 3.0395565, 1.7335131, 1.9242212, 2.4539439,
                  2.8528908, 0.8432039, 1.7002653, 2.3952744, 2.6522959,
                  1.2178764, 2.3426695, 1.9030782, 1.1708246, 2.7267124)
)

# Visits alternate Before/After, one pair of rows per participant
DF$Visit <- rep(c("Before", "After"), 15)


# the standardisation function I want to apply
standardise <- function(x) {return((x-min(x,na.rm = T))/sd(x,na.rm = T))}
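For comparison, base R's scale() with its default arguments centres on the mean and divides by the SD, i.e. it produces a z-score. The sketch below (standardise_mean and standardise_min are illustrative names) checks that scale() matches a mean-centred version of the function above, and that the min-centred version differs from it only by a constant shift within a group:

```r
# "Before" cholesterol values from the question
x <- c(0.9861551, 3.9302373, 4.2248018, 0.9972635, 1.1782336,
       1.0495609, 2.8515144, 2.2410429, 3.0395565, 1.9242212,
       2.8528908, 1.7002653, 2.6522959, 2.3426695, 1.1708246)

# Mean-centred version (a z-score)
standardise_mean <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
# The question's version, centred on the minimum
standardise_min <- function(x) (x - min(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# scale() returns a one-column matrix; as.numeric() drops the matrix attributes
z_scale <- as.numeric(scale(x))

all.equal(z_scale, standardise_mean(x))   # TRUE
# Min-centred values = z-scores + the constant (mean - min) / sd
all.equal(standardise_min(x) - standardise_mean(x),
          rep((mean(x) - min(x)) / sd(x), length(x)))   # TRUE
```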

Thanks in advance.

Answer 1

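Let's build your data, fix the df$visit assignment, and change the standardisation function to subtract the mean rather than the minimum. Then, assuming each new "before" occasion marks the next person, pivot to wide format and mutate the standardised before and after variables: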

library(dplyr)   # mutate() and the %>% pipe
library(tidyr)   # pivot_wider()

df <- data.frame(
  cholesterol = c(0.9861551, 2.9154158, 3.9302373, 2.9453085, 4.2248018,
                  2.4789901, 0.9972635, 0.3879830, 1.1782336, 1.4065341,
                  1.0495609, 1.2750138, 2.8515144, 0.4369885, 2.2410429,
                  0.7566147, 3.0395565, 1.7335131, 1.9242212, 2.4539439,
                  2.8528908, 0.8432039, 1.7002653, 2.3952744, 2.6522959,
                  1.2178764, 2.3426695, 1.9030782, 1.1708246, 2.7267124)
)
df$visit <- rep(c("before", "after"), 15)

# z-score: centre on the mean instead of the minimum
standardise <- function(x) {return((x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE))}

df <- df %>%
  mutate(person = cumsum(visit == "before")) %>%   # each "before" row starts a new person
  pivot_wider(names_from = visit, id_cols = person, values_from = cholesterol) %>%
  mutate(before_std = standardise(before),
         after_std = standardise(after))

Gives:

   person before after before_std after_std
    <int>  <dbl> <dbl>      <dbl>     <dbl>
 1      1  0.986 2.92     -1.16     1.33   
 2      2  3.93  2.95      1.63     1.36   
 3      3  4.22  2.48      1.91     0.842  
 4      4  0.997 0.388    -1.15    -1.49   
 5      5  1.18  1.41     -0.979   -0.356  
 6      6  1.05  1.28     -1.10    -0.503  
 7      7  2.85  0.437     0.609   -1.44   
 8      8  2.24  0.757     0.0300  -1.08   
 9      9  3.04  1.73      0.788    0.00940
10     10  1.92  2.45     -0.271    0.814  
11     11  2.85  0.843     0.611   -0.985  
12     12  1.70  2.40     -0.483    0.749  
13     13  2.65  1.22      0.420   -0.567  
14     14  2.34  1.90      0.126    0.199  
15     15  1.17  2.73     -0.986    1.12

If you really want min rather than mean in your standardisation function, it should be straightforward to edit.
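As a sketch of that edit, the hypothetical centre argument below (not part of either answer) lets one function do either, defaulting to the mean:

```r
# Hypothetical variant: choose the centring statistic with an argument
standardise <- function(x, centre = c("mean", "min")) {
  centre <- match.arg(centre)                  # defaults to "mean"
  loc <- switch(centre,
                mean = mean(x, na.rm = TRUE),
                min  = min(x, na.rm = TRUE))
  (x - loc) / sd(x, na.rm = TRUE)              # both versions divide by the SD
}

# "After" cholesterol values from the question
after <- c(2.9154158, 2.9453085, 2.4789901, 0.3879830, 1.4065341,
           1.2750138, 0.4369885, 0.7566147, 1.7335131, 2.4539439,
           0.8432039, 2.3952744, 1.2178764, 1.9030782, 2.7267124)

z_mean <- standardise(after)                   # mean-centred z-scores
z_min  <- standardise(after, centre = "min")   # the question's min-centred version
round(cbind(z_mean, z_min), 2)
```

With the default, the mean-centred column should match the after_std values in the table above.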

Edit: here is a base R solution, with the caveat that there is probably a more concise one:

df <- data.frame(id = rep(1:15, each = 2))
df$cholesterol <- c(0.9861551, 2.9154158, 3.9302373, 2.9453085, 4.2248018,
                    2.4789901, 0.9972635, 0.3879830, 1.1782336, 1.4065341,
                    1.0495609, 1.2750138, 2.8515144, 0.4369885, 2.2410429,
                    0.7566147, 3.0395565, 1.7335131, 1.9242212, 2.4539439,
                    2.8528908, 0.8432039, 1.7002653, 2.3952744, 2.6522959,
                    1.2178764, 2.3426695, 1.9030782, 1.1708246, 2.7267124)

df$visit <- rep(c("before", "after"), 15)

# Wide format: one row per id, with cholesterol.before and cholesterol.after columns
df <- reshape(df, direction = "wide", idvar = "id", timevar = "visit")

standardise <- function(x) {return((x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE))}

df$before_std <- round(standardise(df$cholesterol.before), 2)
df$after_std <- round(standardise(df$cholesterol.after), 2)

Gives:

   id cholesterol.before cholesterol.after before_std after_std
1   1          0.9861551         2.9154158      -1.16      1.33
3   2          3.9302373         2.9453085       1.63      1.36
5   3          4.2248018         2.4789901       1.91      0.84
7   4          0.9972635         0.3879830      -1.15     -1.49
9   5          1.1782336         1.4065341      -0.98     -0.36
11  6          1.0495609         1.2750138      -1.10     -0.50
13  7          2.8515144         0.4369885       0.61     -1.44
15  8          2.2410429         0.7566147       0.03     -1.08
17  9          3.0395565         1.7335131       0.79      0.01
19 10          1.9242212         2.4539439      -0.27      0.81
21 11          2.8528908         0.8432039       0.61     -0.99
23 12          1.7002653         2.3952744      -0.48      0.75
25 13          2.6522959         1.2178764       0.42     -0.57
27 14          2.3426695         1.9030782       0.13      0.20
29 15          1.1708246         2.7267124      -0.99      1.12
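If you would rather keep the question's long layout (two rows per participant) instead of going wide, base R's split()/lapply()/unsplit() combination is another option. This is a swapped-in technique, not the answer's own, and cholesterol_std is an illustrative column name. The final reshape() is only there to line the results up with the table above:

```r
df <- data.frame(
  id = rep(1:15, each = 2),
  cholesterol = c(0.9861551, 2.9154158, 3.9302373, 2.9453085, 4.2248018,
                  2.4789901, 0.9972635, 0.3879830, 1.1782336, 1.4065341,
                  1.0495609, 1.2750138, 2.8515144, 0.4369885, 2.2410429,
                  0.7566147, 3.0395565, 1.7335131, 1.9242212, 2.4539439,
                  2.8528908, 0.8432039, 1.7002653, 2.3952744, 2.6522959,
                  1.2178764, 2.3426695, 1.9030782, 1.1708246, 2.7267124),
  visit = rep(c("before", "after"), 15)
)

standardise <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# split() gives one vector per visit, lapply() standardises each piece,
# and unsplit() puts the results back in the original row order
df$cholesterol_std <- unsplit(lapply(split(df$cholesterol, df$visit), standardise),
                              df$visit)

# Optional: go wide to compare with the table above
wide <- reshape(df, direction = "wide", idvar = "id", timevar = "visit")
```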

1 solution

Solution 1
1, accepted 2022-04-20 19:17:09
