
Standardize a variable's values differently based on another categorical variable in R (using base R)

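I have a large dataset with a continuous variable, Cholesterol, measured at two visits per participant (each participant has two rows: first visit = Before, second visit = After). I want to standardize cholesterol, but the Before and After visits are pooled in one column, so standardizing the whole column would be inaccurate: the mean and SD would be computed across both visits at once.

Using base R, how can I create, in the same dataset, a new cholesterol variable that is standardized by Visit? The standardization should be carried out twice, once for Before and once for After, but the output (the standardized values) should end up in a single variable that follows the same structure as this DF.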

DF$Cholesterol <- c(0.9861551, 2.9154158, 3.9302373, 2.9453085, 4.2248018, 2.4789901,
                    0.9972635, 0.3879830, 1.1782336, 1.4065341, 1.0495609, 1.2750138,
                    2.8515144, 0.4369885, 2.2410429, 0.7566147, 3.0395565, 1.7335131,
                    1.9242212, 2.4539439, 2.8528908, 0.8432039, 1.7002653, 2.3952744,
                    2.6522959, 1.2178764, 2.3426695, 1.9030782, 1.1708246, 2.7267124)

DF$Visit <- rep(c("Before", "After"), 15)

# the standardisation function I want to apply
standardise <- function(x) {
  (x - min(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

Thanks in advance.

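Let's build your data with visit as a character vector, change the standardisation function to use the mean rather than the min, and then, assuming each new "before" row marks the next person, pivot to wide format and mutate to add the standardised before and after variables: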

library(dplyr)  # mutate(), %>%
library(tidyr)  # pivot_wider()

df <- data.frame(x = rep(1, 30))
df$cholesterol <- c(0.9861551, 2.9154158, 3.9302373, 2.9453085, 4.2248018, 2.4789901,
                    0.9972635, 0.3879830, 1.1782336, 1.4065341, 1.0495609, 1.2750138,
                    2.8515144, 0.4369885, 2.2410429, 0.7566147, 3.0395565, 1.7335131,
                    1.9242212, 2.4539439, 2.8528908, 0.8432039, 1.7002653, 2.3952744,
                    2.6522959, 1.2178764, 2.3426695, 1.9030782, 1.1708246, 2.7267124)
df$visit <- rep(c("before", "after"), 15)

standardise <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

df <- df %>%
  # each "before" row starts a new person
  mutate(person = cumsum(visit == "before")) %>%
  pivot_wider(names_from = visit, id_cols = person, values_from = cholesterol) %>%
  mutate(before_std = standardise(before),
         after_std = standardise(after))

This gives:

   person before after before_std after_std
    <int>  <dbl> <dbl>      <dbl>     <dbl>
 1      1  0.986 2.92     -1.16     1.33   
 2      2  3.93  2.95      1.63     1.36   
 3      3  4.22  2.48      1.91     0.842  
 4      4  0.997 0.388    -1.15    -1.49   
 5      5  1.18  1.41     -0.979   -0.356  
 6      6  1.05  1.28     -1.10    -0.503  
 7      7  2.85  0.437     0.609   -1.44   
 8      8  2.24  0.757     0.0300  -1.08   
 9      9  3.04  1.73      0.788    0.00940
10     10  1.92  2.45     -0.271    0.814  
11     11  2.85  0.843     0.611   -0.985  
12     12  1.70  2.40     -0.483    0.749  
13     13  2.65  1.22      0.420   -0.567  
14     14  2.34  1.90      0.126    0.199  
15     15  1.17  2.73     -0.986    1.12  

If you really do want min rather than mean in your standardisation function, it should be simple to edit.
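As a minimal sketch of that edit, the centring statistic can be made an argument. The helper name `standardise_by` and its `center` parameter are hypothetical and not part of the original answer; only base R functions (`mean`, `min`, `sd`) are used.

```r
# Hypothetical helper: `center` chooses the statistic that is subtracted
# before dividing by the standard deviation.
standardise_by <- function(x, center = mean) {
  (x - center(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

x <- c(2, 4, 6)                             # toy values, sd(x) is 2
z_mean <- standardise_by(x)                 # mean-centred: -1, 0, 1
z_min  <- standardise_by(x, center = min)   # min-centred, as in the question: 0, 1, 2
```

Passing `center = min` reproduces the formula from the question, while the default matches the mean-based version used in this answer.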

Edited to add a base R solution, with the caveat that there may be a neater way to do it:

# two consecutive rows per participant
df <- data.frame(id = rep(1:15, each = 2))
df$cholesterol <- c(0.9861551, 2.9154158, 3.9302373, 2.9453085, 4.2248018, 2.4789901,
                    0.9972635, 0.3879830, 1.1782336, 1.4065341, 1.0495609, 1.2750138,
                    2.8515144, 0.4369885, 2.2410429, 0.7566147, 3.0395565, 1.7335131,
                    1.9242212, 2.4539439, 2.8528908, 0.8432039, 1.7002653, 2.3952744,
                    2.6522959, 1.2178764, 2.3426695, 1.9030782, 1.1708246, 2.7267124)

df$visit <- rep(c("before", "after"), 15)

# long to wide: one row per id, with cholesterol.before and cholesterol.after columns
df <- reshape(df, direction = "wide", idvar = "id", timevar = "visit")

standardise <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

df$before_std <- round(standardise(df$cholesterol.before), 2)
df$after_std <- round(standardise(df$cholesterol.after), 2)

This gives:

   id cholesterol.before cholesterol.after before_std after_std
1   1          0.9861551         2.9154158      -1.16      1.33
3   2          3.9302373         2.9453085       1.63      1.36
5   3          4.2248018         2.4789901       1.91      0.84
7   4          0.9972635         0.3879830      -1.15     -1.49
9   5          1.1782336         1.4065341      -0.98     -0.36
11  6          1.0495609         1.2750138      -1.10     -0.50
13  7          2.8515144         0.4369885       0.61     -1.44
15  8          2.2410429         0.7566147       0.03     -1.08
17  9          3.0395565         1.7335131       0.79      0.01
19 10          1.9242212         2.4539439      -0.27      0.81
21 11          2.8528908         0.8432039       0.61     -0.99
23 12          1.7002653         2.3952744      -0.48      0.75
25 13          2.6522959         1.2178764       0.42     -0.57
27 14          2.3426695         1.9030782       0.13      0.20
29 15          1.1708246         2.7267124      -0.99      1.12
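Both versions above return the data in wide format. If the standardised values should stay in a single column with the original two-rows-per-participant layout, one possible base R sketch uses `ave()`, which applies a function within each group and returns the results in the original row order. The names `df_long` and `cholesterol_std` are assumptions chosen for illustration.

```r
# Rebuild the example data in its original long layout (two rows per person).
df_long <- data.frame(id = rep(1:15, each = 2),
                      visit = rep(c("before", "after"), 15))
df_long$cholesterol <- c(0.9861551, 2.9154158, 3.9302373, 2.9453085, 4.2248018, 2.4789901,
                         0.9972635, 0.3879830, 1.1782336, 1.4065341, 1.0495609, 1.2750138,
                         2.8515144, 0.4369885, 2.2410429, 0.7566147, 3.0395565, 1.7335131,
                         1.9242212, 2.4539439, 2.8528908, 0.8432039, 1.7002653, 2.3952744,
                         2.6522959, 1.2178764, 2.3426695, 1.9030782, 1.1708246, 2.7267124)

standardise <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# ave() runs standardise() separately on the "before" rows and the "after"
# rows, then puts each result back in the row it came from.
df_long$cholesterol_std <- ave(df_long$cholesterol, df_long$visit, FUN = standardise)
```

Within each visit the new column then has mean 0 and standard deviation 1, which can be checked with `tapply(df_long$cholesterol_std, df_long$visit, sd)`.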
