
How to normalize time series data in R?

I have the matrix below. How do I divide each row by its mean?

 TAXA   1992   1993   1994   1995
 Aba       1      0   0.01      0
 Abr       2  0.084    0.1      3
 Amp       7      6      4      2

I think you want one of these:

For a data frame:

cbind(df[1], df[-1] / rowMeans(df[-1]))
#   TAXA    X1992      X1993      X1994     X1995
# 1  Aba 3.960396 0.00000000 0.03960396 0.0000000
# 2  Abr 1.543210 0.06481481 0.07716049 2.3148148
# 3  Amp 1.473684 1.26315789 0.84210526 0.4210526

For a matrix:

m / rowMeans(m)
#         1992       1993       1994      1995
# Aba 3.960396 0.00000000 0.03960396 0.0000000
# Abr 1.543210 0.06481481 0.07716049 2.3148148
# Amp 1.473684 1.26315789 0.84210526 0.4210526

This computes the mean of each row, then divides each row by its mean. The first version assumes the first column in your example is an actual column of a data frame, while the second assumes it holds the row names of a matrix.
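The matrix version works because R stores matrices column by column and recycles the shorter vector down each column, so element i of rowMeans(m) always lines up with row i. (The same trick would not work with colMeans(m).) As a minimal sketch, here is the same result written with sweep(), which states the row-wise intent explicitly; the names by_recycling and by_sweep are only illustrative:

```r
# Same matrix as in the Data section of this answer.
m <- matrix(c(1, 2, 7, 0, 0.084, 6, 0.01, 0.1, 4, 0, 3, 2), nrow = 3,
            dimnames = list(c("Aba", "Abr", "Amp"),
                            c("1992", "1993", "1994", "1995")))

# Implicit version: the length-3 vector is recycled down each column.
by_recycling <- m / rowMeans(m)

# Explicit version: divide along margin 1 (rows) by the row means.
by_sweep <- sweep(m, 1, rowMeans(m), "/")

# The two results should agree, and every row should now average about 1.
all.equal(by_recycling, by_sweep, check.attributes = FALSE)
rowMeans(by_recycling)
```

sweep() is a little more verbose, but it keeps working if you later need to normalize by column instead (margin 2 with colMeans(m)).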

Data:

df <- structure(list(TAXA = structure(1:3, .Label = c("Aba", "Abr", 
"Amp"), class = "factor"), X1992 = c(1L, 2L, 7L), X1993 = c(0, 
0.084, 6), X1994 = c(0.01, 0.1, 4), X1995 = c(0L, 3L, 2L)), .Names = c("TAXA", 
"X1992", "X1993", "X1994", "X1995"), class = "data.frame", row.names = c(NA, 
-3L))

m <- structure(c(1, 2, 7, 0, 0.084, 6, 0.01, 0.1, 4, 0, 3, 2), .Dim = 3:4, .Dimnames = list(
    c("Aba", "Abr", "Amp"), c("1992", "1993", "1994", "1995"
    )))

Using the 'tidy data' approach (I copied the data from the question to the clipboard):

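# Read the copied table; reading from "clipboard" is platform-dependent (it works on Windows).
# The default separator splits on any run of whitespace.
dat <- read.table("clipboard", header = TRUE)

library(tidyr)
library(dplyr)
dat %>% 
  gather(year, value, -TAXA) %>%            # wide to long: one row per TAXA/year
  group_by(TAXA) %>% 
  mutate(value = value / mean(value)) %>%   # divide by the per-TAXA mean
  spread(year, value)                       # back to wide format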

You get:

Source: local data frame [3 x 5]

  TAXA    X1992      X1993      X1994     X1995
1  Aba 3.960396 0.00000000 0.03960396 0.0000000
2  Abr 1.543210 0.06481481 0.07716049 2.3148148
3  Amp 1.473684 1.26315789 0.84210526 0.4210526

It gathers the values from many columns into one (they all get the same treatment, so they belong in one column). Then it calculates the mean for each TAXA separately, divides each value by it, and reshapes the data back into the wide format.
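In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same pipeline with the newer verbs follows, assuming tidyr 1.0.0 or later; the data frame is typed in by hand here instead of being read from the clipboard, and the names dat and normalized are only illustrative:

```r
library(tidyr)
library(dplyr)

# The question's data, entered directly so the example is self-contained.
dat <- data.frame(
  TAXA  = c("Aba", "Abr", "Amp"),
  X1992 = c(1, 2, 7),
  X1993 = c(0, 0.084, 6),
  X1994 = c(0.01, 0.1, 4),
  X1995 = c(0, 3, 2)
)

normalized <- dat %>%
  # Wide to long: one row per TAXA/year combination.
  pivot_longer(-TAXA, names_to = "year", values_to = "value") %>%
  group_by(TAXA) %>%
  # Divide each value by the mean of its own TAXA group.
  mutate(value = value / mean(value)) %>%
  ungroup() %>%
  # Long back to wide: one column per year.
  pivot_wider(names_from = year, values_from = value)

normalized
```

The result should match the table above, with each row averaging 1 across the year columns.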
