R Beginner Question: Data Frame Conversion to Time Series

I have a flat CSV file containing data in long format that needs to be converted to a time series object. The format of the file looks like this:

DATE       ID  REGION VALUE
2016-03-10 10  DE001  2332,23
2016-03-10 10  DE001  2332,23
2016-03-10 10  DE002  2332,23
2016-03-10 11  DE001  2332,23
2016-03-10 11  DE002  2332,23
2016-03-10 12  DE001  2332,23
2016-03-11 10  DE001  2332,23
2016-03-11 10  DE001  2332,23
2016-03-11 10  DE002  2332,23
2016-03-11 11  DE001  2332,23
2016-03-11 11  DE002  2332,23
2016-03-11 12  DE001  2332,23

I want to group by ID and then by region, so that I have a separate time series for each ID group, each containing several region observations over the complete available time span.

I misunderstood the OP's question.

You can use tapply to break up the original data frame (call it D). This is a bit tricky, because you can't easily modify D from inside the tapply call. Instead, collect the results per ID and write them back afterwards:

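# Assumes D$DATE is already of class Date (e.g. converted with as.Date)
D$relTime <- NA_real_

L <- tapply(seq_len(nrow(D)), D$ID, function(x) {
    # x contains the row numbers for one ID
    RT <- data.frame(row = x)
    T0 <- D$DATE[x][1]  # base time: the first date listed for this ID
    # offset in days, if "time series" means offset from a base time
    RT$val <- as.numeric(D$DATE[x] - T0, units = "days")
    RT
})
DL <- do.call("rbind", L)
# assuming you want it in D
D$relTime[DL$row] <- DL$val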

This will create a new column which contains the offset from the base time for each ID.
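To see what the new column looks like, here is a minimal sketch on a few made-up rows in the question's layout. It uses ave() as a compact alternative to the tapply/rbind round trip, and it takes the earliest date per ID as the base time rather than the first row, which is an assumption on my part; the two agree only when each ID's rows are sorted by date.

```r
# Hypothetical sample rows in the question's layout (not the OP's real data)
D <- data.frame(
  DATE   = as.Date(c("2016-03-10", "2016-03-11", "2016-03-10", "2016-03-12")),
  ID     = c(10, 10, 11, 11),
  REGION = c("DE001", "DE001", "DE002", "DE002")
)

# ave() applies FUN within each ID group and returns the results in the
# original row order. as.numeric() on a Date gives days since 1970-01-01,
# so the difference is an offset in days from the earliest date per ID.
D$relTime <- ave(as.numeric(D$DATE), D$ID, FUN = function(d) d - min(d))

print(D)
```

Because ave() keeps the original row order, no row-index bookkeeping is needed to write the result back into D.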

Edit: I used '=' for assignment, which isn't considered best practice. I've changed those to '<-' in the code above.

You can use the as.Date function. Load the table using read.table("filename.csv"). In R versions before 4.0.0, the dates will be loaded as factors unless you specify stringsAsFactors=FALSE in the read.table call (from 4.0.0 on, FALSE is the default). Note that this setting applies to all character columns, not just the dates.

So,

# header = TRUE because the first line holds the column names;
# dec = "," because VALUE uses a decimal comma (2332,23)
D <- read.table("file.csv", header = TRUE, dec = ",")
D$DATE <- as.Date(as.character(D$DATE), "%Y-%m-%d")

should do the trick. The as.character call ensures that the dates are passed to as.Date as strings even if they have been loaded as factors.
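As a quick check of the conversion step on its own, the sketch below starts from a factor (the case described above) and confirms that the result is a real Date vector that supports date arithmetic. The two dates are made-up values in the question's format.

```r
# Hypothetical dates stored as a factor, as older R versions would load them
d_factor <- factor(c("2016-03-10", "2016-03-11"))

# Convert via character so as.Date parses the labels, not the level codes
d <- as.Date(as.character(d_factor), format = "%Y-%m-%d")

class(d)  # the vector's class after conversion
diff(d)   # difference between consecutive dates
```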

More info:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/as.Date.html

https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html

Technically, it's not a csv file, because the "c" in "csv" means "comma" and your separators are spaces. You can still use read.csv if you specify sep=' ' in the call. If the columns are padded with more than one space, as in your listing, sep='' (any run of whitespace) is the safer choice, since sep=' ' treats every single space as a field boundary.
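A minimal sketch of that read.csv call, assuming the file really is padded with spaces the way the listing in the question is. The sample text is passed through the text argument instead of a file name so the snippet runs as-is; the rows are copied from the question's layout.

```r
# Hypothetical sample with space-padded columns, as in the question's listing
txt <- "DATE       ID  REGION VALUE
2016-03-10 10  DE001  2332,23
2016-03-10 11  DE002  2332,23"

# sep = "" splits on any run of whitespace, so the padding does not
# create empty fields; dec = "," parses the decimal comma in VALUE
D <- read.csv(text = txt, sep = "", dec = ",")

str(D)
```

With a real file, replace text = txt with the file name.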
