Plotting time series in R

I am working with data in which the first two columns are dates, the third column is a symbol, and the fourth and fifth columns are prices. I created a subset of the data as follows:

test.sub<-subset(test,V3=="GOOG",select=c(V1,V4))

and then I tried to plot a time series chart using the following:

as.ts(test.sub)
plot(test.sub)

Well, it gives me a scatter plot, which is not what I was looking for. So I tried plot(test.sub[1],test.sub[2]) and now I get the following error:

Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' and 'y' lengths differ

To make sure the number of rows was the same, I ran nrow(test.sub[1]) and nrow(test.sub[2]), and they both return the same number of rows. As a newcomer to R, I am not sure what the fix is.

I also ran plot.ts(test.sub) and that works, but it doesn't show the dates on the x-axis, which plot(test.sub) did and which is what I would like to see.

test.sub[1]
              V1
1107 2011-Aug-24
1206 2011-Aug-25
1307 2011-Aug-26
1408 2011-Aug-29
1510 2011-Aug-30
1613 2011-Aug-31
1718 2011-Sep-01
1823 2011-Sep-02
1929 2011-Sep-06
2035 2011-Sep-07
2143 2011-Sep-08
2251 2011-Sep-09
2359 2011-Sep-13
2470 2011-Sep-14
2581 2011-Sep-15
2692 2011-Sep-16
2785 2011-Sep-19
2869 2011-Sep-20
2965 2011-Sep-21
3062 2011-Sep-22
3160 2011-Sep-23
3258 2011-Sep-26
3356 2011-Sep-27
3455 2011-Sep-28
3555 2011-Sep-29
3655 2011-Sep-30
3755 2011-Oct-03
3856 2011-Oct-04
3957 2011-Oct-05
4059 2011-Oct-06
4164 2011-Oct-07
4269 2011-Oct-10
4374 2011-Oct-11
4479 2011-Oct-12
4584 2011-Oct-13
4689 2011-Oct-14

str(test.sub)
'data.frame':   35 obs. of  2 variables:
 $ V1:Class 'Date'  num [1:35] NA NA NA NA NA NA NA NA NA NA ...
 $ V4: num  0.475 0.452 0.423 0.418 0.403 ...

head(test.sub)
       V1       V4
1212 <NA> 0.474697
1313 <NA> 0.451907
1414 <NA> 0.423184
1516 <NA> 0.417709
1620 <NA> 0.402966
1725 <NA> 0.414264

Now that this is working, I'd like to add a third variable to plot a 3D chart. Any suggestions on how I can do that? Thanks!

I think there are a few things going on here that are worth talking through.

First, some example data:

test <- data.frame(End = Sys.Date()+1:5, 
               Start = Sys.Date()+0:4, 
               tck = rep("GOOG",5), 
               EndP= 1:5, 
               StartP= 0:4)

test.sub = subset(test, tck=="GOOG",select = c(End, EndP))

Next, note that test and test.sub are both data frames, so calls like test.sub[1] don't really "mean" anything to R.** It's more R-ish to write test.sub[,1], by virtue of consistency with other R structures. If you compare the results of str(test.sub[1]) and str(test.sub[,1]), you'll see that R treats them differently: the first is a one-column data frame, the second is the column itself as a vector.
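The difference between the two indexing forms can be checked directly. The following is a minimal sketch; the fixed dates and the names one.col.df and one.col.vec are made up for illustration and are not part of the original thread.

```r
# Example data shaped like the answer's test.sub (fixed dates for reproducibility)
test.sub <- data.frame(End = as.Date("2011-10-19") + 0:4, EndP = 1:5)

one.col.df  <- test.sub[1]     # single bracket, no comma: a one-column data frame
one.col.vec <- test.sub[, 1]   # row/column indexing: the column itself (a Date vector)

str(one.col.df)
str(one.col.vec)

is.data.frame(one.col.df)      # TRUE
is.data.frame(one.col.vec)     # FALSE
length(one.col.df)             # 1 (number of columns)
length(one.col.vec)            # 5 (number of elements)
```

test.sub[[1]] and test.sub$End return the same vector as test.sub[, 1], so any of the three can be handed to functions that expect a plain vector.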

You said you typed:

as.ts(test.sub)
plot(test.sub)

I'd guess you have extensive experience with some sort of OO language; while R does have some OO flavor to it, that doesn't apply here. Rather than transforming test.sub into something of class ts, this performs the conversion, throws the result away, and then moves on to plot the data frame you started with. It's an easy fix, though:

test.sub.ts <- as.ts(test.sub)
plot(test.sub.ts)

But this probably isn't what you were looking for either. R creates a time series that has two variables, "End" (the date, now coerced to a number) and "EndP". Funny business like this is part of the reason time series packages like zoo and xts have caught on, so I'll detail them a little further down instead.

(Unfortunately, to the best of my understanding, R's default ts class doesn't keep date stamps, choosing instead to keep a start, an end, and a frequency. For more general time series work, this is rarely flexible enough.)
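Both points, that as.ts() does not modify its argument and that a ts object stores only a start, an end, and a frequency, can be seen in a short sketch. The fixed dates are an assumption made for reproducibility.

```r
test.sub <- data.frame(End = as.Date("2011-10-19") + 0:4, EndP = 1:5)

as.ts(test.sub)                  # result is printed at the console, but not stored
class(test.sub)                  # still "data.frame"

test.sub.ts <- as.ts(test.sub)   # assignment keeps the converted object
inherits(test.sub.ts, "ts")      # TRUE
tsp(test.sub.ts)                 # start, end, frequency: 1 5 1
test.sub.ts[, "End"]             # the dates, reduced to day counts since 1970-01-01
```

The time index of the result is simply 1 to 5; the calendar dates survive only as ordinary numbers in the "End" column.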

You could perhaps get what you wanted by typing

plot(test.sub[,1], test.sub[,2]) 

instead of

plot(test.sub[1], test.sub[2])

since the latter runs into trouble: you are passing two sub-data frames instead of two vectors (even though it looks like you are passing vectors).***

Anyway, with xts (and similarly for zoo):

library(xts) # You may need to install this
xtemp <- xts(test.sub[,2], test.sub[,1]) # Create the xts object
plot(xtemp) 
# Dispatches an xts plot method which does all sorts of nice time series things

Hope some of this helps, and sorry for the inline code that's not marked as such: I'm still getting used to Stack Overflow.

Michael

**In reality, they access the list that is used to structure a data frame internally, but that's more a code nuance than something worth relying on.

***The nitty-gritty is that when you pass plot(test.sub[1], test.sub[2]) to R, it dispatches the method plot.data.frame, which takes a single data frame and tries to interpret the second data frame as an additional plot parameter. That gets misinterpreted somewhere way down the line, giving your error.

The reason you get the error about different x and y lengths becomes apparent if you run traceback() immediately after the error is raised:

> plot(test.sub[1],test.sub[2])
Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' and 'y' lengths differ
> traceback()
6: stop("'x' and 'y' lengths differ")
5: xy.coords(x, y, xlabel, ylabel, log)
4: plot.default(x1, ...)
3: plot(x1, ...)
2: plot.data.frame(test.sub[1], test.sub[2])
1: plot(test.sub[1], test.sub[2])

The problems in your call are manifold. First, as mentioned by @mweylandt, test.sub[1] is a data frame with a single component, not a vector comprising the contents of the first component of test.sub.

From the traceback, we see that the plot.data.frame method was called. R is quite happy to plot a data frame as long as it has at least two columns. R took you at your word and dispatched on test.sub[1] (a data.frame); test.sub[2] is never treated as a second column of that data frame and is merely carried along in the ... arguments. The single column of test.sub[1] is eventually passed on to xy.coords(), which correctly informs you that x and y differ in length: x has one element per row, while the y it receives is still a one-column data frame, whose length is 1.
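The mismatch is easy to reproduce on small data. This is a sketch under assumptions: the fixed dates are made up, and pdf(NULL) is used only so that no plot window or file is produced.

```r
test.sub <- data.frame(End = as.Date("2011-10-19") + 0:4, EndP = 1:5)

nrow(test.sub[1]) == nrow(test.sub[2])   # TRUE: both pieces have the same rows
length(test.sub[2])                      # 1: length() of a data frame counts columns
length(test.sub[, 1])                    # 5: a plain vector, one element per row

pdf(NULL)  # null graphics device, nothing is written to disk
err <- tryCatch(plot(test.sub[1], test.sub[2]),
                error = function(e) conditionMessage(e))
err                                      # the "lengths differ" message from xy.coords()

plot(test.sub[, 1], test.sub[, 2], type = "l")   # two vectors: works
invisible(dev.off())
```

This is why comparing nrow() did not reveal the problem: the row counts agree, but the length comparison inside xy.coords() sees a vector on one side and a data frame on the other.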

It would have worked if you'd called plot(test.sub[,1], test.sub[,2], type = "l"), or used the formula interface to name the variables, plot(V4 ~ V1, data = test.sub, type = "l"), as I show in my other answer.

Surely it is easier to use the formula interface:

> test <- data.frame(End = Sys.Date()+1:5, 
+                Start = Sys.Date()+0:4, 
+                tck = rep("GOOG",5), 
+                EndP= 1:5, 
+                StartP= 0:4)
> 
> test.sub = subset(test, tck=="GOOG",select = c(End, EndP))
> head(test.sub)
         End EndP
1 2011-10-19    1
2 2011-10-20    2
3 2011-10-21    3
4 2011-10-22    4
5 2011-10-23    5
> plot(EndP ~ End, data = test.sub, type = "l")

I work extensively with time series type data and rarely, if ever, have any need for the "ts" class of objects. Packages zoo and xts are very useful, but if all you want to do is plot the data, i) get the date/time information correctly formatted and set up as a "Date" or "POSIXt" class object, and then ii) just plot it using standard graphics and type = "l" (or type = "b" or type = "o" if you want to see the observation times).
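As a rough sketch of steps i) and ii) for dates written like the question's "2011-Aug-24": the data frame raw and its prices are made up for illustration, and the assumption is that the dates arrive as character strings. The %b format code matches abbreviated month names and depends on the locale, so the sketch switches LC_TIME to "C" to get English month names.

```r
# Hypothetical raw input; column names follow the question, prices are invented
raw <- data.frame(V1 = c("2011-Aug-24", "2011-Aug-25", "2011-Aug-26"),
                  V4 = c(10, 12, 11),
                  stringsAsFactors = FALSE)

invisible(Sys.setlocale("LC_TIME", "C"))   # English month abbreviations for %b

# i) convert the strings to Date with an explicit format
raw$V1 <- as.Date(raw$V1, format = "%Y-%b-%d")
str(raw)

# ii) plot with standard graphics; type = "o" draws the line and the points
pdf(NULL)   # null device, used here only to avoid creating a file
plot(V4 ~ V1, data = raw, type = "o")
invisible(dev.off())
```

If a format string does not match the text, as.Date() returns NA rather than raising an error, which is one way a Date column can end up all NA like the one shown by str(test.sub) in the question.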
