R: Find the first and last value in a dataframe by factor level

I need help finding the first and last value for each factor level.

I have tick data (trade by trade) for a stock, and I would like to get the open, high, low, close, total volume, and total traded value for each day. I am treating each day as a factor level. I have written everything except the first and last value (the open and close) for each day.

Could you also tell me how I can write this code better? I just started learning R and want to adopt good habits early on.

I have included my code, a link to the dataset on Google Docs, and, after my code, a smaller version of the dataset in case the Google Doc is not available.

Thank you for your help.

https://docs.google.com/document/d/1OYRfAvuKvCwndJVffnljPM74kHY1kKEtAbVdJHyoNdY/edit?usp=sharing

Here is my code:

#load data
data1 <- read.table("EKSO.txt", header = TRUE, sep = ",", stringsAsFactors = TRUE)

#calculate total traded
data1$TT <- data1$Price * data1$Size

#find the lowest value for each day
min_l <- tapply(data1$Price, data1$Date, min)

#find the highest value for each day
max_l <- tapply(data1$Price, data1$Date, max)

#find the total volume for each day
tv_l <- tapply(data1$Size, data1$Date, sum)

#find the total traded for each day
tt_l <- tapply(data1$TT, data1$Date, sum)

#find the first price for the day

#find the last price for the day

#construct a dataframe with the date, the open, high, low, close, total volume,
# and total traded
data2 <- data.frame(max_l, min_l, tv_l, tt_l)

Here is the dataset:

Date,Time,Price,Size
02/07/2014,09:30:01,3,500
02/07/2014,09:30:29,3,42
02/07/2014,09:35:56,3,100
02/07/2014,09:37:17,3,100
02/07/2014,09:37:28,3.2,900
02/07/2014,09:37:35,3.2,4900
02/07/2014,09:37:51,3.2,1000 
02/07/2014,09:42:11,3.2,500
02/07/2014,10:00:31,3,2400
02/07/2014,10:00:37,3.2,500
02/07/2014,10:00:44,3.2,3347
02/07/2014,10:07:33,3.2,1000
02/07/2014,10:31:42,3.24,1000
02/07/2014,10:33:44,3.24,200
02/07/2014,10:40:28,3.25,300
02/07/2014,10:49:57,3.25,600
02/07/2014,10:53:16,3.25,100
02/07/2014,10:53:32,3.4,1000
02/07/2014,10:54:13,3.4,500
02/07/2014,11:05:37,3.35,1000
02/07/2014,11:11:29,3.25,600
02/07/2014,11:15:26,3.3,60
02/07/2014,11:19:16,3.3,23
02/07/2014,11:21:14,3.25,100
02/07/2014,11:21:22,3.25,100
02/07/2014,11:21:30,3.2,500
02/07/2014,11:21:35,3.2,500
02/07/2014,11:21:43,3.2,500
02/07/2014,11:29:58,3.1,200
02/07/2014,11:35:42,3.19,360
02/07/2014,11:39:51,3.19,1000
02/07/2014,11:52:39,3.15,200
02/07/2014,11:53:51,3.15,100
02/07/2014,11:55:11,3.2,100
02/07/2014,12:17:32,3.2,1500
02/07/2014,12:35:42,3.24,1200
02/07/2014,12:37:53,3.24,100
02/07/2014,12:38:02,3.24,3500
02/07/2014,12:53:57,3.24,400
02/07/2014,13:10:57,3.239,100
02/07/2014,13:11:35,3.24,800
02/07/2014,13:13:41,3.24,1000
02/07/2014,13:39:40,3.24,450
02/07/2014,13:56:04,3.24,500
02/07/2014,14:09:49,3.24,600
02/07/2014,14:11:25,3.24,1000
02/07/2014,14:25:53,3.24,25
02/07/2014,14:30:58,3.24,30
02/07/2014,14:31:36,3.24,30
02/07/2014,14:32:12,3.24,30
02/07/2014,14:53:13,3.23,240
02/07/2014,14:53:27,3.24,500
02/07/2014,14:53:59,3.24,60
02/07/2014,14:54:46,3.2,1500
02/07/2014,15:23:09,3.19,2000
02/07/2014,15:35:23,3.18,1500
02/07/2014,15:44:36,3.18,600
02/10/2014,09:30:02,3.25,100
02/10/2014,09:30:02,3.25,25
02/10/2014,09:30:24,3.25,150
02/10/2014,09:30:40,3.25,100
02/10/2014,09:31:11,3.25,650
02/10/2014,09:35:32,3.24,200
02/10/2014,09:37:59,3.19,100
02/10/2014,09:38:01,3.2,2000
02/10/2014,09:41:24,3.15,100
02/10/2014,09:42:28,3.15,1000
02/10/2014,09:42:28,3.15,1000
02/10/2014,09:42:41,3.15,500
02/10/2014,09:42:57,3.15,100
02/10/2014,09:47:46,2.9,100
02/10/2014,09:48:24,2.9,500
02/10/2014,09:50:09,2.65,2500
02/10/2014,09:50:44,2.66,2500
02/10/2014,09:50:49,2.6,100
02/10/2014,10:21:20,2.85,300
02/10/2014,10:32:40,2.94,100
02/10/2014,10:33:18,2.95,426
02/10/2014,10:33:38,2.95,70
02/10/2014,10:57:25,2.95,500
02/10/2014,10:57:40,2.95,500
02/10/2014,11:38:29,3,500
02/10/2014,11:38:35,3.05,500
02/10/2014,13:57:20,3.1,150
02/10/2014,13:57:34,3,42
02/10/2014,14:21:42,3.15,500
02/10/2014,14:23:35,3.15,1000
02/10/2014,14:52:15,2.99,25
02/10/2014,14:52:17,2.95,100
02/10/2014,15:04:08,2.99,412
02/10/2014,15:11:42,2.99,100
02/10/2014,15:11:46,2.99,100
02/10/2014,15:12:06,2.99,100
02/10/2014,15:20:35,3.04,500
02/10/2014,15:30:28,3,500
02/10/2014,15:36:58,2.95,2000 
02/10/2014,15:38:09,3,550
02/10/2014,15:39:48,2.97,2000
02/11/2014,09:30:04,3.2,100
02/11/2014,09:30:18,3.2,2000
02/11/2014,10:03:07,3.18,1000
02/11/2014,10:21:35,3.18,26
02/11/2014,10:27:09,3.15,500
02/11/2014,10:37:22,3.15,1108
02/11/2014,10:37:22,3.15,1054
02/11/2014,10:52:17,3.01,1000
02/11/2014,10:53:55,3.01,500
02/11/2014,10:54:31,3.05,40
02/11/2014,10:55:41,3.01,100
02/11/2014,10:55:44,3,3300
02/11/2014,10:55:44,3,100
02/11/2014,15:25:01,3,1000
02/11/2014,15:49:37,3,500
02/11/2014,15:51:08,2.98,300
02/12/2014,08:46:23,3,1500
02/12/2014,09:10:01,3,2000
02/12/2014,09:21:31,3.1,1500
02/12/2014,09:26:33,3.2,2000
02/12/2014,09:27:58,3.2,2500
02/12/2014,09:30:18,3.2,30
02/12/2014,09:40:51,3.05,100
02/12/2014,09:44:31,2.98,2900
02/12/2014,09:47:43,2.98,110
02/12/2014,09:50:49,2.96,100
02/12/2014,09:50:51,2.8,750
02/12/2014,12:01:34,2.86,1500
02/12/2014,12:01:45,2.85,1500
02/12/2014,12:12:42,2.86,1500
02/12/2014,15:39:15,3,200
02/12/2014,15:48:51,3,100
02/12/2014,15:48:53,3,500

Here, ?duplicated is your best friend.

For the first price of each day, use:

data1[!duplicated(data1$Date, fromLast=FALSE), "Price"]

For the last price of each day, use:

data1[!duplicated(data1$Date, fromLast=TRUE), "Price"]

This code assumes that your data.frame is sorted by Date and Time (see ?order); otherwise the first row for a given date is not necessarily that day's opening trade.
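If the rows might be out of order, sorting them with order() before applying duplicated() satisfies that assumption. A minimal sketch, assuming Date uses the mm/dd/yyyy format of the posted data and Time is zero-padded HH:MM:SS; the small shuffled data frame `trades` is made up for illustration:

```r
# Hypothetical, deliberately shuffled sample (not the full EKSO data)
trades <- data.frame(
  Date  = c("02/10/2014", "02/07/2014", "02/10/2014", "02/07/2014"),
  Time  = c("09:30:02",   "15:44:36",   "15:39:48",   "09:30:01"),
  Price = c(3.25, 3.18, 2.97, 3),
  stringsAsFactors = FALSE
)

# Sort by calendar date first, then by time of day.
# Converting Date with as.Date() avoids relying on the text order of
# "mm/dd/yyyy" strings, which is wrong once the data spans several years.
# Zero-padded "HH:MM:SS" strings already sort correctly as text.
ord <- order(as.Date(trades$Date, format = "%m/%d/%Y"), trades$Time)
trades <- trades[ord, ]

# With the rows in time order, duplicated() picks each day's opening
# and closing trade
open_price  <- trades[!duplicated(trades$Date, fromLast = FALSE), "Price"]
close_price <- trades[!duplicated(trades$Date, fromLast = TRUE),  "Price"]
open_price
## [1] 3.00 3.25
close_price
## [1] 3.18 2.97
```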

An example:

(data1 <- data.frame(Date=c(rep("02/07/2014", 3), rep("02/10/2014", 4)), Price=1:7))
##         Date Price
## 1 02/07/2014     1
## 2 02/07/2014     2
## 3 02/07/2014     3
## 4 02/10/2014     4
## 5 02/10/2014     5
## 6 02/10/2014     6
## 7 02/10/2014     7
data1[!duplicated(data1$Date, fromLast=FALSE), "Price"]
## [1] 1 4
data1[!duplicated(data1$Date, fromLast=FALSE),]
##         Date Price
## 1 02/07/2014     1
## 4 02/10/2014     4
data1[!duplicated(data1$Date, fromLast=TRUE), "Price"]
## [1] 3 7
data1[!duplicated(data1$Date, fromLast=TRUE),]
##         Date Price
## 3 02/07/2014     3
## 7 02/10/2014     7
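Putting the pieces together for the original question, here is a sketch of the full daily summary (open, high, low, close, total volume, total traded), assuming data1 is already sorted by Date and Time. It uses six rows typed in from the posted sample so it runs without EKSO.txt, and the column names Open, High, and so on are just illustrative choices. One detail matters when combining the results: tapply() returns values in factor-level order, while the duplicated() rows come out in row order, so the sketch checks with stopifnot() that the two orders agree before building the data frame.

```r
# Six rows from the posted data, already sorted by Date and Time
data1 <- data.frame(
  Date  = factor(c("02/07/2014", "02/07/2014", "02/07/2014",
                   "02/10/2014", "02/10/2014", "02/10/2014")),
  Time  = c("09:30:01", "10:53:32", "15:44:36",
            "09:30:02", "09:50:49", "15:39:48"),
  Price = c(3, 3.4, 3.18, 3.25, 2.6, 2.97),
  Size  = c(500, 1000, 600, 100, 100, 2000)
)
data1$TT <- data1$Price * data1$Size  # total traded per trade

# First and last trade of each day
first_rows <- data1[!duplicated(data1$Date, fromLast = FALSE), ]
last_rows  <- data1[!duplicated(data1$Date, fromLast = TRUE), ]

# tapply() orders its output by factor level; make sure the
# duplicated() rows follow the same order before combining them
stopifnot(identical(as.character(first_rows$Date), levels(data1$Date)))

data2 <- data.frame(
  Date   = levels(data1$Date),
  Open   = first_rows$Price,
  High   = as.vector(tapply(data1$Price, data1$Date, max)),
  Low    = as.vector(tapply(data1$Price, data1$Date, min)),
  Close  = last_rows$Price,
  Volume = as.vector(tapply(data1$Size, data1$Date, sum)),
  Traded = as.vector(tapply(data1$TT, data1$Date, sum))
)
data2
```

Keeping Date as an explicit column, rather than relying on the row names that tapply() leaves behind, makes the result easier to merge or export later.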

A hint for future analyses: if you want to treat the Date column not as a factor but as a date object (for example, to do arithmetic on it), consider using strptime, e.g.:

data1$Date2 <- as.Date(strptime(as.character(data1$Date), "%m/%d/%Y"))
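Once the dates are Date objects, arithmetic works in days. A small illustration using the four trading dates that appear in the posted sample (the vector `days` is only for demonstration):

```r
# The four trading dates in the posted sample, as plain strings
days <- c("02/07/2014", "02/10/2014", "02/11/2014", "02/12/2014")

# Same conversion as above, applied to a character vector
dates <- as.Date(strptime(days, "%m/%d/%Y"))

# Calendar days between consecutive trading days; the weekend
# between Friday 02/07 and Monday 02/10 shows up as a gap of 3
gaps <- as.numeric(diff(dates))
gaps
## [1] 3 1 1

# Day of the week as a number (0 = Sunday, ..., 6 = Saturday);
# as.POSIXlt() exposes it without depending on the locale
wday <- as.POSIXlt(dates)$wday
wday
## [1] 5 1 2 3
```

None of this is possible while Date is a factor of "mm/dd/yyyy" labels, since R treats those only as category names.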

You can also combine the date and time into a single date-time value, e.g.:

data1$DateTime <- as.POSIXct(strptime(paste(data1$Date, data1$Time), "%m/%d/%Y %H:%M:%S"))
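With a combined date-time column, time differences become straightforward. A sketch that measures each day's trading session (last trade minus first trade) for two days of the posted sample; passing tz = "UTC" is an assumption made here so that daylight-saving changes in the local time zone cannot distort the differences:

```r
# Two days from the posted sample: each day's first and last trade
trades <- data.frame(
  Date = c("02/07/2014", "02/07/2014", "02/10/2014", "02/10/2014"),
  Time = c("09:30:01",   "15:44:36",   "09:30:02",   "15:39:48"),
  stringsAsFactors = FALSE
)

# Combine date and time; as.POSIXct() turns strptime()'s POSIXlt
# result into a compact numeric column that suits a data frame
trades$DateTime <- as.POSIXct(strptime(paste(trades$Date, trades$Time),
                                       "%m/%d/%Y %H:%M:%S", tz = "UTC"))

# First and last trade of each day (rows are already in time order)
first <- trades[!duplicated(trades$Date), ]
last  <- trades[!duplicated(trades$Date, fromLast = TRUE), ]

# Length of each day's trading session, in hours
session <- difftime(last$DateTime, first$DateTime, units = "hours")
round(as.numeric(session), 2)
## [1] 6.24 6.16
```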
