I have several hundred files, each of which represents prices for a particular stock. I want to loop through them, calculate log returns, and add each set of log returns as a column in a data frame containing the log returns for all of the stocks.
Essentially, I have something like this. Say I have three CSVs named "a.csv", "b.csv", and "c.csv", and they look something like the following (the numbers below are totally fabricated; the idea is just that the dates are not necessarily the same, nor are the files the same length, but they have the same columns and column names):
a.csv:
Date Adj.Close
1/1/2001 5
1/2/2001 5.25
1/3/2001 5.17
1/4/2001 5.09
1/5/2001 5.83
b.csv:
Date Adj.Close
3/17/2005 17.85
3/18/2005 19.20
3/19/2005 18.55
3/20/2005 18.45
c.csv:
Date Adj.Close
5/9/1995 25.39
5/10/1995 25
5/11/1995 25.83
5/12/1995 24.99
5/13/1995 28
5/16/1995 27.17
5/17/1995 26.95
I know how to calculate log returns for one file (the below works fine for one file):
setwd('my_wd')
data <- read.csv('a.csv')
attach(data)
n = dim(data)[1]
log_rtn = diff(log(Adj.Close))
That gives me a list of the log returns for the first csv. What I want to do (in pseudo code) is:
for file in my_wd:
data <- file_name.csv
attach(data)
n = dim(data)[1]
file_name_log_rtn = diff(log(Adj.Close))
in order to return lists of log returns named in the same way as the CSVs (in pseudo-output), something like the following (named after the file, as below):
a_log_rtn:
0.048790164, -0.015355388,-0.015594858,0.13573917
b_log_rtn:
0.072906771, -0.03444049,-0.005405419
c_log_rtn:
-0.015479571,0.032660782,-0.033060862,0.113728765,-0.030091087,-0.008130126
Foreword: do not use attach; you have nothing to gain from it and it is potentially harmful.
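As a minimal illustration of what to use instead (with a toy data frame standing in for one of your CSVs; the numbers are from your example):

```r
# Toy stand-in for one of your CSVs
prices <- data.frame(Date = c("1/1/2001", "1/2/2001", "1/3/2001"),
                     Adj.Close = c(5, 5.25, 5.17))

# Instead of attach(prices); diff(log(Adj.Close)),
# reference the column directly:
log_rtn <- diff(log(prices$Adj.Close))

# or scope the lookup explicitly with with():
log_rtn <- with(prices, diff(log(Adj.Close)))
```

Both forms make it obvious which data frame the column comes from, which is exactly what attach obscures.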
Without access to your files I have not tested the code below, but I would do something along these lines.
The trick is to use lapply to process all the files in a loop. I use it twice: once to read in the data and a second time to create a new column with the log returns.
olddir <- setwd('my_wd')
files_list <- list.files(pattern = "\\.csv$")
data_list <- lapply(files_list, read.csv)
data_list <- lapply(data_list, function(DF){
DF[["log_rtn"]] <- c(NA, diff(log(DF[["Adj.Close"]])))
DF
})
# reset the old directory if you want
#setwd(olddir)
Note that the column log_rtn will have NA as the first value. You can change this to 0 if you want, but I believe the NA makes more sense.
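Since you wanted results named after the files, here is a hedged sketch of one way to get a-, b-, c-style names (toy data frames stand in for the data_list built above; tools::file_path_sans_ext strips the extension):

```r
# Toy stand-ins for the per-file data frames produced by the lapply above
files_list <- c("a.csv", "b.csv")
data_list <- list(data.frame(Adj.Close = c(5, 5.25, 5.17)),
                  data.frame(Adj.Close = c(17.85, 19.20)))

# Name each element after its file, minus the .csv extension
names(data_list) <- tools::file_path_sans_ext(basename(files_list))

# One named log-return vector per stock:
# log_rtn_list[["a"]], log_rtn_list[["b"]], ...
log_rtn_list <- lapply(data_list, function(DF) diff(log(DF$Adj.Close)))
```

Keeping the results in one named list is usually easier to work with than creating hundreds of separate a_log_rtn-style variables.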
allfiles <- list.files(path_to_the_files_here, pattern = "\\.csv$")
listdata <- lapply(allfiles, function(x) transform(read.csv(x), log_rtn = c(NA, diff(log(Adj.Close)))))
If you want, you can list these into the environment:
list2env(setNames(listdata, gsub("\\.csv$", "", allfiles)), envir = .GlobalEnv)
Put the files in a directory, say it is called csv_dir.
csv_list <- list.files(csv_dir, pattern = "csv", full.names = TRUE)
names(csv_list) <- basename(csv_list)
log_diffs <- lapply(csv_list, function(t) {
  tcsv <- read.csv(t)
  diff(log(tcsv$Adj.Close))
})
This will produce a list log_diffs with what you want. To see the results for a particular file you can use log_diffs[["a.csv"]], for example. If you want to put all the results in one big data frame, with one column for the file name and another for the log differences, you could do the following:
log_diffs <- lapply(csv_list, function(t) {
  tcsv <- read.csv(t)
  data.frame(file = basename(t),
             log.diff = diff(log(tcsv$Adj.Close)),
             stringsAsFactors = FALSE)
})
csv_log_diffs <- do.call(rbind, log_diffs)
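To see how do.call(rbind, ...) stacks the pieces, here is a self-contained toy example (the two small data frames and their numbers are made up for illustration):

```r
# Two toy per-file data frames, shaped like the entries of log_diffs
toy <- list(a = data.frame(file = "a.csv", log.diff = c(0.049, -0.015)),
            b = data.frame(file = "b.csv", log.diff = 0.073))

# do.call(rbind, toy) is equivalent to rbind(toy$a, toy$b)
combined <- do.call(rbind, toy)
# combined is a single data frame with 3 rows and columns file, log.diff
```

The advantage of do.call over writing rbind(...) by hand is that it works for any number of files without listing them explicitly.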
If your csv files are very large, you could consider using read_csv from the readr package; it will be faster than read.csv and provides a progress bar.