
How to extract information and perform the same operation on multiple similar files in R?

I have several hundred files, each of which represents prices for a particular stock. I want to loop through them, calculate the log return for each, and add it as a column in a data frame containing the log returns for all of the stocks.

For example, say I have three CSVs named "a.csv", "b.csv" and "c.csv" that look something like the ones below. The numbers are totally fabricated; the point is that the dates are not necessarily the same, nor are the files the same length, but all files have the same column names:

a.csv:

Date    Adj.Close
1/1/2001    5
1/2/2001    5.25
1/3/2001    5.17
1/4/2001    5.09
1/5/2001    5.83

b.csv:

Date    Adj.Close
3/17/2005   17.85
3/18/2005   19.20
3/19/2005   18.55
3/20/2005   18.45

c.csv:

Date    Adj.Close
5/9/1995    25.39
5/10/1995   25
5/11/1995   25.83
5/12/1995   24.99
5/13/1995   28
5/16/1995   27.17
5/17/1995   26.95

I know how to calculate log returns for one file (the below works fine for one file):

setwd('my_wd')
data <- read.csv('a.csv')
attach(data) 
n = dim(data)[1] 
log_rtn = diff(log(Adj.Close)) 

That gives me a list of the log returns for the first csv. What I want to do (in pseudo code) is:

for file in my_wd:
 data <- file_name.csv
 attach(data) 
 n = dim(data)[1] 
 file_name_log_rtn = diff(log(Adj.Close)) 

in order to return lists of log returns named in the same way as the csv files. In pseudo-output, something like this (named after the file, as below):

a_log_rtn:

0.048790164, -0.015355388,-0.015594858,0.13573917

b_log_rtn:

0.072906771, -0.03444049,-0.005405419

c_log_rtn:

-0.015479571,0.032660782,-0.033060862,0.113728765,-0.030091087,-0.008130126

Foreword: Do not use attach. You have nothing to gain from it, and it is potentially harmful, because attached columns can mask, or be masked by, other objects with the same name.
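As a minimal sketch of what to do instead of attach, the snippet below refers to the column explicitly, or uses with() to evaluate an expression inside the data frame. The data frame a is built inline from the question's a.csv sample so the snippet runs without any files; the object names are only illustrative.

```r
# Prices from the question's a.csv, built inline so no file is needed
a <- data.frame(
  Date = c("1/1/2001", "1/2/2001", "1/3/2001", "1/4/2001", "1/5/2001"),
  Adj.Close = c(5, 5.25, 5.17, 5.09, 5.83)
)

# Refer to the column explicitly instead of attaching the data frame
log_rtn <- diff(log(a[["Adj.Close"]]))

# with() evaluates the expression inside the data frame
# without changing the search path
log_rtn_with <- with(a, diff(log(Adj.Close)))
```

Both forms give the same vector of four log returns, and nothing is left attached afterwards.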

Without access to your files I have not tested the code below, but I would do something along these lines.
The trick is to use lapply to process all the files. I use it twice: once to read in the data and once to create a new column with the log returns.

olddir <- setwd('my_wd')

files_list <- list.files(pattern = "\\.csv$")
data_list <- lapply(files_list, read.csv)
data_list <- lapply(data_list, function(DF){
            DF[["log_rtn"]] <- c(NA, diff(log(DF[["Adj.Close"]])))
            DF
        })

# reset the old directory if you want
#setwd(olddir)

Note that the column log_rtn will have NA as the first value, because the first row has no previous price to compare against. You can change this to 0 if you want, but I believe that NA makes more sense.
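To get results named after each file, as asked in the question, one option is to name the list elements from the file names. The sketch below is self-contained: it first writes the question's three sample files to a temporary directory (the directory and the samples object are assumptions made only so the example can run), then reads them back and computes the log returns.

```r
# Write the question's three sample files to a fresh temporary directory
csv_dir <- tempfile("log_rtn_demo")   # hypothetical location, demo only
dir.create(csv_dir)
samples <- list(
  a = data.frame(Date = c("1/1/2001", "1/2/2001", "1/3/2001", "1/4/2001", "1/5/2001"),
                 Adj.Close = c(5, 5.25, 5.17, 5.09, 5.83)),
  b = data.frame(Date = c("3/17/2005", "3/18/2005", "3/19/2005", "3/20/2005"),
                 Adj.Close = c(17.85, 19.20, 18.55, 18.45)),
  c = data.frame(Date = c("5/9/1995", "5/10/1995", "5/11/1995", "5/12/1995",
                          "5/13/1995", "5/16/1995", "5/17/1995"),
                 Adj.Close = c(25.39, 25, 25.83, 24.99, 28, 27.17, 26.95))
)
for (nm in names(samples)) {
  write.csv(samples[[nm]], file.path(csv_dir, paste0(nm, ".csv")), row.names = FALSE)
}

# Read every csv; full.names = TRUE avoids having to call setwd()
files_list <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
data_list <- lapply(files_list, read.csv)

# Name each element after its file, e.g. "a.csv" becomes "a_log_rtn"
names(data_list) <- paste0(tools::file_path_sans_ext(basename(files_list)), "_log_rtn")

# One vector of log returns per file; lapply keeps the names
log_rtn_list <- lapply(data_list, function(DF) diff(log(DF[["Adj.Close"]])))
```

After this, log_rtn_list[["a_log_rtn"]] holds the returns for a.csv, and so on for the other files.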

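# full.names = TRUE so that read.csv() finds the files from any working directory
allfiles <- list.files(path_to_the_files_here, pattern = "\\.csv$", full.names = TRUE)
# log_Adj.Close holds the log price; use c(NA, diff(log(Adj.Close))) for log returns
listdata <- lapply(allfiles, function(x) transform(read.csv(x), log_Adj.Close = log(Adj.Close)))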

If you want, you can assign these to the global environment as separate objects named after the files:

list2env(setNames(listdata, tools::file_path_sans_ext(basename(allfiles))), envir = .GlobalEnv)

Put the files in a directory, say one called csv_dir.

csv_list <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
names(csv_list) <- basename(csv_list)
# one vector of log differences per file, named after the file
log_diffs <- lapply(csv_list, function(f) {
  tcsv <- read.csv(f)
  diff(log(tcsv$Adj.Close))
})

This will produce a list log_diffs with what you want. To see the results from a particular file, use log_diffs[["a.csv"]], for example. If you want to put all the results in one big data frame, with one column for the file name and another for the log differences, you could do the following:

log_diffs <- lapply(csv_list, function(f) {
  tcsv <- read.csv(f)
  # the single file name is recycled to the length of log.diff
  data.frame(file = basename(f),
             log.diff = diff(log(tcsv$Adj.Close)),
             stringsAsFactors = FALSE)
})

csv_log_diffs <- do.call(rbind, log_diffs)
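The question also mentions a data frame with one log-return column per stock. Since the dates differ between files, one possible approach is to merge the per-stock results on Date, which leaves NA wherever a stock has no return on that date. This is only a sketch: the prices list stands in for the data frames read from the files, and the month/day/year date format is an assumption based on the question's samples.

```r
# Stand-in for the data frames read from the csv files (question's samples)
prices <- list(
  a = data.frame(Date = c("1/1/2001", "1/2/2001", "1/3/2001", "1/4/2001", "1/5/2001"),
                 Adj.Close = c(5, 5.25, 5.17, 5.09, 5.83)),
  b = data.frame(Date = c("3/17/2005", "3/18/2005", "3/19/2005", "3/20/2005"),
                 Adj.Close = c(17.85, 19.20, 18.55, 18.45)),
  c = data.frame(Date = c("5/9/1995", "5/10/1995", "5/11/1995", "5/12/1995",
                          "5/13/1995", "5/16/1995", "5/17/1995"),
                 Adj.Close = c(25.39, 25, 25.83, 24.99, 28, 27.17, 26.95))
)

# One two-column data frame per stock: Date plus "<name>_log_rtn"
per_stock <- lapply(names(prices), function(nm) {
  DF <- prices[[nm]]
  out <- data.frame(
    # drop the first date, which has no previous price (assumes m/d/Y dates)
    Date = as.Date(DF$Date, format = "%m/%d/%Y")[-1],
    rtn = diff(log(DF$Adj.Close))
  )
  names(out)[2] <- paste0(nm, "_log_rtn")
  out
})

# Full outer join on Date; missing combinations become NA
wide <- Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE), per_stock)
```

With the sample data, no dates overlap, so wide has one row per return and each row has a value in exactly one of the three return columns.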

If your csv files are very large, consider using read_csv from the readr package. It is typically faster than read.csv and shows a progress bar.
