How to extract information and perform the same operation on multiple similar files in R?

I have several hundred files, each of which holds the prices for a particular stock. I want to loop through them, calculate the log returns, and add them as a column in a data frame containing the log returns for all of the stocks.

Essentially, say I have three CSV files named "a.csv", "b.csv" and "c.csv" that look something like the following. The numbers are totally fabricated; the point is that the dates are not necessarily the same and the files differ in length, but they all have the same column names:

a.csv:

Date    Adj.Close
1/1/2001    5
1/2/2001    5.25
1/3/2001    5.17
1/4/2001    5.09
1/5/2001    5.83

b.csv:

Date    Adj.Close
3/17/2005   17.85
3/18/2005   19.20
3/19/2005   18.55
3/20/2005   18.45

c.csv:

Date    Adj.Close
5/9/1995    25.39
5/10/1995   25
5/11/1995   25.83
5/12/1995   24.99
5/13/1995   28
5/16/1995   27.17
5/17/1995   26.95

I know how to calculate log returns for one file (the code below works fine for one file):

setwd('my_wd')
data <- read.csv('a.csv')
attach(data) 
n = dim(data)[1] 
log_rtn = diff(log(Adj.Close)) 

That gives me a list of the log returns for the first CSV. What I want to do (in pseudocode) is:

for file in my_wd:
 data <- file_name.csv
 attach(data) 
 n = dim(data)[1] 
 file_name_log_rtn = diff(log(Adj.Close)) 

in order to return lists of log returns named in the same way as the CSV files (in pseudo-output), something like:

a_log_rtn:

0.048790164, -0.015355388,-0.015594858,0.13573917

b_log_rtn:

0.072906771, -0.03444049,-0.005405419

c_log_rtn:

-0.015479571,0.032660782,-0.033060862,0.113728765,-0.030091087,-0.008130126

Foreword: Do not use attach; you have nothing to gain from it and it is potentially harmful.

Without access to your files I have not tested the code below, but I would do something along these lines.
The trick is to use lapply to process all the files in a loop. I use it twice: once to read in the data and once to create a new column with the log returns.

olddir <- setwd('my_wd')

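# match only file names ending in ".csv"
files_list <- list.files(pattern = "\\.csv$")
data_list <- lapply(files_list, read.csv)
# name each data frame after its file so it can be looked up later
names(data_list) <- files_list
data_list <- lapply(data_list, function(DF){
            # pad with NA so the new column has as many rows as DF
            DF[["log_rtn"]] <- c(NA, diff(log(DF[["Adj.Close"]])))
            DF
        })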

# reset the old directory if you want
#setwd(olddir)

Note that the column log_rtn will have NA as its first value. You can change this to 0 if you want, but I believe NA makes more sense, since there is no previous price to compute a return from.
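The question also asks for a single data frame holding the log returns of all stocks. A minimal sketch of one way to get there is below, assuming the dates are in month/day/year form as in the sample files. The two small files, the directory name log_rtn_demo, and the names rtn_list and all_rtn are made up for illustration; the dates are chosen to overlap only partly so the effect of the join is visible.

```r
# Hypothetical sample data: two small files with partly overlapping dates.
tmp_dir <- file.path(tempdir(), "log_rtn_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("1/1/2001", "1/2/2001", "1/3/2001"),
                     Adj.Close = c(5, 5.25, 5.17)),
          file.path(tmp_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("1/2/2001", "1/3/2001", "1/4/2001"),
                     Adj.Close = c(17.85, 19.20, 18.55)),
          file.path(tmp_dir, "b.csv"), row.names = FALSE)

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
stocks <- tools::file_path_sans_ext(basename(files))

# One two-column data frame per file: Date plus a return column named after the file.
rtn_list <- Map(function(f, stock) {
  DF <- read.csv(f)
  out <- data.frame(Date = as.Date(DF[["Date"]], format = "%m/%d/%Y"),
                    rtn = c(NA, diff(log(DF[["Adj.Close"]]))))
  names(out)[2] <- paste0(stock, "_log_rtn")
  out
}, files, stocks)

# Full outer join on Date, so dates missing from a file show up as NA.
all_rtn <- Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE), rtn_list)
print(all_rtn)
```

With several hundred files the repeated merge may become slow; stacking the results in long format (one row per file and date) is a possible alternative.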

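# full.names = TRUE returns complete paths, so read.csv works from any working directory
allfiles=list.files(path_to_the_files_here,pattern = "\\.csv$",full.names = TRUE)
# add the log returns as a new column, padded with a leading NA
listdata=lapply(allfiles,function(x)transform(read.csv(x),log_rtn=c(NA,diff(log(Adj.Close)))))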

If you want, you can assign these to variables in the global environment:

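# name each element after its file (without path or extension);
# without envir, list2env would create a new environment instead of using the global one
list2env(setNames(listdata,tools::file_path_sans_ext(basename(allfiles))),envir = .GlobalEnv)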
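As a rough illustration of the naming scheme the question asks for (a_log_rtn, b_log_rtn, ...), the sketch below builds a named list of log-return vectors and then copies it into an environment. The sample files reproduce a.csv and b.csv from the question; the directory name list2env_demo and the names log_rtns and rtn_env are assumptions made for this example only.

```r
# Sample files reproducing a.csv and b.csv from the question, in a temporary directory.
demo_dir <- file.path(tempdir(), "list2env_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("1/1/2001", "1/2/2001", "1/3/2001", "1/4/2001", "1/5/2001"),
                     Adj.Close = c(5, 5.25, 5.17, 5.09, 5.83)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("3/17/2005", "3/18/2005", "3/19/2005", "3/20/2005"),
                     Adj.Close = c(17.85, 19.20, 18.55, 18.45)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Build the names a_log_rtn, b_log_rtn, ... from the file names.
rtn_names <- paste0(tools::file_path_sans_ext(basename(demo_files)), "_log_rtn")

# Named list of log-return vectors, one element per file.
log_rtns <- setNames(
  lapply(demo_files, function(f) diff(log(read.csv(f)[["Adj.Close"]]))),
  rtn_names
)

# Option 1: look results up by name in the list.
print(round(log_rtns[["a_log_rtn"]], 6))

# Option 2: copy the list elements into an environment of your choice.
rtn_env <- list2env(log_rtns, envir = new.env())
print(round(get("b_log_rtn", envir = rtn_env), 6))
```

With hundreds of files, keeping everything in one named list is usually easier to loop over than hundreds of separate variables.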

Put the files in a directory, say it is called csv_dir.

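# match only file names ending in ".csv" and keep full paths
csv_list <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
# lapply keeps these names, so the result can be indexed by file name
names(csv_list) <- basename(csv_list)
log_diffs <- lapply(csv_list, function(t) {tcsv <- read.csv(t)
                                           diff(log(tcsv$Adj.Close))
                                           })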

This will produce a list log_diffs with what you want. To see the results from a particular file you can use log_diffs[["a.csv"]], for example. If you want to put all the results in one big data frame, with one column for the file name and another with the log differences, you could do the following:

log_diffs <- lapply(csv_list, function(t) {tcsv <- read.csv(t)
                                           # the file name is recycled to the length of log.diff
                                           data.frame(file = basename(t),
                                           log.diff = diff(log(tcsv$Adj.Close)),
                                           stringsAsFactors = FALSE)})

# stack the per-file data frames into one
csv_log_diffs <- do.call(rbind, log_diffs)

If your CSV files are very large, you could consider using read_csv from the readr package; it is generally faster than read.csv and can display a progress bar.
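A minimal sketch of the same loop with read_csv is below, assuming the readr package is installed. The sample file reuses the first rows of c.csv from the question, and the directory name readr_demo is made up for this example.

```r
library(readr)

# Hypothetical sample file with the first rows of c.csv from the question.
readr_dir <- file.path(tempdir(), "readr_demo")
dir.create(readr_dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("5/9/1995", "5/10/1995", "5/11/1995"),
                     Adj.Close = c(25.39, 25, 25.83)),
          file.path(readr_dir, "c.csv"), row.names = FALSE)

csv_list <- list.files(readr_dir, pattern = "\\.csv$", full.names = TRUE)
names(csv_list) <- basename(csv_list)

log_diffs <- lapply(csv_list, function(f) {
  # Declaring the column types up front skips type guessing
  # and the column-specification message.
  tcsv <- read_csv(f, col_types = cols(Date = col_character(),
                                       Adj.Close = col_double()))
  diff(log(tcsv[["Adj.Close"]]))
})

print(log_diffs[["c.csv"]])
```

One difference to keep in mind: read.csv rewrites column names into syntactically valid R names by default, while read_csv keeps the header as written. If the header in your files is not literally Adj.Close, the column name in this sketch would need to change.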
