How to extract information and perform the same operation on multiple similar files in R?

Question

I have several hundred files, each of which represents prices for a particular stock. I want to loop through them, calculate the log return, and add the log return as a column in a data frame containing log returns for all of the stocks.

Essentially, I have something like this: say I have three csvs named "a.csv", "b.csv" and "c.csv", and they look something like the following. (The numbers below are totally fabricated. The idea is just that the dates are not necessarily the same, nor are the files the same length, but they have the same columns and names.)

a.csv:

Date    Adj.Close
1/1/2001    5
1/2/2001    5.25
1/3/2001    5.17
1/4/2001    5.09
1/5/2001    5.83

b.csv:

Date    Adj.Close
3/17/2005   17.85
3/18/2005   19.20
3/19/2005   18.55
3/20/2005   18.45

c.csv:

Date    Adj.Close
5/9/1995    25.39
5/10/1995   25
5/11/1995   25.83
5/12/1995   24.99
5/13/1995   28
5/16/1995   27.17
5/17/1995   26.95

I know how to calculate log returns for one file (the code below works fine for one file):

setwd('my_wd')
data <- read.csv('a.csv')
attach(data) 
n = dim(data)[1] 
log_rtn = diff(log(Adj.Close))

That gives me a list of the log returns for the first csv. What I want to do (in pseudocode) is:

for file in my_wd:
 data <- file_name.csv
 attach(data) 
 n = dim(data)[1] 
 file_name_log_rtn = diff(log(Adj.Close))

in order to return lists of log returns named in the same way as the csv files (in pseudo-output), something like this (named after the file, as below):

a_log_rtn:

0.048790164, -0.015355388,-0.015594858,0.13573917

b_log_rtn:

0.072906771, -0.03444049,-0.005405419

c_log_rtn:

-0.015479571,0.032660782,-0.033060862,0.113728765,-0.030091087,-0.008130126

Answer 1

Foreword: Do not use attach. You have nothing to gain from it and it is potentially harmful.
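
As a minimal sketch of what can be done instead of attach, columns can be reached with [[ ]] or with(), which keeps the lookup scoped to a single data frame. The data frame prices below is a hand-typed copy of the a.csv sample from the question, not a real file, and its name is only illustrative.

```r
# Hand-typed copy of the a.csv sample from the question (illustrative only)
prices <- data.frame(
  Date = c("1/1/2001", "1/2/2001", "1/3/2001", "1/4/2001", "1/5/2001"),
  Adj.Close = c(5, 5.25, 5.17, 5.09, 5.83)
)

# Explicit column access: no attach() needed
log_rtn <- diff(log(prices[["Adj.Close"]]))

# with() evaluates the expression using the data frame's columns,
# without leaving anything on the search path afterwards
log_rtn_with <- with(prices, diff(log(Adj.Close)))
```

Both forms give the same four log returns, and neither leaves Adj.Close visible outside the data frame, which matters once several hundred files share the same column names.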

Without access to your files I have not tested the code below, but I would do something along these lines.
The trick is to use lapply to process all the files in a loop. I use it twice: once to read in the data and once to create a new column with the log returns.

olddir <- setwd('my_wd')

# pattern is a regular expression, not a glob, so match the extension at the end of the name
files_list <- list.files(pattern = "\\.csv$")
data_list <- lapply(files_list, read.csv)
data_list <- lapply(data_list, function(DF){
            DF[["log_rtn"]] <- c(NA, diff(log(DF[["Adj.Close"]])))
            DF
        })

# reset the old directory if you want
#setwd(olddir)

Note that the column log_rtn will have NA as the first value. You can change this to 0 if you want, but I believe that NA makes more sense.
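
To try the approach without the original files, here is a self-contained sketch that writes two of the question's samples to a temporary directory and then runs the same two lapply passes. It assumes the files are comma-separated. The names demo_dir and samples are made up for this illustration, and naming the list elements after the files is an addition that is not part of the code above.

```r
# Write hand-typed copies of the question's a.csv and b.csv samples to a
# temporary directory (assumption: the real files are comma-separated)
demo_dir <- file.path(tempdir(), "log_rtn_demo")
dir.create(demo_dir, showWarnings = FALSE)
samples <- list(
  a = data.frame(Date = c("1/1/2001", "1/2/2001", "1/3/2001", "1/4/2001", "1/5/2001"),
                 Adj.Close = c(5, 5.25, 5.17, 5.09, 5.83)),
  b = data.frame(Date = c("3/17/2005", "3/18/2005", "3/19/2005", "3/20/2005"),
                 Adj.Close = c(17.85, 19.20, 18.55, 18.45))
)
for (nm in names(samples)) {
  write.csv(samples[[nm]], file.path(demo_dir, paste0(nm, ".csv")), row.names = FALSE)
}

# First pass: read every csv; full.names = TRUE avoids changing the working directory
files_list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
data_list <- lapply(files_list, read.csv)

# Name each element after its file ("a", "b") so results can be looked up by stock
names(data_list) <- tools::file_path_sans_ext(basename(files_list))

# Second pass: add the log-return column, padded with NA for the first row
data_list <- lapply(data_list, function(DF) {
  DF[["log_rtn"]] <- c(NA, diff(log(DF[["Adj.Close"]])))
  DF
})
```

After this runs, data_list[["b"]][["log_rtn"]] should hold an NA followed by the three b_log_rtn values listed in the question.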

Answer 2

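# full.names = TRUE returns paths that read.csv can open from any working directory
allfiles <- list.files(path_to_the_files_here, pattern = "\\.csv$", full.names = TRUE)
# adds the log price as a new column; use c(NA, diff(log(Adj.Close))) for log returns
listdata <- lapply(allfiles, function(x) transform(read.csv(x), log_Adj.Close = log(Adj.Close)))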

If you want, you can assign these to the global environment, one object per file:

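# name each data frame after its file (without directory or extension), then assign globally
list2env(setNames(listdata, tools::file_path_sans_ext(basename(allfiles))), envir = .GlobalEnv)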
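
One way to get object names in the a_log_rtn style the question asks for is to derive them from the file names with basename() and tools::file_path_sans_ext(). The sketch below uses hypothetical file paths and placeholder numbers rather than real results, and it assigns into a fresh environment instead of the global one so that nothing in the workspace is overwritten.

```r
# Hypothetical file paths, for illustration only
allfiles <- c("prices/a.csv", "prices/b.csv", "prices/AAPL.csv")

# Drop the directory and the extension, whatever the length of the name
stock_names <- tools::file_path_sans_ext(basename(allfiles))
var_names <- paste0(stock_names, "_log_rtn")

# Placeholder values standing in for the computed log returns
results <- setNames(list(c(0.1, 0.2), c(0.3), c(0.4)), var_names)

# Assign each element as its own object in a new environment
env <- list2env(results, envir = new.env())
a_values <- get("a_log_rtn", envir = env)
```

Keeping the results in the named list itself (results[["a_log_rtn"]]) is often easier to work with than hundreds of separate objects, so the list2env step is optional.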

Answer 3

Put the files in a directory, say it is called csv_dir.

csv_list <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
names(csv_list) <- basename(csv_list)
# lapply keeps the names, so each result is labelled with its file name
log_diffs <- lapply(csv_list, function(t) {
  tcsv <- read.csv(t)
  diff(log(tcsv$Adj.Close))
})

This will produce a list, log_diffs, with what you want. To see the results from a particular file you can use log_diffs[["a.csv"]], for example. If you want to put all the results in one big data frame, with one column for the file name and another for the log differences, you could do the following:

log_diffs <- lapply(csv_list, function(t) {
  tcsv <- read.csv(t)
  data.frame(file = basename(t),
             log.diff = diff(log(tcsv$Adj.Close)),
             stringsAsFactors = FALSE)
})

# do.call passes each list element to rbind as a separate argument
csv_log_diffs <- do.call(rbind, log_diffs)
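
The question also asks for one data frame with a log-return column per stock. Since the dates differ between files, one possible approach (a sketch, not something the answers above propose) is to key each stock's returns by date and join them with merge(..., all = TRUE). The data below are hand-typed copies of the question's a.csv and b.csv samples, and the names price_list, rtn_list and wide are illustrative.

```r
# Hand-typed copies of the question's a.csv and b.csv samples (illustrative only)
price_list <- list(
  a = data.frame(Date = c("1/1/2001", "1/2/2001", "1/3/2001", "1/4/2001", "1/5/2001"),
                 Adj.Close = c(5, 5.25, 5.17, 5.09, 5.83)),
  b = data.frame(Date = c("3/17/2005", "3/18/2005", "3/19/2005", "3/20/2005"),
                 Adj.Close = c(17.85, 19.20, 18.55, 18.45))
)

# One two-column data frame per stock: Date plus a return column named after the stock.
# The first date is dropped because it has no previous price to compare against.
rtn_list <- lapply(names(price_list), function(nm) {
  DF <- price_list[[nm]]
  out <- data.frame(Date = as.Date(DF[["Date"]], format = "%m/%d/%Y")[-1],
                    rtn = diff(log(DF[["Adj.Close"]])))
  names(out)[2] <- paste0(nm, "_log_rtn")
  out
})

# Full outer join on Date: stocks with different date ranges line up, with NA in the gaps
wide <- Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE), rtn_list)
```

Here wide has one row per date that appears in any file and one column per stock. With several hundred files the repeated merge may become slow, in which case the long format built with rbind above may be the more practical layout.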

If your csv files are very large, you could consider using read_csv from the readr package. It is typically faster than read.csv and shows a progress bar.

3 answers

Answer 1: 3, accepted, 2018-02-04 07:17:59
Answer 2: 2, 2018-02-04 07:28:24
Answer 3: 1, 2018-02-04 07:28:39
