R: Extract data frames from a list of CSV file data

Question

I come from a Matlab and Python background (plus a few less related languages), and I'm picking up R for a Coursera course.

I followed this SO answer to read all my homework files into a list with a single line of code. My code looks like this:

# Get a list of the CSV files (pattern is a regular expression, not a glob)
files = list.files(path = dataDir, pattern = '\\.csv$')

# Import the file data: read.csv returns one data frame per file
setwd(dataDir)
data = lapply(files, read.csv)

This all works fine. However, I get back an object that I don't know how to access. I mention Matlab and Python because I've tried to access the data in all the ways I would in those languages.

Here's what summary outputs:

summary(data)
       Length Class      Mode
  [1,] 4      data.frame list
  [2,] 4      data.frame list
  [3,] 4      data.frame list

There are actually 352 of them, not just 3, but no one needs a listing of all 352. Here's what summary outputs for an individual index:

summary(data[200])
     Length Class      Mode
[1,] 4      data.frame list

So if I enter data[200] I get a listing of the first 2500 rows of data. But data[200, 100] returns an error, as do data[200][,100] and data[200][100,]. data[200][100] returns [[1]] NULL.

While I haven't fully worked out what this homework requires, I'm sure it will involve calculating the means/medians/maxima/etc. of all non-NA values in various data columns. This wasn't tough for the quizzes, using something like mean(data[which(is.na(data$Col1)==F), 'Col6']).

So I imagine I could use a more hackish approach: load only the one file I need at the moment I need it, extract just the portion of the data frame I need, and loop over all the data files I have to process. However, I'd rather know how to access the data in the object R creates from the lapply line. I suspect this will make more complex analyses much easier to code later on.

Thanks

Answer 1

When you subset a list with single square brackets [, you get back an object of the same class as the one you are subsetting, i.e. another list. So data[200] returns a list of length 1 containing one data frame, because data is a list. Double square brackets [[ give you the actual object stored in the list (in this case, a data frame). Once you have a data frame, you can select the 100th row with [100,] (or the first 100 rows with [1:100,]), which is why the following works:

data[[200]][100,]
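A minimal sketch of the same idea on a small, self-contained list, assuming three hypothetical data frames with made-up columns Col1 and Col6 standing in for the output of lapply(files, read.csv). It contrasts [ with [[, reproduces the [[1]] NULL seen in the question, and shows one way (sapply, or do.call(rbind, ...) to pool all files) to compute a non-NA mean across every data frame in the list instead of loading files one at a time.

```r
# Hypothetical stand-in for lapply(files, read.csv): a list of 3 data frames.
# Col1 and Col6 are made-up column names; the real files' columns may differ.
data <- list(
  data.frame(Col1 = c(1, NA, 3), Col6 = c(10, 20, 30)),
  data.frame(Col1 = c(NA, 5, 6), Col6 = c(1, 2, NA)),
  data.frame(Col1 = c(7, 8, 9), Col6 = c(4, 5, 6))
)

# Single brackets keep the container: a list of length 1
class(data[2])          # "list"
length(data[2])         # 1

# Double brackets extract the element itself: a data frame
class(data[[2]])        # "data.frame"

# Row/column indexing works once you have the data frame
data[[2]][1, ]          # first row of the 2nd data frame
data[[2]][, "Col6"]     # Col6 of the 2nd data frame as a vector
data[[2]]$Col6          # same column, using $

# An out-of-range single-bracket index on a list gives list(NULL),
# which prints as [[1]] NULL
data[2][100]

# Per-file mean of Col6 over rows where Col1 is not NA (NA in Col6 dropped)
col6_means <- sapply(data, function(df) {
  mean(df$Col6[!is.na(df$Col1)], na.rm = TRUE)
})
col6_means              # 20 2 5

# Pooled mean across all files: stack the data frames, then subset once
all_rows <- do.call(rbind, data)
pooled_mean <- mean(all_rows$Col6[!is.na(all_rows$Col1)], na.rm = TRUE)
pooled_mean             # 9.5
```

The sapply version keeps one result per file, which matches looping over the files; the rbind version assumes every file has the same columns, so it may not fit if the real files differ.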

Answer 1: score 3, accepted, 2015-10-16 19:59:20