
R: Extracting Data Frame from list of CSV file data

I come from a background in Matlab and Python (and several less related languages). I'm picking up R for a Coursera course.

I followed this SO answer to read all my homework files into a list in a single line of code. My code looks like this:

# Get a list of files
files = list.files(path = dataDir, pattern = '*.csv')

# Import the file data
setwd(dataDir)
data = lapply(files, read.csv)

This all works just fine. However, I am getting an object back that I don't know how to access. I mention Matlab and Python because I've tried to access the data in all the ways I would in those languages.

Here's what summary outputs:

summary(data)
       Length Class      Mode
  [1,] 4      data.frame list
  [2,] 4      data.frame list
  [3,] 4      data.frame list

There are actually 352 of them, not just 3, but no one needs a listing of all 352. Here's what summary outputs for an individual index:

summary(data[200])
     Length Class      Mode
[1,] 4      data.frame list

So if I enter data[200] I get a listing of the first 2500 rows of data. But data[200, 100] returns an error, as do data[200][,100] and data[200][100,] . data[200][100] returns [[1]] NULL .

While I haven't fully considered what I will need to do for this homework, I'm sure it will involve calculating means, medians, maxima, etc. of all non-NA values in various data columns. This wasn't tough to do for the quizzes using something like mean(data[which(is.na(data$Col1)==F), 'Col6']) .

So I imagine I could use a more hackish approach where I simply load the one file I need at the time I need it, extract only the portion of the data frame I need right then, and loop over all the data files I need to process. However, I'd rather know how to access the data in the object R creates from the lapply line. I suspect this will make more complex analyses later on much easier to code.

Thanks

When you subset a list, single square brackets [ always return an object of the same class as the object you are subsetting. So, data[200] returns a list of length 1 containing one data frame, because data is a list. Double square brackets [[ give you the actual object contained in the list (in this case, a data frame). Once you have a data frame, you can select its 100th row with [100,] , which is why the following works:

data[[200]][100,]
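
To see the difference on something small, here is a minimal sketch. It assumes a toy list of three data frames standing in for the 352 that lapply(files, read.csv) returns. The column names Col1 and Col6 are borrowed from the question, and the values are made up for illustration.

```r
# Toy stand-in for the list returned by lapply(files, read.csv):
# three small data frames with hypothetical columns Col1 and Col6.
data <- list(
  data.frame(Col1 = c(1, NA, 3), Col6 = c(10, 20, 30)),
  data.frame(Col1 = c(4, 5, NA), Col6 = c(NA, 50, 60)),
  data.frame(Col1 = c(7, 8, 9),  Col6 = c(70, NA, 90))
)

# Single brackets keep the container: the result is a list of length 1.
class(data[2])

# Double brackets extract the element itself: the result is a data frame.
class(data[[2]])

# Ordinary data frame indexing then works on the extracted element.
second_row <- data[[2]][2, ]        # row 2, all columns
one_value  <- data[[2]][2, "Col6"]  # row 2, column Col6

# Mean of Col6 over its non-NA values, one result per data frame.
col6_means <- sapply(data, function(df) mean(df$Col6, na.rm = TRUE))
```

The last line is one way to run the same calculation over every file without a loop: sapply passes each data frame in the list to the function, and na.rm = TRUE drops the NA values before averaging.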
