
R - combining lines from multiple CSV into a data frame

I have a folder with hundreds of CSV files each containing data for a particular postal code.

Each CSV file contains two columns and thousands of rows. Descriptors are in Column A, values are in Column B.

I need to extract two pieces of information from each file and create a new table or dataframe using the values in [Column A, Row 2] (which is the postal code) and [Column B, Row 1585] (which is the median income).

The end result should be a table/dataframe with two columns: one for postal code, the other for median income.

Any help or advice would be appreciated.
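The row numbers in the question are line numbers in the file, so it matters how they map to R indexing. Here is a small sketch using hypothetical three-line data passed inline through read.csv's `text` argument. With header = FALSE every line of the file becomes a data-frame row, so line 2 is row 2. With the default header = TRUE, the first line becomes the column names and every row shifts up by one. colClasses = "character" keeps every field as text, so a postal code such as 02134 cannot lose its leading zero.

```r
# Hypothetical file contents: a header-like line, a postal code line,
# and a median income line
csv_text <- paste("Id,Label",
                  "02134,Postal code",
                  "Median household income,52000",
                  sep = "\n")

# header = FALSE: line N of the file is data-frame row N
no_header <- read.csv(text = csv_text, header = FALSE,
                      colClasses = "character")
no_header[2, 1]  # postal code, line 2 / column A
no_header[3, 2]  # income, line 3 / column B

# header = TRUE (the default): line 1 becomes column names,
# so line N of the file is data-frame row N - 1
with_header <- read.csv(text = csv_text, colClasses = "character")
with_header[1, 1]
with_header[2, 2]
```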

You can use the list.files function to get the paths of all your files, and then use read.csv and rbind in a for loop to build one data.frame.

Something like this:

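# full.names = TRUE gives paths read.csv can open from any working directory
direct <- list.files("directory_to_your_files", full.names = TRUE)
df <- NULL
# seq_along() visits every file; length(direct) alone only visits the last one
for(i in seq_along(direct)){
  getData <- read.csv(direct[i], header = FALSE, colClasses = "character")
  # keep only the postal code (row 2, col 1) and median income (row 1585, col 2)
  df <- rbind(df, data.frame(PostalCode = getData[2, 1],
                             MedianIncome = as.numeric(getData[1585, 2])))
}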
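Calling rbind inside the loop copies the growing data frame on every iteration. A common R idiom is to collect one small result per file in a list with lapply and bind them once with do.call(rbind, ...). Below is a sketch of that pattern. The function name read_two_cells is hypothetical, and it assumes the layout from the question: no header row, postal code at line 2 of column A, median income at line 1585 of column B.

```r
# Hypothetical helper: collect the two cells from every CSV in `dir`
read_two_cells <- function(dir) {
  # full.names = TRUE returns usable paths, so no setwd() is needed
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

  # Read each file once and keep a one-row data frame per file
  rows <- lapply(files, function(f) {
    d <- read.csv(f, header = FALSE, colClasses = "character")
    data.frame(PostalCode = d[2, 1],
               MedianIncome = as.numeric(d[1585, 2]),
               stringsAsFactors = FALSE)
  })

  # Bind all rows in a single call instead of growing a data frame in a loop
  do.call(rbind, rows)
}

# Usage: result <- read_two_cells("directory_to_your_files")
```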

Disclaimer: this question is pretty vague. Next time, be sure to add a reproducible example that we can run on our machines. It will help you, the people answering your questions, and future users.
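One way to build such a reproducible example for this question is to generate a few fake files with the layout described above in a temporary folder. The helper name make_fake_csv and all the values are made up; only the layout (two columns, no header row, postal code at line 2 of column A, median income at line 1585 of column B) comes from the question.

```r
# Hypothetical helper: write one fake CSV with the layout from the question.
# All descriptors and values are invented for illustration.
make_fake_csv <- function(path, postal_code, median_income) {
  col_a <- paste0("Descriptor ", 1:1585)
  col_b <- rep("0", 1585)
  col_a[2] <- postal_code        # line 2, column A: postal code
  col_b[1585] <- median_income   # line 1585, column B: median income
  writeLines(paste(col_a, col_b, sep = ","), path)
}

# Build a small throwaway folder that answers can be tested against
demo_dir <- file.path(tempdir(), "postal_csvs")
dir.create(demo_dir, showWarnings = FALSE)
make_fake_csv(file.path(demo_dir, "02134.csv"), "02134", "52000")
make_fake_csv(file.path(demo_dir, "10001.csv"), "10001", "61000")
list.files(demo_dir)
```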

You might try something like:

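files = list.files("~/Directory", full.names = TRUE)

# one row per file, filled in by the loop below
my_df = data.frame(PostalCode = character(length(files)),
                   MedianIncome = numeric(length(files)),
                   stringsAsFactors = FALSE)

for(i in seq_along(files)){
    # line 2 of the file: postal code in column A, read as text
    row2 = read.csv(files[i], header = FALSE, skip = 1, nrows = 1,
                    colClasses = "character")
    # line 1585 of the file: median income in column B
    row1585 = read.csv(files[i], header = FALSE, skip = 1584, nrows = 1)
    my_df[i, "PostalCode"] = row2[1, 1]
    my_df[i, "MedianIncome"] = row1585[1, 2]
}

# Note on interpreting indexing syntax:
#   Read my_df[i, "PostalCode"] as "the cell of my_df ([) in row i,
#   and (,) in the column named PostalCode"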
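The answer above opens each file twice, once per read.csv call with skip and nrows. An alternative is to read only the first 1585 physical lines of each file once with readLines(n = 1585), then parse just lines 2 and 1585 with read.csv(text = ...). This sketch assumes that no quoted field spans more than one line and that every file has at least 1585 lines. The name extract_pair is hypothetical.

```r
# Hypothetical helper: read the first 1585 lines of one file once,
# then parse only the two lines that matter
extract_pair <- function(path) {
  lines <- readLines(path, n = 1585)
  # Line 2, column A: postal code, kept as text so leading zeros survive
  zip <- read.csv(text = lines[2], header = FALSE,
                  colClasses = "character")[1, 1]
  # Line 1585, column B: median income
  income <- read.csv(text = lines[1585], header = FALSE,
                     colClasses = "character")[1, 2]
  data.frame(PostalCode = zip, MedianIncome = as.numeric(income),
             stringsAsFactors = FALSE)
}

# Usage over a folder:
# files <- list.files("~/Directory", pattern = "\\.csv$", full.names = TRUE)
# my_df <- do.call(rbind, lapply(files, extract_pair))
```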

So here is the code which does what I want it to do. If there are more elegant solutions, please feel free to point them out.

# set the working directory to where the data files are stored
setwd("/foo")

# list the files in the folder
files = list.files("/foo")

#create an empty dataframe and name the columns

dataMatrix=data.frame(matrix(c(rep(NA,times=2*length(files))),nrow=length(files)))
colnames(dataMatrix)=c("Postal Code", "Median Income")

# create a for loop to get the information in R2/C1 and R1585/C2 of each data file
# The data in R2/C1 is a string, but it is interpreted as a number unless explicitly converted to a string

for(i in 1:length(files)) {
  getData = read.csv(files[i],header=F)
  dataMatrix[i,1]=toString(getData[2,1])
  dataMatrix[i,2]=(getData[1585,2])
}
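The poster asks whether there are more elegant solutions. Here is a loop-free sketch with the same assumptions as the code above (no header row, postal code at row 2 / column 1, median income at row 1585 / column 2). It also avoids setwd() by asking list.files for full paths. The function name postal_income_table is hypothetical.

```r
# Hypothetical helper: one row per CSV in `dir`, same column names as above
postal_income_table <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

  # Read every file once with all columns as text, so postal codes keep
  # leading zeros and no toString() call is needed
  tables <- lapply(files, read.csv, header = FALSE,
                   colClasses = "character")

  # vapply() checks that each file yields exactly one value of the given type
  data.frame(
    "Postal Code"   = vapply(tables, function(d) d[2, 1], character(1)),
    "Median Income" = vapply(tables, function(d) as.numeric(d[1585, 2]),
                             numeric(1)),
    check.names = FALSE, stringsAsFactors = FALSE
  )
}

# Usage: dataMatrix <- postal_income_table("/foo")
```

check.names = FALSE keeps the column names with spaces ("Postal Code", "Median Income") exactly as in the original dataMatrix.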

Thank you to all those who helped me figure this out, especially Nancy.
