
Editing multiple .csv files before reading into R

I have 44 .csv files in my working directory that I will eventually read into R and bind into one large file. Before I do that, I'd like to make some changes to each of the files. I want to:

  1. Change some column names in some of the files
  2. Select only the first 10 columns of each file

I've found some information on gsub for problem 1), but not enough to get me where I want to be. As for 2), it seems that this should be quite simple, but I can't find any solution online.

Many thanks!

This may work to get you the output you are looking for.

# Set path to folder
  folder.path <- getwd()

# Get list of csv files in folder (pattern is a regular expression, not a glob)
  filenames <- list.files(folder.path, pattern = "\\.csv$", full.names = TRUE)

# Read all csv files in the folder and create a list of dataframes
  ldf <- lapply(filenames, read.csv)

# Select the first 10 columns in each dataframe in the list
  ldf <- lapply(ldf, subset, select = 1:10)

# Create a vector for the new column names
  new.col.names <- c("col1","col2","col3","col4","col5","col6","col7","col8","col9","col10")

# Assign the new column names to each dataframe in the list
  ldf <- lapply(ldf, setNames, new.col.names)

# Combine each dataframe in the list into a single dataframe
  df.final <- do.call("rbind", ldf)
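
The code above gives every file the same ten names. If only some columns in some files need renaming, a lookup of old to new names is one way to do it. The following is a minimal sketch; the lookup `rename.map`, the helper `rename_some`, and the example column names are hypothetical and not from the question.

```r
# Hypothetical lookup: old column name -> new column name
rename.map <- c(Temp = "temperature", temp_c = "temperature", Site.ID = "site")

# Rename only the columns that appear in the lookup; leave the rest untouched
rename_some <- function(df, map) {
  hits <- names(df) %in% names(map)
  names(df)[hits] <- unname(map[names(df)[hits]])
  df
}

# Two small example data frames standing in for files with inconsistent headers
df.a <- data.frame(Site.ID = 1:2, Temp = c(10.5, 11.2))
df.b <- data.frame(site = 3:4, temp_c = c(9.8, 12.1))

# Apply the same renaming to every data frame, then bind them
ldf.demo <- lapply(list(df.a, df.b), rename_some, map = rename.map)
df.demo <- do.call("rbind", ldf.demo)
print(df.demo)
```

With a real list of data frames, the same call would be `lapply(ldf, rename_some, map = rename.map)`, run before the `rbind` step so that all column names already match.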

readLines is your friend. Try importing each file as a separate character vector, e.g. my_csv <- readLines("path/to/your/csv"), then perform the modifications and finally save the output as follows:

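# my_csv is already a character vector of lines, so write it out directly;
# capture.output() would add the printed index markers and quotes to the file
cat(my_csv, file = "my_new.csv", sep = "\n", append = FALSE)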
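
As a rough sketch of the "perform the modifications" step with readLines, the header is the first element of the vector, so gsub can be applied to that element alone. The file contents and column names below are made up for illustration, and temporary files stand in for the real csvs.

```r
# Write a tiny example file to a temporary location (stand-in for a real csv)
in.file <- tempfile(fileext = ".csv")
writeLines(c("Site ID,Temp", "1,10.5", "2,11.2"), in.file)

# Read the file as plain text, one element per line
my_csv <- readLines(in.file)

# Only the first element is the header; edit it and leave the data lines alone
my_csv[1] <- gsub("Site ID", "site", my_csv[1], fixed = TRUE)
my_csv[1] <- gsub("Temp", "temperature", my_csv[1], fixed = TRUE)

# Save the edited text and check that it parses with the new names
out.file <- tempfile(fileext = ".csv")
writeLines(my_csv, out.file)

edited <- read.csv(out.file)
print(names(edited))
```

Keeping only the first 10 columns is awkward on raw text, because quoted fields may contain commas, so that step is probably easier after the file has been parsed into a data frame.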

BUT

I would strongly recommend using the data.table package, and in particular the fread() function, which imports csv files quickly (as data.table objects). You can then perform both the selection of 10 columns and the name alteration on them. Of course, via fwrite() you can write them back to csv at any time.
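
A minimal sketch of that per-file workflow, assuming the data.table package is installed. The example file, its default column names (V1 to V12), and the new names are invented for illustration.

```r
library(data.table)

# Tiny example file with 12 columns (stand-in for one of the real csvs)
in.file <- tempfile(fileext = ".csv")
fwrite(as.data.table(matrix(1:24, nrow = 2)), in.file)  # columns V1..V12

# Read only the first 10 columns
dt <- fread(in.file, select = 1:10)

# Rename some columns by name; skip_absent = TRUE ignores names that
# this particular file does not contain instead of raising an error
setnames(dt,
         old = c("V1", "V2", "not_here"),
         new = c("id", "value", "unused"),
         skip_absent = TRUE)

# Write the edited table back to csv
out.file <- tempfile(fileext = ".csv")
fwrite(dt, out.file)
print(names(dt))
```

Because `skip_absent = TRUE` tolerates missing names, the same `setnames()` call can be applied to every file even when only some of them contain the columns to be renamed.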

FINALLY

Use the following only if the columns of every csv have the same position and name, so that you keep only the first 10 as you mentioned above.

A combination of lapply and data.table can do miracles. In particular:

library(data.table)
rbindlist(lapply(list.files("path/to/the/folder/with/csvs", full.names = TRUE), fread, select = 1:10), use.names = TRUE, fill = FALSE)

will solve most of your data import issues.
