r - split one csv file into multiple txt files
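I need to split a large .csv file (about 9 columns and 9,000+ rows) into one separate .txt file per row, naming each new file with the value in that row's first column.
For example, the .csv file: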
01001_r1 32.4327 -86.6190 0.65 0.20 0.15 1.33 5.47 8
01001_r2 32.4327 -86.6190 0.65 0.20 0.15 1.33 5.46 8
01001_r3 32.4327 -86.6190 0.80 0.15 0.05 1.33 5.23 10
01003_r1 30.4887 -87.6918 0.65 0.20 0.15 1.33 5.23 9
01003_r2 30.4887 -87.6918 0.80 0.15 0.05 1.33 5.25 9
01003_r3 30.4887 -87.6918 0.65 0.20 0.15 1.33 4.96 8
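I would end up with 6 files, each containing a single row.
The columns in the output files need to be tab-delimited, and the files must not contain row names or column names.
For example, an output file should look like this: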
01001_r1 32.4327 -86.6190 0.65 0.20 0.15 1.33 5.47 8
This is where I have got to so far:
#set 'working directory'
setwd('C:/Users/Data/soils_data/sitesoil_in')
#identify data frame from .csv file
sd <- read.csv('site_soil.csv', sep="\t", header=F, fill=F)
lapply(1:nrow(sd), function(i) write.csv(sd[i,],
file = paste0(sd[i,1], ".txt"),
row.names = F, header = F,
quote = F))
This is what I get for each output file:
File name: 01001_r1
V1,V2,V3,V4,V5,V6,V7,V8,V9
01001_r1,32.4327,-86.619,0.65,0.2,0.15,1.33,5.47,8
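I can't get rid of the column names or separate the columns with tabs. I tried header = F or col.names = F to drop the header, and sep = "\t" to separate the columns, but the arguments were not recognized.
I would appreciate any help. Thanks, E.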
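Why those arguments seem to be ignored: `write.csv()` is a convenience wrapper around `write.table()` that fixes the separator to a comma and, per its documentation, ignores attempts to change `col.names` or `sep` (emitting a warning). `header` is not a `write.table()` argument at all; it belongs to `read.table()`/`read.csv()`. A minimal sketch of the difference on one row of the sample data, printing to the console via `file = ""`:

```r
# One row of the sample data from the question, read without a header line
df <- read.csv(text = "01001_r1,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.47,8",
               header = FALSE, stringsAsFactors = FALSE)

# write.csv() refuses to change col.names and raises a warning instead;
# tryCatch() captures that warning's message
msg <- tryCatch(write.csv(df, file = tempfile(), col.names = FALSE),
                warning = function(w) conditionMessage(w))
print(msg)

# write.table() honours sep, col.names, row.names and quote
write.table(df, file = "", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The second call prints a single tab-separated line with no `V1 ... V9` header, which is the format asked for above.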
Following all the suggestions, here is simpler code that solves the problem:
#set 'working directory'
setwd('C:/Users/Elena/Desktop/DayCent_muvp_MODEL/DayCent_SourceData/soils_data/sitesoil_in')
#identify data frame from .csv file
sd <- read.csv('site_soil.csv', sep="\t", header=F, fill=F)
lapply(1:nrow(sd),
       function(i) write.table(sd[i,],
                               file = paste0(sd[i,1], ".txt", collapse = ""),
                               row.names = FALSE, col.names = FALSE,
                               sep = "\t",
                               quote = FALSE  # otherwise the name is written as "01001_r1"
       ))
Thank you all for your help. E.
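A slightly more defensive variant of the final code, offered as a sketch: the function name `write_rows_as_files()` and its `out_dir` argument are hypothetical, not from the thread. It writes into an explicit folder instead of relying on `setwd()`, stops if the first column contains duplicate names (two rows with the same name would silently overwrite the same file), and sets `quote = FALSE`. The sample rows are read as comma-separated text here, as in the answers; the real file in the question is read with `sep = "\t"`.

```r
# Hypothetical helper: write each row of `df` to <out_dir>/<first column>.txt
write_rows_as_files <- function(df, out_dir) {
  ids <- as.character(df[[1]])
  # Duplicate names would map to the same file and overwrite each other
  if (anyDuplicated(ids)) stop("first column contains duplicate names")
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  paths <- file.path(out_dir, paste0(ids, ".txt"))
  for (i in seq_len(nrow(df))) {
    write.table(df[i, ], file = paths[i], sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
  }
  invisible(paths)  # return the written paths without printing them
}

# Usage with the sample rows from the question, written to a temporary folder
sd <- read.csv(text = "01001_r1,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.47,8
01001_r2,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.46,8
01001_r3,32.4327,-86.6190,0.80,0.15,0.05,1.33,5.23,10
01003_r1,30.4887,-87.6918,0.65,0.20,0.15,1.33,5.23,9
01003_r2,30.4887,-87.6918,0.80,0.15,0.05,1.33,5.25,9
01003_r3,30.4887,-87.6918,0.65,0.20,0.15,1.33,4.96,8",
               header = FALSE, stringsAsFactors = FALSE)
paths <- write_rows_as_files(sd, file.path(tempdir(), "sitesoil_out"))
```

Using a plain `for` loop rather than `lapply()` also avoids printing a list of `NULL`s at the console.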
Try this:
dat <-"01001_r1,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.47,8
01001_r2,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.46,8
01001_r3,32.4327,-86.6190,0.80,0.15,0.05,1.33,5.23,10
01003_r1,30.4887,-87.6918,0.65,0.20,0.15,1.33,5.23,9
01003_r2,30.4887,-87.6918,0.80,0.15,0.05,1.33,5.25,9
01003_r3,30.4887,-87.6918,0.65,0.20,0.15,1.33,4.96,8
"
df <- read.delim(file = textConnection(dat), sep = ',', header = FALSE)
df
# V1 V2 V3 V4 V5 V6 V7 V8 V9
# 1 01001_r1 32.4327 -86.6190 0.65 0.20 0.15 1.33 5.47 8
# 2 01001_r2 32.4327 -86.6190 0.65 0.20 0.15 1.33 5.46 8
# 3 01001_r3 32.4327 -86.6190 0.80 0.15 0.05 1.33 5.23 10
# 4 01003_r1 30.4887 -87.6918 0.65 0.20 0.15 1.33 5.23 9
# 5 01003_r2 30.4887 -87.6918 0.80 0.15 0.05 1.33 5.25 9
# 6 01003_r3 30.4887 -87.6918 0.65 0.20 0.15 1.33 4.96 8
output_file_base <- "soil_"
output_file_ext <- ".tsv"
for(i in seq(nrow(df))){
output_file <- paste0(output_file_base, as.character(i), output_file_ext)
dfi <- df[i, ]
write.table(x = dfi, file = output_file, sep = '\t', quote = FALSE, col.names = FALSE, row.names = FALSE)
}
Output:
$ cat soil_6.tsv
01003_r3 30.4887 -87.6918 0.65 0.2 0.15 1.33 4.96 8
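Note that this output is not character-for-character identical to the input: `write.table()` writes numeric columns in their shortest form, so `0.20` becomes `0.2` and `-86.6190` becomes `-86.619`. If the downstream program only parses numbers this should not matter. If the text must match exactly, one option (a sketch, not part of the original answer) is to read every column as character with `colClasses = "character"`, so the values pass through unchanged:

```r
# Last sample row; colClasses = "character" keeps each field as its original text
dat <- "01003_r3,30.4887,-87.6918,0.65,0.20,0.15,1.33,4.96,8"
df_chr <- read.csv(text = dat, header = FALSE, colClasses = "character")

# Write it the same way as above, then read the file back
out_file <- file.path(tempdir(), "soil_6.tsv")
write.table(df_chr, file = out_file, sep = "\t", quote = FALSE,
            col.names = FALSE, row.names = FALSE)
line <- readLines(out_file)
print(line)
```

The trade-off is that every column is then a string inside R, so this is only suitable when the data is passed through rather than computed on.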
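I adjusted your code:
lapply(1:nrow(sd),
       function(i) write.table(sd[i,],
                               file = paste0(sd[i,1],".txt",collapse = ""),
                               row.names = FALSE,
                               col.names = FALSE,  # drop the V1 ... V9 header line
                               sep = "\t",
                               quote = FALSE       # write the name without quotes
       ))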
This might work for what you are trying to do.
df <-read.csv(text = "01001_r1,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.47,8
01001_r2,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.46,8
01001_r3,32.4327,-86.6190,0.80,0.15,0.05,1.33,5.23,10
01003_r1,30.4887,-87.6918,0.65,0.20,0.15,1.33,5.23,9
01003_r2,30.4887,-87.6918,0.80,0.15,0.05,1.33,5.25,9
01003_r3,30.4887,-87.6918,0.65,0.20,0.15,1.33,4.96,8",
stringsAsFactors = FALSE,
header = FALSE)
apply(df, 1, function(x){write.table(t(x),
file = paste0(x[1],".txt"),
sep = "\t",
quote = FALSE,
col.names = FALSE,
row.names = FALSE)})
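One caveat with `apply()`: it first converts the data frame to a character matrix with `as.matrix()`, which formats each numeric column to a common width. In the sample, the last column holds both `8` and `10`, so the single-digit values become `" 8"` with a leading space, and that space ends up in the output files. A sketch of one workaround, trimming each row's fields with `trimws()` before writing (the files go to `tempdir()` here only for illustration):

```r
df <- read.csv(text = "01001_r1,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.47,8
01001_r2,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.46,8
01001_r3,32.4327,-86.6190,0.80,0.15,0.05,1.33,5.23,10
01003_r1,30.4887,-87.6918,0.65,0.20,0.15,1.33,5.23,9
01003_r2,30.4887,-87.6918,0.80,0.15,0.05,1.33,5.25,9
01003_r3,30.4887,-87.6918,0.65,0.20,0.15,1.33,4.96,8",
               stringsAsFactors = FALSE, header = FALSE)

# This is what apply() iterates over: numeric columns are padded to equal width
m <- as.matrix(df)
print(m[, "V9"])

out_dir <- tempdir()
invisible(apply(m, 1, function(x) {
  x <- trimws(x)  # remove the padding introduced by as.matrix()
  write.table(t(x), file = file.path(out_dir, paste0(x[1], ".txt")),
              sep = "\t", quote = FALSE, col.names = FALSE, row.names = FALSE)
}))
```

A side effect of going through the formatted matrix is that `0.20` and `-86.6190` keep their trailing zeros, unlike the `write.table(df[i, ])` approach.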
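Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.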