How do I use Ruby to combine several CSV files into one big CSV file?
I have been using SmarterCSV to convert BED-format files to CSV files and to change the column names.
Now I have collected several CSV files and want to combine them into one big CSV file.
In test3.csv, three columns (chromosome, start_site and end_site) will be used, and the other three columns (binding_site_pattern, score and strand) will be removed.
Three new columns will be added to the test3.csv data, and each holds the same value in every row: transcription_factor is Cmyc, cell_type is PWM, and project_name is JASPAR.
Does anyone have any ideas on this one?
test1.csv
transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,GM12878,11,809296,809827,ENCODE
Cmyc,GM12878,11,6704236,6704683,ENCODE
test2.csv
transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,H1ESC,19,9710417,9710587,ENCODE
Cmyc,H1ESC,11,541754,542137,ENCODE
test3.csv
chromosome,start_site,end_site,binding_site_pattern,score,strand
chr1,21942,21953,AAGCACGTGGT,1752,+
chr1,21943,21954,AACCACGTGCT,1335,-
Desired combined result:
transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,GM12878,11,809296,809827,ENCODE
Cmyc,GM12878,11,6704236,6704683,ENCODE
Cmyc,H1ESC,19,9710417,9710587,ENCODE
Cmyc,H1ESC,11,541754,542137,ENCODE
Cmyc,PWM,1,21942,21953,JASPAR
Cmyc,PWM,1,21943,21954,JASPAR
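require 'csv'

# Column order of the combined file
hs = %w{ transcription_factor cell_type chromosome start_site end_site project_name }

CSV.open('result.csv', 'w') do |csv|
  csv << hs
  # test1.csv and test2.csv already have the target columns; copy them in header order
  CSV.foreach('test1.csv', headers: true) { |row| csv << row.values_at(*hs) }
  CSV.foreach('test2.csv', headers: true) { |row| csv << row.values_at(*hs) }
  # test3.csv: add the constant columns and keep only the digits of the chromosome ("chr1" -> "1")
  CSV.foreach('test3.csv', headers: true) do |row|
    csv << ['Cmyc', 'PWM', row['chromosome'].match(/\d+/).to_s] + row.values_at('start_site', 'end_site') + ['JASPAR']
  end
end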
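If more files need to be merged later, the same idea can be wrapped in a small helper. The following is only a sketch under a few assumptions: the method names `normalize_row` and `combine_csv` and the `HEADERS` constant are hypothetical, each input file is paired with a hash of default values for the columns it lacks, and the chromosome is cleaned by stripping a leading "chr" rather than keeping only the digits, so a value such as "chrX" would become "X" instead of an empty string.

```ruby
require 'csv'

# Column order of the combined file (same list as `hs` above).
HEADERS = %w[transcription_factor cell_type chromosome start_site end_site project_name].freeze

# Hypothetical helper: builds one output row from an input row.
# Columns missing from the input are filled from `defaults`.
def normalize_row(row, defaults = {})
  HEADERS.map do |name|
    value = row[name] || defaults[name]
    # Assumption: a leading "chr" is dropped, so "chr1" becomes "1".
    name == 'chromosome' ? value.to_s.sub(/\Achr/, '') : value
  end
end

# Hypothetical helper: `sources` maps each input path to its default values.
def combine_csv(sources, out_path)
  CSV.open(out_path, 'w') do |out|
    out << HEADERS
    sources.each do |path, defaults|
      CSV.foreach(path, headers: true) { |row| out << normalize_row(row, defaults) }
    end
  end
end

# Example call for the three files in the question:
# jaspar = { 'transcription_factor' => 'Cmyc', 'cell_type' => 'PWM', 'project_name' => 'JASPAR' }
# combine_csv({ 'test1.csv' => {}, 'test2.csv' => {}, 'test3.csv' => jaspar }, 'result.csv')
```

Because the output row is built by header name, extra input columns such as binding_site_pattern, score and strand are simply ignored, and files that already have all six columns pass through unchanged.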