How do I use Ruby to combine several CSV files into one big CSV file?

I have been using SmarterCSV to convert BED-format files to CSV files and to rename the columns.

Now I have collected several CSV files and want to combine them into one big CSV file.

In test3.csv, three columns will be kept (chromosome, start_site and end_site) and the other three (binding_site_pattern, score and strand) will be removed.

Three new columns will be added to the test3.csv data, each holding the same value in every row: Cmyc in the transcription_factor column, PWM in the cell_type column, and JASPAR in the project_name column.

Does anyone have any ideas on this one?

test1.csv

transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,GM12878,11,809296,809827,ENCODE
Cmyc,GM12878,11,6704236,6704683,ENCODE

test2.csv

transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,H1ESC,19,9710417,9710587,ENCODE
Cmyc,H1ESC,11,541754,542137,ENCODE

test3.csv

chromosome,start_site,end_site,binding_site_pattern,score,strand
chr1,21942,21953,AAGCACGTGGT,1752,+
chr1,21943,21954,AACCACGTGCT,1335,-

Desired combined result:

transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,GM12878,11,809296,809827,ENCODE
Cmyc,GM12878,11,6704236,6704683,ENCODE
Cmyc,H1ESC,19,9710417,9710587,ENCODE
Cmyc,H1ESC,11,541754,542137,ENCODE
Cmyc,PWM,1,21942,21953,JASPAR
Cmyc,PWM,1,21943,21954,JASPAR
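
require 'csv'

# Column order shared by every row of the combined file.
hs = %w{ transcription_factor cell_type chromosome start_site end_site project_name }

CSV.open('result.csv','w') do |csv|
  csv << hs
  # test1.csv and test2.csv already have the target columns; copy them in header order.
  CSV.foreach('test1.csv', headers: true) {|row| csv << row.values_at(*hs) }
  CSV.foreach('test2.csv', headers: true) {|row| csv << row.values_at(*hs) }
  # test3.csv: keep only the digits of the chromosome ("chr1" -> "1"), keep the
  # two site columns, and fill in the constant values for the three new columns.
  CSV.foreach('test3.csv', headers: true) do |row|
    csv << ['Cmyc', 'PWM', row['chromosome'].match(/\d+/).to_s] + row.values_at('start_site', 'end_site') + ['JASPAR']
  end
end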
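
If more files with differing columns are expected, the same idea can be wrapped in a small helper. The following is only a sketch under a few assumptions: `combine_csv` and `HEADERS` are hypothetical names, the inputs are passed as CSV text rather than file names, header names and values may carry stray surrounding whitespace, and chromosomes are written with a leading "chr" that should be dropped (so "chrX" becomes "X" rather than an empty string).

```ruby
require 'csv'

# Target column order (taken from the desired result in the question).
HEADERS = %w[transcription_factor cell_type chromosome start_site end_site project_name]

# Hypothetical helper: merge several CSV texts into one CSV string.
# Each source is a pair [csv_text, defaults]; defaults supply the values
# for target columns that the source file does not contain.
def combine_csv(sources, headers = HEADERS)
  CSV.generate do |out|
    out << headers
    sources.each do |text, defaults|
      CSV.parse(text, headers: true) do |row|
        # Strip stray whitespace around header names and values.
        fields = row.to_h.to_h { |k, v| [k.to_s.strip, v.to_s.strip] }
        # Drop a leading "chr" so "chr1" becomes "1".
        fields['chromosome'] = fields['chromosome'].to_s.sub(/\Achr/, '')
        # Take each target column from the row, else from the defaults.
        out << headers.map { |h| fields.fetch(h) { defaults.fetch(h) } }
      end
    end
  end
end
```

With the files from the question, this could be called as `File.write('result.csv', combine_csv([[File.read('test1.csv'), {}], [File.read('test2.csv'), {}], [File.read('test3.csv'), { 'transcription_factor' => 'Cmyc', 'cell_type' => 'PWM', 'project_name' => 'JASPAR' }]]))`. Columns not listed in `HEADERS` (binding_site_pattern, score, strand) are simply never copied.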
