Convert rows into columns

Question

I have a file in which each record spans two rows, as below, and I would like to convert it into a two-column format.

>00000_x1688514
TGCTTGGACTACATATGGTTGAGGGTTGTA
>00001_x238968
TGCTTGGACTACATATTGTTGAGGGTTGTA
...

The desired output is:

>00000_x1688514 TGCTTGGACTACATATGGTTGAGGGTTGTA
>00001_x238968 TGCTTGGACTACATATTGTTGAGGGTTGTA
...

I would appreciate any help. Thanks.

Answer 1

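You may not be aware of the BioPerl modules, which handle reading and writing sequence files along with many other sequence-related tasks. With Bio::SeqIO your problem can be solved like this. Note that $seq->id returns the identifier without the leading '>', so prepend it in the print statement if you need it in the output.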

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $file = 'o33.txt';
my $in  = Bio::SeqIO->new( -file   =>  $file,
                           -format => 'fasta');

while ( my $seq = $in->next_seq() ) {
    print $seq->id, "\t", $seq->seq, "\n";
}

__END__
00000_x1688514  TGCTTGGACTACATATGGTTGAGGGTTGTA
00001_x238968   TGCTTGGACTACATATTGTTGAGGGTTGTA

Answer 2

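In Python:

# Python 3: the built-in zip() is lazy (on Python 2, use itertools.izip)
with open('filepath') as fd, open('output_filepath', 'w') as outfile:
    # zip(fd, fd) pulls two consecutive lines from the same file iterator
    for col in zip(fd, fd):
        outfile.write('\t'.join(col).replace('\n', '') + '\n')

The output is written to output_filepath.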

Answer 3

Another Perl option is to set the input record separator ($/) to '>', so that each read returns a header line together with its sequence line, and then replace the first newline in each record with a tab:

use Modern::Perl;

local $/ = '>';
do { s/\n/\t/; print }
  for <DATA>;

__DATA__
>00000_x1688514
TGCTTGGACTACATATGGTTGAGGGTTGTA
>00001_x238968
TGCTTGGACTACATATTGTTGAGGGTTGTA

Output:

>00000_x1688514 TGCTTGGACTACATATGGTTGAGGGTTGTA
>00001_x238968  TGCTTGGACTACATATTGTTGAGGGTTGTA

For a file:

use Modern::Perl;
use autodie;

open my $inFile,  '<', 'inFile.txt';
open my $outFile, '>', 'outFile.txt';

local $/ = '>';
do { s/\n/\t/; print $outFile $_ }
  for <$inFile>;

close $inFile;
close $outFile;

Hope this helps!
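
For comparison, the same record-separator idea can be sketched in Python. This is only an illustration, not part of the answer above: the function name records_to_columns and its sep parameter are made up for the example, and it assumes the whole input fits in memory and that '>' appears only at the start of header lines.

```python
def records_to_columns(text, sep="\t"):
    """Split FASTA-like text on '>' and join each header to its sequence."""
    rows = []
    for record in text.split(">"):
        if not record.strip():
            continue  # skip the empty chunk before the first '>'
        # Replace only the first newline, as s/\n/\t/ does in the Perl version
        rows.append(">" + record.rstrip("\n").replace("\n", sep, 1))
    return rows


if __name__ == "__main__":
    sample = (
        ">00000_x1688514\nTGCTTGGACTACATATGGTTGAGGGTTGTA\n"
        ">00001_x238968\nTGCTTGGACTACATATTGTTGAGGGTTGTA\n"
    )
    for row in records_to_columns(sample):
        print(row)
```

Splitting on '>' drops the separator from each chunk, so the sketch adds it back when building the row.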

Answer 4

One approach:

perl -i -pe 's/\n/ / unless m/^[ACGT]+$/' FILENAME

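This edits the file FILENAME in place, replacing the newline with a space on every line that isn't a string of A's, C's, G's, and T's. Be aware that a sequence line containing any other character (for example N, or lowercase letters) does not match the pattern, so it would also be joined to the line that follows it.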

Answer 5

Using awk:

awk '{ printf "%s", $0 (substr( $0, 1, 1 ) == ">" ? " " : ORS) }' infile

Output:

>00000_x1688514 TGCTTGGACTACATATGGTTGAGGGTTGTA
>00001_x238968 TGCTTGGACTACATATTGTTGAGGGTTGTA

Answer 6

In Ruby I'd use something like:

File.readlines('test.txt').map(&:strip).each_slice(2) do |row|
  puts row.join(' ')
end

Which outputs:

>00000_x1688514 TGCTTGGACTACATATGGTTGAGGGTTGTA
>00001_x238968 TGCTTGGACTACATATTGTTGAGGGTTGTA

Answer 7

A tidier Python solution:

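# Python 3: the built-in zip() is lazy (on Python 2, use itertools.izip)
with open('test.txt') as inf, open('newtest.txt', 'w') as outf:
    # zip(inf, inf) yields consecutive (header, sequence) line pairs
    for head, body in zip(inf, inf):
        outf.write(head.rstrip() + ' ' + body)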

Answer 8

Assuming the input is in true FASTA format, you can use awk and the getline function:

awk '/^>/ { printf "%s ", $0; getline; print }' file.txt

Output:

>00000_x1688514 TGCTTGGACTACATATGGTTGAGGGTTGTA
>00001_x238968 TGCTTGGACTACATATTGTTGAGGGTTGTA

HTH
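
Most of the answers here assume exactly one sequence line per header. If the sequences might be wrapped over several lines, something along these lines could work. It is a rough Python sketch, not a tested production parser: fasta_to_columns and its sep parameter are hypothetical names, and lines that appear before the first header are silently ignored.

```python
def fasta_to_columns(lines, sep=" "):
    """Yield one 'header<sep>sequence' row per FASTA record."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header + sep + "".join(chunks)  # flush previous record
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if header is not None:
        yield header + sep + "".join(chunks)  # flush the last record


if __name__ == "__main__":
    sample = [
        ">00000_x1688514\n",
        "TGCTTGGACTACATA\n",
        "TGGTTGAGGGTTGTA\n",
        ">00001_x238968\n",
        "TGCTTGGACTACATATTGTTGAGGGTTGTA\n",
    ]
    for row in fasta_to_columns(sample):
        print(row)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, so the file is processed one line at a time.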
