Converting SED command to linux command

Question

I have a sed command that I need to either run from Python code on Linux (using os.system()) or convert to Python code, but I don't know exactly what this sed command does. I would appreciate it if you could give me the code or show me how to run it with os.system in Python, because I ran into a lot of errors when using os.system.

sed -n '1~4s/^@/>/p;2~4p' file1.fastq > file1.fasta

By the way, the input and output file names have to be set dynamically in my Python code:

seq_file1 = '6448.fastq'
input_file1 = os.path.join(sys.path[0],seq_file1)
os.system(os.path.join("sed -n '1~4s/^@/>/p;2~4p' "+ seq_file1 + ' > ' + os.path.splitext(os.path.basename(input_file1))[0]+".fasta") , shell = True)

Answer 1

What exactly does this sed command do?

This sed command runs two different operations on the file in a single pass.

-n : Suppress the automatic printing of every line. Only lines to which the p instruction is applied are printed.

1~4 : apply the next instruction to every 4th line, starting at line #1 (the first~step address form is a GNU sed extension).

s/^@/>/p : replace a leading @ with > and print the result if a replacement was made. Because of the address above, this applies to every 4th line starting at line #1.

; : command separator.

2~4 : apply the next instruction to every 4th line, starting at line #2.

p : print the line.

What this means: "Replace the leading @ with > on every 4th line starting at line #1 and print the result; also print every 4th line starting at line #2."
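To make the first~step addresses concrete, here is a minimal sketch of the selection rule in Python. The helper name address_matches is hypothetical (it is not part of sed or of any library), and it assumes a step greater than zero.

```python
def address_matches(line_no, first, step):
    """Return True if the 1-based line_no is selected by first~step.

    Hypothetical helper for illustration only; assumes step > 0.
    """
    return line_no >= first and (line_no - first) % step == 0


# Which of the 12 lines in a small file does each address select?
header_lines = [n for n in range(1, 13) if address_matches(n, 1, 4)]
sequence_lines = [n for n in range(1, 13) if address_matches(n, 2, 4)]

print(header_lines)    # [1, 5, 9]
print(sequence_lines)  # [2, 6, 10]
```

In a FASTQ file with four lines per record, these are the header line and the sequence line of each record.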

Example:

Content of file1.fastq :

@ line 1
@ line 2
@ line 3
@ line 4
@ line 5
@ line 6
@ line 7
@ line 8
@ line 9
@ line 10
@ line 11
@ line 12

Run sed -n '1~4s/^@/>/p;2~4p' file1.fastq > file1.fasta

Content of file1.fasta

> line 1
@ line 2
> line 5
@ line 6
> line 9
@ line 10

A good reference is: http://www.gnu.org/software/sed/manual/sed.html

How to do the same in Python?

The code snippet below aims to be didactic, so I avoided many Python language features that could be used to refine the algorithm.

I tested it a few times and it worked for me.

# import Regular Expressions module
import re

output = []

# Open the input file in read mode
with open('file1.fastq', 'r') as file_in:
    replace_step = 1  # replacement starts at line #1
    print_step = 0    # printing starts at line #2, so this counter starts one step behind
    for line in file_in:
        # Every 4th line starting at #1: replace the leading @ with >
        if replace_step == 1:
            output.append(re.sub('^@', '>', line))
        if replace_step >= 4:
            replace_step = 1
        else:
            replace_step += 1

        # Every 4th line starting at #2: keep the line unchanged
        if print_step == 1:
            output.append(line)
        if print_step >= 4:
            print_step = 1
        else:
            print_step += 1

    print("".join(output))
    

# Open the output file in write mode
with open('file1.fasta', 'w') as file_out:
    file_out.write("".join(output))
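Since the question asks for dynamic input and output names, the same logic could also be wrapped in a function. This is only a sketch: the function name fastq_to_fasta is made up for illustration, it assumes exactly four lines per FASTQ record, and it derives the output name by swapping the extension for .fasta. Like sed's s/^@/>/p, it writes a header line only when that line really starts with @.

```python
import os


def fastq_to_fasta(fastq_path, fasta_path=None):
    """Copy the header and sequence line of each 4-line FASTQ record to a FASTA file.

    Hypothetical helper; assumes four lines per record.
    """
    if fasta_path is None:
        # Same directory and base name as the input, with a .fasta extension
        fasta_path = os.path.splitext(fastq_path)[0] + ".fasta"

    with open(fastq_path) as file_in, open(fasta_path, "w") as file_out:
        for index, line in enumerate(file_in):
            position = index % 4  # 0 = header, 1 = sequence, 2 = '+', 3 = quality
            if position == 0 and line.startswith("@"):
                file_out.write(">" + line[1:])
            elif position == 1:
                file_out.write(line)

    return fasta_path
```

Calling fastq_to_fasta(os.path.join(sys.path[0], '6448.fastq')) would then write 6448.fasta next to the input file, streaming line by line instead of collecting the output in a list first.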

Answer 2

You can also use subprocess.run:

import subprocess

seq_file_in = '6448.fastq'
seq_file_out = '6448.fasta'
with open(seq_file_out, 'w') as fw:
    # -n is required: without it sed also prints every input line
    subprocess.run(["sed", "-n", r"1~4s/^@/>/p;2~4p", seq_file_in], stdout=fw)

In cases like this, where the sed command is short and succinct, subprocess.run can be really handy.
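For file names that are only known at run time, one possible arrangement is to build the argument list and the output path from the input path. The names below (SED_SCRIPT, build_sed_call, run_sed_conversion) are invented for this sketch, and it assumes GNU sed is available on the PATH, since the first~step addresses are a GNU extension.

```python
import subprocess
from pathlib import Path

# GNU sed script from the question (first~step addresses)
SED_SCRIPT = "1~4s/^@/>/p;2~4p"


def build_sed_call(fastq_path):
    """Return the sed argument list and the derived .fasta output path."""
    fastq_path = Path(fastq_path)
    fasta_path = fastq_path.with_suffix(".fasta")
    argv = ["sed", "-n", SED_SCRIPT, str(fastq_path)]
    return argv, fasta_path


def run_sed_conversion(fastq_path):
    """Run sed and redirect its output to the .fasta file (assumes GNU sed)."""
    argv, fasta_path = build_sed_call(fastq_path)
    with open(fasta_path, "w") as file_out:
        # check=True raises CalledProcessError if sed exits with an error
        subprocess.run(argv, stdout=file_out, check=True)
    return fasta_path
```

Passing the arguments as a list means no shell is involved, so the script needs no extra quoting and the redirection is handled by the stdout argument rather than by ' > ' in a command string.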
