How to replace letters in lines in fasta file using bash loops?

Question

I want to change every n in the sequence lines into -, but I don't know how to keep my bash script from changing the n characters that show up in sequence names. I'm not experienced enough with sed or regex to make sure my script only touches the lines that do not start with >, since that character marks a header.

Example file:

>Name_with_nnn
nnnatgcnnnatttg
>Name2_with_nnn
atgggnnnnGGtnnn

At the same time I want to convert all lowercase letters into uppercase, only in the sequence lines. I don't even know how to begin using sed, I find it really tricky to understand.

Expected output:

>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---

So after I created my sequence files I tried to continue my script with:

while IFS= read -r line
do
     if [[ $line == ">"* ]]
     then
          echo "Ignoring header line: $line"
     else
          echo "Converting to uppercase and then N-to-gaps"
          # sed or tr?? do call $line or do I call $OUTFILE? so confused..
     fi
done

Answer 1

You can use this simple GNU sed command:

sed '/^>/!{s/n/-/g; s/.*/\U&/;}' file

>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---
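
The \U case conversion used above is a GNU extension. For a sed without it, a minimal sketch of the same idea using the POSIX y (transliterate) command might look like this. The function name fasta_gaps_upper is made up for illustration, and the alphabet lists assume plain ASCII sequence letters:

```bash
# Hypothetical helper: n -> "-" and lowercase -> uppercase on non-header lines.
# Uses only POSIX sed features (no \U), reading files given as arguments or stdin.
fasta_gaps_upper() {
    sed -e '/^>/!{' \
        -e 's/n/-/g' \
        -e 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' \
        -e '}' "$@"
}

# Example: header lines pass through untouched.
printf '>Name_with_nnn\nnnnatgcnnnatttg\n' | fasta_gaps_upper
```

If uppercase N should become a gap as well, s/[nN]/-/g in place of s/n/-/g would cover both cases.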

Answer 2

You can do this with GNU sed using the line below. The -i flag edits text.txt in place, so nothing is printed to the terminal:

sed -i "/^>/! {s/n/-/g; s/\(.*\)/\U\1/g}" text.txt

Afterwards, text.txt contains:

>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---
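
Because -i overwrites the file, it may be worth keeping a copy of the original. A small sketch, assuming GNU sed (needed for \U) and using a made-up function name fasta_inplace_backup: giving -i a suffix makes sed save the untouched file under that extension first.

```bash
# Hypothetical helper: edit a FASTA file in place, keeping the original as FILE.bak.
# Assumes GNU sed; the .bak suffix is an arbitrary choice.
fasta_inplace_backup() {
    sed -i.bak '/^>/!{s/n/-/g; s/.*/\U&/;}' "$1"
}

# Usage (hypothetical file name):
#   fasta_inplace_backup text.txt   # rewrites text.txt, original kept in text.txt.bak
```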

Answer 3

In pure Bash, likely quite slow for larger inputs:

while IFS= read -r line; do
    case $line in
        '>'*)
            printf '%s\n' "$line"
            ;;
        *)
            line=${line//n/-}
            printf '%s\n' "${line^^}"
            ;;
    esac
done < infile

This uses a case statement with pattern matching to test whether a line starts with >. Sequence lines are modified with parameter expansions: ${line//n/-} replaces every n with -, and ${line^^} converts the result to uppercase. The ${parameter^^} expansion requires Bash 4.0 or newer.
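
To see the two expansions in isolation, here is a small sketch applied to one of the sample sequence lines. It needs Bash 4.0 or newer, and the variable names gapped and upper are only for illustration:

```bash
#!/usr/bin/env bash
# Apply the two parameter expansions step by step to a single sequence line.
line='atgggnnnnGGtnnn'

gapped=${line//n/-}   # replace every lowercase n with -
upper=${gapped^^}     # convert all letters to uppercase

printf '%s\n' "$gapped" "$upper"
# prints:
# atggg----GGt---
# ATGGG----GGT---
```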

Answer 4

How about awk?

awk '/^[^>]/{gsub("n","-");print toupper($0);next;}1' data

Output:

>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---

However, sed can do it too (GNU sed):

sed -E '/^[^>]/{s/n/-/g;s/(.*)/\U\1/g;}' data

It's the same as:

sed -E '/^>/!{s/n/-/g;s/(.*)/\U\1/g;}' data

If you want to change the file in place, you can add the -i switch to sed.
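
Standard awk has no matching in-place switch. One possible workaround, sketched here with a made-up function name fasta_inplace_awk and assuming mktemp is available, is to write the result to a temporary file and copy it back only if awk succeeded:

```bash
# Hypothetical helper: apply the awk one-liner to a file "in place" via a temp file.
fasta_inplace_awk() {
    file=$1
    tmp=$(mktemp) || return 1
    if awk '/^[^>]/{gsub("n","-"); print toupper($0); next} 1' "$file" > "$tmp"; then
        # Copy the contents back so the original file keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage (hypothetical file name):
#   fasta_inplace_awk data
```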


solution1
2 2019-01-03 18:35:55

solution2
2 ACCEPTED 2019-01-03 18:38:48

solution3
2 2019-01-03 18:58:08

solution4
0 2019-01-03 18:29:49
