How to replace letters in lines in fasta file using bash loops?

I want to change every n in the sequence lines into -, but I don't know how to keep my bash script from changing the n characters that appear in the sequence names. I'm not experienced with sed or regex, so I don't know how to make my script process only the lines that do not start with >, since > marks a header line.

Example file:

>Name_with_nnn
nnnatgcnnnatttg
>Name2_with_nnn
atgggnnnnGGtnnn

At the same time, I want to convert all lowercase letters to uppercase, again only in the sequence lines. I don't even know how to begin with sed; I find it really tricky to understand.

Expected output:

>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---

After creating my sequence files, I tried to continue my script with:

while IFS= read -r line
do
     if [[ $line == ">"* ]]
     then
          echo "Ignoring header line: $line"
     else
          echo "Converting to uppercase and then N-to-gaps"
          # sed or tr?? do call $line or do I call $OUTFILE? so confused..
     fi
done      
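One way to finish the loop above, sketched under a few assumptions: the input path is passed as the first argument to a hypothetical helper named convert_fasta, the converted FASTA goes to standard output (redirect it to your $OUTFILE), and tr is called once per sequence line. Because that spawns processes for every line, it is slow on large files; the one-line sed and awk commands avoid that cost.

```bash
#!/usr/bin/env bash
# Sketch: the question's read loop, completed with tr.
# convert_fasta is a hypothetical name; $1 is assumed to be the input FASTA path.
convert_fasta() {
    local infile=$1 line
    # "|| [[ -n $line ]]" keeps a final line that has no trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == ">"* ]]; then
            # Header: log to stderr so stdout carries only FASTA data
            echo "Ignoring header line: $line" >&2
            printf '%s\n' "$line"
        else
            # Sequence: n -> '-' first, then lowercase -> uppercase
            printf '%s\n' "$line" | tr 'n' '-' | tr '[:lower:]' '[:upper:]'
        fi
    done < "$infile"
}
```

Usage: convert_fasta input.fa > "$OUTFILE". The n-to-gap step runs before uppercasing; in the other order the n characters would already be N and tr 'n' '-' would no longer match them.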

You can use this simple GNU sed command:

sed '/^>/!{s/n/-/g; s/.*/\U&/;}' file

>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---

You can solve this with sed using the line below:

sed -i "/^>/! {s/n/-/g; s/\(.*\)/\U\1/g}" text.txt

Because -i edits text.txt in place, sed prints nothing; the file then contains:

>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---

In pure Bash, likely quite slow for larger inputs:

while IFS= read -r line; do
    case $line in
        '>'*)
            printf '%s\n' "$line"
            ;;
        *)
            line=${line//n/-}
            printf '%s\n' "${line^^}"
            ;;
    esac
done < infile

This uses a case statement with pattern matching to test whether a line starts with >; to modify the lines, parameter expansions are used. The ${parameter^^} expansion requires Bash 4.0 or newer.
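To see the two expansions in isolation, here is a small sketch on a single sequence string (the variable names seq, gapped and upper are arbitrary): ${var//pattern/replacement} replaces every match, and ${var^^} uppercases the whole value.

```bash
#!/usr/bin/env bash
# Requires Bash 4.0+ for ${var^^}.
seq='atgggnnnnGGtnnn'
gapped=${seq//n/-}    # every "n" becomes "-"
upper=${gapped^^}     # uppercase the whole string
printf '%s\n' "$upper"
```

This prints ATGGG----GGT---. As in the loop, the substitution has to run before uppercasing, because ${seq//n/-} only matches lowercase n.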

How about awk?

awk '/^[^>]/{gsub("n","-");print toupper($0);next;}1' data

Output:

>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---
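All of these commands only turn lowercase n into a gap; an uppercase N already present in a sequence line would survive as N. If your data can contain those (an assumption; the example input does not), a variant of the awk command can match both cases. The helper name fasta_gaps_awk is hypothetical.

```bash
# Hypothetical helper: like the awk answer, but 'N' also becomes a gap.
# Lines not starting with '>' get n/N -> '-' and are then uppercased;
# the trailing pattern "1" prints every line, so headers pass through unchanged.
fasta_gaps_awk() {
    awk '!/^>/ { gsub(/[nN]/, "-"); $0 = toupper($0) } 1' "$@"
}
```

Usage: fasta_gaps_awk data > data.out.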

sed can do it too (GNU sed):

sed -E '/^[^>]/{s/n/-/g;s/(.*)/\U\1/g;}' data

It's the same as:

sed -E '/^>/!{s/n/-/g;s/(.*)/\U\1/g;}' data
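The two addresses differ only on empty lines: /^[^>]/ needs at least one character that is not >, while /^>/! also selects an empty line. Since the substitutions change nothing on an empty line, the output is the same. A quick check, assuming GNU sed and some made-up input with a blank line between records:

```bash
#!/usr/bin/env bash
# Hypothetical test data with a blank line between the two records.
input=$'>Name_with_nnn\nnnnatgcnnnatttg\n\n>Name2_with_nnn\natgggnnnnGGtnnn'
# Same substitutions, two different ways of skipping header lines.
a=$(printf '%s\n' "$input" | sed -E '/^[^>]/{s/n/-/g;s/(.*)/\U\1/g;}')
b=$(printf '%s\n' "$input" | sed -E '/^>/!{s/n/-/g;s/(.*)/\U\1/g;}')
if [[ $a == "$b" ]]; then echo "identical"; else echo "different"; fi
```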

To edit the file in place, add the -i switch to sed.
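With GNU sed, -i can also take a suffix, so the original file is kept as a backup next to the edited one. A sketch on a throwaway file created with mktemp, so no real data is touched:

```bash
#!/usr/bin/env bash
# Work on a temporary file so the demo never touches real data.
tmp=$(mktemp)
printf '>Name_with_nnn\nnnnatgcnnnatttg\n' > "$tmp"

# GNU sed: -i.bak edits "$tmp" in place and saves the untouched original as "$tmp.bak".
sed -i.bak -E '/^>/!{s/n/-/g;s/(.*)/\U\1/g;}' "$tmp"

cat "$tmp"       # edited FASTA
cat "$tmp.bak"   # original FASTA
```

Once you have checked the result, the .bak file can be deleted.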
