I want to change all n in the sequence into -, but I don't know how to make my bash script leave alone the n characters that show up in sequence names. I'm not experienced enough with sed or regex to make sure my bash script processes only the lines that do not start with >, as that character marks a header.
Example file:
>Name_with_nnn
nnnatgcnnnatttg
>Name2_with_nnn
atgggnnnnGGtnnn
At the same time, I want to convert all lowercase letters into uppercase, but only in the sequence lines. I don't even know how to begin using sed; I find it really tricky to understand.
Expected output:
>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---
So after I created my sequence files I tried to continue my script with:
while IFS= read -r line
do
    if [[ $line == ">"* ]]
    then
        echo "Ignoring header line: $line"
    else
        echo "Converting to uppercase and then N-to-gaps"
        # sed or tr?? do call $line or do I call $OUTFILE? so confused..
    fi
done
You may use this simple GNU sed command:
sed '/^>/!{s/n/-/g; s/.*/\U&/;}' file
>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---
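For readers new to sed, the one-liner above can be read piece by piece. The following is only an annotated sketch: it assumes GNU sed (the \U case conversion is a GNU extension) and writes the example data to a temporary file created with mktemp, which is not part of the original answer.

```shell
# Build a small sample file; the file name comes from mktemp and is illustrative.
sample=$(mktemp)
printf '%s\n' '>Name_with_nnn' 'nnnatgcnnnatttg' \
              '>Name2_with_nnn' 'atgggnnnnGGtnnn' > "$sample"

# /^>/!       run the braces only on lines that do NOT start with ">"
# s/n/-/g     replace every lowercase n on that line with "-"
# s/.*/\U&/   replace the whole line (&) with its uppercase form (GNU sed only)
result=$(sed '/^>/!{s/n/-/g; s/.*/\U&/;}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Note that the substitution runs before the uppercasing, so only n characters that were lowercase in the input become gaps. If the input may also contain uppercase N, s/[nN]/-/g would cover both.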
You can solve this with sed using the line below, which edits text.txt in place:
sed -i "/^>/! {s/n/-/g; s/\(.*\)/\U\1/g}" text.txt
And your output would be:
>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---
In pure Bash, likely quite slow for larger inputs:
while IFS= read -r line; do
    case $line in
        '>'*)
            printf '%s\n' "$line"
            ;;
        *)
            line=${line//n/-}
            printf '%s\n' "${line^^}"
            ;;
    esac
done < infile
This uses a case statement with pattern matching to test whether a line starts with > or not; to modify the lines, parameter expansions are used. The ${parameter^^} expansion requires Bash 4.0 or newer.
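The same loop can be wrapped in a function that reads from standard input, which makes it easy to try on a couple of lines. This is a sketch rather than a definitive version: the function name mask_fasta is made up for illustration, the [nN] pattern is an assumption that uppercase N should also become a gap, and the extra test after read is there so that a last line without a trailing newline is still processed.

```shell
#!/usr/bin/env bash
# mask_fasta is a hypothetical helper name; it reads FASTA text on stdin.
mask_fasta() {
    local line
    # "|| [[ -n $line ]]" also handles a final line that lacks a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        case $line in
            '>'*)
                printf '%s\n' "$line"        # header: print unchanged
                ;;
            *)
                line=${line//[nN]/-}         # every n or N becomes a gap
                printf '%s\n' "${line^^}"    # uppercase, needs Bash 4.0+
                ;;
        esac
    done
}

result=$(printf '%s\n' '>Name_with_nnn' 'nnnatgcNNNatttg' | mask_fasta)
printf '%s\n' "$result"
```

With a real file, the call would look like mask_fasta < infile > outfile.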
How about awk?
awk '/^[^>]/{gsub("n","-");print toupper($0);next;}1' data
Output:
>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---
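Spread over several lines, the awk approach may be easier to follow. The variant below is a sketch, not the command from the answer: it tests for "not a header" with !/^>/ and uses [nN] on the assumption that uppercase N should be turned into a gap as well. The sample input is fed through printf instead of a file.

```shell
result=$(printf '%s\n' '>Name2_with_NNN' 'atgggNNnnGGtnnn' |
    awk '
        !/^>/ {                 # sequence lines only
            gsub(/[nN]/, "-")   # n or N becomes a gap
            $0 = toupper($0)    # uppercase what is left
        }
        1                       # print every line, header or not
    ')
printf '%s\n' "$result"
```

The header keeps its NNN because the action block never runs on lines that start with >.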
However, sed can do it too (GNU sed):
sed -E '/^[^>]/{s/n/-/g;s/(.*)/\U\1/g;}' data
It's the same as:
sed -E '/^>/!{s/n/-/g;s/(.*)/\U\1/g;}' data
If you want to change the file in place, you can add the -i switch to sed.
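Because -i overwrites the input file, it can be worth keeping a backup while testing. GNU sed accepts a suffix attached directly to -i for that purpose. The sketch below assumes GNU sed and works on a throwaway copy in a temporary directory, so the paths are illustrative only.

```shell
# Work on a throwaway copy so the demonstration cannot touch real data.
workdir=$(mktemp -d)
printf '%s\n' '>Name_with_nnn' 'nnnatgcnnnatttg' > "$workdir/data"

# -i.bak edits "data" in place and keeps the original as "data.bak" (GNU sed).
sed -E -i.bak '/^>/!{s/n/-/g;s/(.*)/\U\1/g;}' "$workdir/data"

edited=$(cat "$workdir/data")
backup=$(cat "$workdir/data.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"
rm -rf "$workdir"
```

Once the result looks right, the .bak file can be deleted, or the suffix dropped from -i.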