I have this file:
>first
GTGAAGTGCGGCACCCCGTAGGTCAGACAAGGCGGTCACGCCGCATCCGACATCCAACGCCCGAGCCGGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACC
>second
CGGTAAT
My expected output is this:
>first
GTGAAGTGCGGCACCCCGTAGGTCAGACAAGGCGGTCACGCCGCATCCGACATCCAACGC
CCCGAGCCGGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAACC
>second
CGGTAAT
Explanation: if the line starts with '>', print it unchanged; otherwise, if the string is longer than 60 characters, split it into substrings of 60.
My idea is something like this in awk, but also bash solutions are welcome:
gawk '/^>/ {print;next;} {len=length; if(len>60){DO SOMETHING HERE (LOOP?)} else {print}}'
Any help will be really appreciated! Thanks
You can use the built-in fold utility in a Bash loop:
while IFS= read -r f; do
    if [[ $f == '>'* ]]; then echo "$f"; else echo "$f" | fold -w 60; fi
done < file
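The same idea can be wrapped in a small function. This is only a sketch: the name fold_fasta and its optional width argument are made up for illustration and are not part of the answer above. It uses case instead of [[ ]] so that it should also run under plain POSIX sh, and it assumes fold is installed.

```shell
# fold_fasta [WIDTH]: hypothetical helper, reads FASTA-like text on stdin.
# The body runs in a subshell ( ... ) so its variables stay local.
fold_fasta() (
    width=${1:-60}
    # IFS= preserves whitespace; the || test also handles a last line
    # that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '>'*) printf '%s\n' "$line" ;;                    # header: print as is
            *)    printf '%s\n' "$line" | fold -w "$width" ;; # sequence: wrap
        esac
    done
)
```

It would be called as fold_fasta < file, or fold_fasta 80 < file for a different width.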
Using awk you can do:
$ awk '!/^>/{gsub(/.{60}/,"&\n"); sub(/\n$/,"")}1' file
>first
GTGAAGTGCGGCACCCCGTAGGTCAGACAAGGCGGTCACGCCGCATCCGACATCCAACGC
CCGAGCCGGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAACC
>second
CGGTAAT
Note: If you are using GNU awk v3.x, add --re-interval (awk --re-interval '..' file). For GNU awk v4 or later, as well as BSD awk, it is not required.
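If interval expressions are not available at all (for example in an older awk build), one possible workaround is to build the pattern from 60 literal dots instead of writing .{60}. A minimal sketch; the function name wrap_no_intervals and the awk variable w are assumptions made for illustration:

```shell
# wrap_no_intervals [WIDTH]: hypothetical helper, reads from stdin.
wrap_no_intervals() {
    awk -v w="${1:-60}" '
        # Build a regex of w dots, equivalent to .{w} without needing intervals.
        BEGIN { while (n++ < w) re = re "." }
        # Sequence lines: add a newline after every w characters, then drop
        # the extra newline left when the length is an exact multiple of w.
        !/^>/ { gsub(re, "&\n"); sub(/\n$/, "") }
        1
    '
}
```

Usage would be wrap_no_intervals < file. The final sub() matters only for lines whose length is an exact multiple of the width.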
What about this awk?
awk -v FS= '{for (i=0;i<NF/60;i++) {
                for (j=1;j<=60;j++)
                    printf "%s", $(i*60 +j)
                print ""
            }
}' file
See output:
$ awk -v FS= '{for (i=0;i<NF/60;i++) {for (j=1;j<=60;j++) printf "%s", $(i*60 +j); print ""}}' file
>first
GTGAAGTGCGGCACCCCGTAGGTCAGACAAGGCGGTCACGCCGCATCCGACATCCAACGC
CCGAGCCGGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAACC
>second
CGGTAAT
You can make the > condition explicit with:
awk -v FS= '/^>/ {print; next} {for (i=0;i<NF/60;i++) {for (j=1;j<=60;j++) printf "%s", $(i*60 +j); print ""}}' file
-v FS=
    sets the field separator to the empty string, so that every single character becomes a field (a common awk extension that POSIX does not specify).
'/^>/ {print; next}'
    if the line starts with >, print it and go to the next line.
{for (i=0;i<NF/60;i++) {for (j=1;j<=60;j++) printf "%s", $(i*60 +j); print ""}}
    for all other lines, loop over blocks of 60 characters, printing each block followed by a newline, until the end of the line is reached.
Avoid splitting the line into fields at all and just print the substrings manually:
awk -v FS='\n' '!/^>/ {for (i=0; i<(length($0)/60); i++) {print substr($0, i*60+1, 60)}; next}1' file
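A parameterized sketch of the substr approach, using only length() and substr() so it should work in any awk. The function name wrap_substr and the variable w are hypothetical, chosen for this example. Note that empty input lines produce no output here, because the loop body never runs for them.

```shell
# wrap_substr [WIDTH]: hypothetical helper, reads from stdin.
wrap_substr() {
    awk -v w="${1:-60}" '
        /^>/ { print; next }    # headers pass through untouched
        {
            # substr() positions start at 1, so chunk i begins at i*w + 1.
            for (i = 0; i * w < length($0); i++)
                print substr($0, i * w + 1, w)
        }
    '
}
```

Usage would be wrap_substr < file, or wrap_substr 80 < file for 80-column output.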