How can I write a script for a single line that contains multiple strings with no spaces between them:
acgtttgggcccagctctccgccctcacacacaccccggggt
For visual purposes, split into three-letter sequences:
acg ttt ggg ccc agc tct ccg ccc tca cac aca ccc cgg ggt
I need to check whether the 4th three-letter sequence is repeated 2 times. In the sequence above, the 4th sequence is ccc, and it appears again after agc tct ccg.
So would I have to use grep for it?
Then how about:
#!/bin/bash
# add a space after every three letters
str="acgtttgggcccagctctccgccctcacacacaccccggggt"
result=$(sed -e 's/\(...\)/\1 /g' <<< "$str")
echo "$result"
# check whether the 4th sequence is repeated
awk '
{
    ref = $4        # set the 4th sequence as the reference
    count = 0       # reset the counter for each input line
    for (i = 5; i <= NF; i++)   # iterate from the 5th sequence to the end
        if (ref == $i) count++  # count the sequences equal to the reference
    printf "4th sequence \"%s\" repeated %d times.\n", ref, count
}' <<< "$result"
which yields:
acg ttt ggg ccc agc tct ccg ccc tca cac aca ccc cgg ggt
4th sequence "ccc" repeated 2 times.
The script is composed of two parts: the first splits the string with spaces, and the second counts the repetitions of the 4th triplet.
The sed script sed -e 's/\(...\)/\1 /g' inserts a space after every three letters.
The awk script loops over the sequences that follow the 4th triplet and counts the ones identical to it. For the sample input, count ends up as 2.
Hope this helps.
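Since the question asks about grep, the same idea can also be sketched with grep -o doing the splitting and a plain bash loop doing the counting. This is only a minimal sketch, assuming bash 4 or later (for mapfile) and an input whose length is a multiple of three; the names codons, ref and count are made up for illustration and are not part of the script above.

```shell
#!/bin/bash
str="acgtttgggcccagctctccgccctcacacacaccccggggt"

# grep -o prints every non-overlapping match of three characters on its own line;
# mapfile collects those lines into the (hypothetical) array "codons"
mapfile -t codons < <(grep -o '...' <<< "$str")

ref=${codons[3]}    # arrays are zero-based, so index 3 is the 4th triplet
count=0
for c in "${codons[@]:4}"; do       # walk the triplets after the 4th one
    if [[ $c == "$ref" ]]; then
        count=$((count + 1))        # count the triplets equal to the reference
    fi
done

printf '4th sequence "%s" repeated %d times.\n' "$ref" "$count"
```

With the sample string this counts the same two later occurrences of ccc as the awk version, without needing the intermediate space-separated string.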