I am trying to combine dynamic regular expressions with awk's ability to print the lines between two patterns, where the patterns may come from bash variables. In this specific instance, the first pattern is a bash variable, and the second is the next line that begins with ">". The data looks something like:
CGCGCGCGCGCGCGCGCGCGCGCG
>jcf719000004955 0-783586
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
>jcf_anything 0-999999
TATATATATATATATATATATATA
TATATATATATATATATATATATA
And I would like to obtain just:
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
So, using these variables:
i="jcf719000004955"
data="/bin/file"
Neither of these attempts works:
awk '/^\>$i/{f=1;next} /^\>.*/{f=0} f' $data
awk '/^\>$i/{f=0} f; /^\>.*/{f=1}' $data
I'm able to use dynamic regular expressions to get the matching pattern containing my bash variable as such:
awk -v var="$i" '$0 ~ var ' $data | head -1
>jcf719000004955 0-783586
But how do I combine the use of dynamic regular expressions in order to obtain the lines in between two variables/patterns?
You can use the following gawk command:
i=jcf719000004955; awk -v var="$i" '$0~"^>"var{f=1; next}/^[^>]/{if(f)print;next}/^>/{if(f)exit}' input.txt
input:
CGCGCGCGCGCGCGCGCGCGCGCG
>jcf719000004955 0-783586
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
>jcf_anything 0-999999
TATATATATATATATATATATATA
TATATATATATATATATATATATA
output:
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
explanations:
-v var="$i"
This passes the shell variable to awk so that it can be accessed inside the awk script as var.
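A minimal sketch of why -v is needed, assuming a POSIX shell and any common awk; the names without_v and with_v are illustrative and not part of the original answer. Inside single quotes the shell does not expand $i, so awk parses /^>$i/ itself, where $ is an end-of-line anchor followed by a literal i, which can never match. (In gawk, the \> used in the question's attempts is additionally a word-boundary operator rather than a literal >.)

```shell
i="jcf719000004955"
line=">jcf719000004955 0-783586"

# Single quotes: the shell leaves $i alone, so awk sees the regex ^>$i
# ("$" is an end-of-line anchor here) and the header line is not matched.
without_v=$(printf '%s\n' "$line" | awk '/^>$i/ { print "match" }')

# -v copies the shell value into the awk variable var, which is then
# concatenated into a dynamic regex.
with_v=$(printf '%s\n' "$line" | awk -v var="$i" '$0 ~ ("^>" var) { print "match" }')

printf 'without -v: [%s]\nwith -v:    [%s]\n' "$without_v" "$with_v"
```

The first result is empty and the second is "match", which is the same behaviour the question ran into.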
the awk script:
# Rule(s)
$0 ~ ("^>" var) {  # when the line starts with > followed by the value of your shell variable
    f = 1          # set f to 1
    next           # go to the next line
}
/^[^>]/ {          # when the line does not start with >
    if (f) {       # check whether f is set
        print $0   # if so, print the whole line to stdout
    }
    next           # jump to the next line
}
/^>/ {             # reaching this rule means the line starts with > but is a different header from the one stored in var
    if (f) {       # if f is 1, the wanted block has already been printed, so we can exit
        exit
    }
}
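The three rules can be exercised end to end with a small sketch. It assumes mktemp is available and recreates the question's sample data in a temporary file; the names input and result are illustrative only.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
CGCGCGCGCGCGCGCGCGCGCGCG
>jcf719000004955 0-783586
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
>jcf_anything 0-999999
TATATATATATATATATATATATA
TATATATATATATATATATATATA
EOF

i="jcf719000004955"

# Same three rules as in the explanation: start on the wanted header,
# print sequence lines while f is set, stop at the next header.
result=$(awk -v var="$i" '
    $0 ~ ("^>" var) { f = 1; next }
    /^[^>]/         { if (f) print; next }
    /^>/            { if (f) exit }
' "$input")

rm -f "$input"
printf '%s\n' "$result"
```

Because of the exit, awk stops reading as soon as the next header is reached, which may matter on large files.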
You can also try this:
awk -F'\n' -v RS='>' -v i="$i" '$1 ~ i {for(j=2;j<NF;j++) print $j}' infile
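To see why this one-liner works, it may help to look at how awk splits the input when RS is ">" and FS is a newline: each record is one header plus its sequence lines, and $1 is the header. The sketch below assumes the question's sample data; the names sample, headers and nf are illustrative.

```shell
sample=$(cat <<'EOF'
CGCGCGCGCGCGCGCGCGCGCGCG
>jcf719000004955 0-783586
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
>jcf_anything 0-999999
TATATATATATATATATATATATA
TATATATATATATATATATATATA
EOF
)

# Show the first field of every record: the text before the first ">"
# is record 1, then each header starts a new record.
headers=$(printf '%s\n' "$sample" | awk -F'\n' -v RS='>' '{ print NR ": " $1 }')

# Record 2 ends with the newline that precedes the next ">", so its last
# field is empty; that is why the loop stops at j<NF rather than j<=NF.
nf=$(printf '%s\n' "$sample" | awk -F'\n' -v RS='>' 'NR == 2 { print NF }')

printf '%s\nNF of record 2: %s\n' "$headers" "$nf"
```

Note that $1 ~ i is an unanchored regex match, so an ID that is a prefix of a longer ID would select both records.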
The following awk command may also help:
i="jcf719000004955"
data="/bin/file"
awk -v val="$i" '/^>/{match($0,val);if(substr($0,RSTART,RLENGTH)){flag=1} else {flag=""};next} flag' "$data"
Output will be as follows.
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGT
Explanation of the code above:
i="jcf719000004955"              ## Set the shell variable i to the ID mentioned by the OP.
data="your_file"                 ## Set the shell variable data to the input file that awk will read.
awk -v val="$i" '                ## Pass the value of shell variable i to awk as val; awk variables are defined with the -v option.
/^>/{                            ## If the line starts with >, do the following:
  match($0,val);                 ## Try to match val (treated as a regex) in the current line. On success RSTART holds the index where the match starts and RLENGTH its length; on failure RSTART is 0 and RLENGTH is -1.
  if(substr($0,RSTART,RLENGTH)){ ## If the substring of RLENGTH characters starting at RSTART is not empty, do the following:
    flag=1 }                     ## Set flag to TRUE.
  else{                          ## Otherwise (the substring is empty, so there was no match):
    flag=""};                    ## Reset flag to empty.
  next                           ## next is an awk keyword that skips all remaining statements for this line.
}
flag                             ## A condition with no action: while flag is set, the default action prints the current line.
' "$data"                        ## Pass the input file name in double quotes so that awk receives it intact.
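The RSTART and RLENGTH behaviour described above can be checked in isolation with a small sketch; the header line is taken from the question, while no_such_id and the names hit and miss are hypothetical values used only for illustration.

```shell
line='>jcf719000004955 0-783586'

# Successful match: RSTART is the 1-based start index, RLENGTH the match length.
hit=$(printf '%s\n' "$line" | awk -v val="jcf719000004955" '{ match($0, val); print RSTART, RLENGTH }')

# Failed match: RSTART is 0 and RLENGTH is -1, so substr() returns an empty string.
miss=$(printf '%s\n' "$line" | awk -v val="no_such_id" '{ match($0, val); print RSTART, RLENGTH }')

printf 'hit:  %s\nmiss: %s\n' "$hit" "$miss"
```

Since match() already returns RSTART (0 on failure), testing its return value directly would be an equivalent and shorter condition than the substr() check.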