This is my first time posting on Stack Overflow; until now I have mostly searched for solutions and read posts. I am trying to run a loop in bash to do a string search over a bunch of different files with the extension .u.clean. I want to search these files for the string "H#" or "h#", with # being 1-28, and write the result to a file named after the number that was searched for. I am doing two separate searches on two fields ($5 and $0), and I want to output the total number of unique matches for each to a file "temp"#.txt. After this I want to do some math on the two numbers written to the file. So far I have gotten this far:
for i in {1..28}; do
awk -v var="$i" -F"\t" ' $19 ~ "_[hH]"var {print $0}' */*.u.clean | \
sort | uniq | wc -l > 'temp'$i'.txt' | \
awk -v var="$i" -F"\t" ' $19 ~ "_[hH]"var {print $5}' */*.u.clean | \
sort | uniq | wc -l >> 'chris'$i'.txt'
done
The problem is that the numbers are coming out wrong. I am getting a total of 28 "temp"#".txt" files, but their contents are not the correct counts. I also don't know how to do a mathematical operation once I have the files with the numbers in them. Can someone help me out or point me in the right direction? Thanks for any help.
EDIT:
Here is what some of the input might look like:
112 E 03 294168 FBLN7_rs335586251.5 GG
01/23/2013 2 3 VSD control 130123_CR_CH5_H26 1 A.Conservative17 D 11 294319 FBLN7_rs335586251.5 GG
06/26/2012 2 3 VSD control
120626_CR_CH5_H3 1 A.Conservative22 B 01 294703 FBLN7_rs335586251.5 GG
06/26/2012 2 2 VSD control
120626_CR_CH5_H4 1 A.Conservative103 A 07 295033 FBLN7_rs335586251.5 GG
01/23/2013 2 1 VSD control
130123_CR_CH5_H23 1 A.Conservative44 G 07 295119 Tbx5_rs61931008.5 GG
07/11/2012 2 5 ASD control
120711_CR_CH5_H12 1 A.Conservative42 H 12 295201 JAG1_rs1232607.5 GG
07/11/2012 1 2 ASD control
120711_CR_CH5_H12 1 A.Conservative
I am trying to count how many times each H# (with # from 1 to 28) occurs in field 19 (the field with the text Tbx5_rs61931008.5), and output that number to a separate file for each H#. Then I want to know, within these matches of H#, how many unique values of field 5 there are, and output that number to the same file for each H#. I hope this is clear; let me know if it is not. Thanks.
This seems a bit complicated for what you are trying to achieve. I would suggest using find and grep:
find . -name "*.u.clean" -exec egrep -c '([Hh][1-9])|([Hh][1-2][0-9])' {} \;
You then have to take the output and do the math.
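As a rough sketch of that math: when find runs grep -c once per file, each call prints a bare count, so the counts can be added up with awk. The function name count_h_lines is hypothetical. The leading underscore and the trailing non-digit check in the pattern are assumptions based on the sample rows, where CH5 would otherwise match and H2 would match inside H29.

```shell
# Hypothetical helper: count the lines that mention _H1.._H28 (or _h1.._h28)
# in every *.u.clean file under a directory, then add the per-file counts.
# Assumption: the label is preceded by "_" and followed by a non-digit or
# the end of the line, as in the sample rows.
count_h_lines() {
    find "${1:-.}" -name '*.u.clean' \
        -exec grep -Ec '_[Hh]([1-9]|1[0-9]|2[0-8])([^0-9]|$)' {} \; |
        awk '{ sum += $1 } END { print sum + 0 }'
}
```

Calling count_h_lines with no argument searches the current directory; the total is printed as a single number.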
This assumes there is only one h# per line in the file; if that is not the case, you will need to do a little more work. I would find all the files that have any occurrences and then use
egrep -o '([Hh][1-9])|([Hh][1-2][0-9])' | wc -l
to get the total for each file.
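If what is needed is the per-H# breakdown described in the question rather than one total per file, a single awk pass may be simpler than 28 separate runs. The following is only a sketch: the function name count_per_h is hypothetical, the label column is passed as an argument (19 in the question's awk), field 5 is taken from the question, and it assumes the _H# label sits at the very end of its field.

```shell
# Hypothetical helper: for each H# from 1 to 28, print three tab-separated
# numbers: #, unique matching rows, unique field-5 values in those rows.
# Usage: count_per_h LABEL_COLUMN FILE...
count_per_h() {
    local col=$1
    shift
    awk -F'\t' -v col="$col" '
        # Assumption: the label ends its field, e.g. 130123_CR_CH5_H26
        match($col, /_[hH][0-9]+$/) {
            n = substr($col, RSTART + 2) + 0      # digits after "_H"
            if (n < 1 || n > 28) next
            # count each distinct row and each distinct field-5 value once
            if (!((n, $0) in rows))  { rows[n, $0] = 1;  nrows[n]++ }
            if (!((n, $5) in fives)) { fives[n, $5] = 1; nfive[n]++ }
        }
        END {
            for (n = 1; n <= 28; n++)
                printf "%d\t%d\t%d\n", n, nrows[n], nfive[n]
        }
    ' "$@"
}
```

Something like count_per_h 19 */*.u.clean > counts.txt would then put all 28 pairs of numbers in one file, and any further arithmetic on the two numbers could be done in the END block or by a second awk reading counts.txt.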