bash for loop variable with awk

Question

This is my first time posting on Stack Overflow, after mostly searching for solutions and reading posts. I am trying to run a loop in bash so I can do a string search over a bunch of different files with the extension .u.clean. I want to look through these files for the string "H#" or "h#", with # being 1-28, and write the result to a file named after the number that was searched for. I am doing two separate searches on two fields ($5 and $0), and I want to output the total number of unique matches to a file "temp"#.txt. After this I want to do some math on the two numbers that are written to the file. So far I have gotten this far:

for i in {1..28}; do
    awk -v var="$i" -F"\t"  ' $19 ~ "_[hH]"var {print $0}' */*.u.clean | \
        sort | uniq | wc -l > 'temp'$i'.txt' | \
        awk -v var="$i" -F"\t"  ' $19 ~ "_[hH]"var {print $5}' */*.u.clean | \
        sort | uniq | wc -l >> 'chris'$i'.txt'
done

The problem is that the numbers are coming out wrong. I am getting a total of 28 "temp"#".txt" files, but their contents are not the correct counts. I also don't know how to do a mathematical operation once I have the files with the numbers in them. Can someone help me out or point me in the right direction? Thanks for any help.

EDIT:

Here is what some of the input might look like:

112 E 03 294168 FBLN7_rs335586251.5 GG
01/23/2013 2 3 VSD control 130123_CR_CH5_H26 1 A.Conservative

17 D 11 294319 FBLN7_rs335586251.5 GG
06/26/2012 2 3 VSD control
120626_CR_CH5_H3 1 A.Conservative

22 B 01 294703 FBLN7_rs335586251.5 GG
06/26/2012 2 2 VSD control
120626_CR_CH5_H4 1 A.Conservative

103 A 07 295033 FBLN7_rs335586251.5 GG
01/23/2013 2 1 VSD control
130123_CR_CH5_H23 1 A.Conservative

44 G 07 295119 Tbx5_rs61931008.5 GG
07/11/2012 2 5 ASD control
120711_CR_CH5_H12 1 A.Conservative

42 H 12 295201 JAG1_rs1232607.5 GG
07/11/2012 1 2 ASD control
120711_CR_CH5_H12 1 A.Conservative

I am trying to find a count of how many times in field 19 (the field with the text Tbx5_rs61931008.5.) each occurrence of H'#' occurs, with # being from 1-28, and output that number to a separate file for each H#. Then I want to know, within these matches of H#, how many unique values of field 5 there are, and output that number to the same file for each H#. I hope this is clear; let me know if it is not. Thanks.

Answer 1

This seems a bit complicated for what you are trying to achieve. I would suggest using find and grep:

find . -name "*.u.clean" -exec egrep -c '([Hh][1-9])|([Hh][1-2][0-9])' {} \;

You then have to take the output and do the math yourself.
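As a rough sketch of "doing the math", the per-file counts can be piped into awk and summed. The function name sum_counts and its optional directory argument are made up for illustration. grep -E stands in for egrep, and find runs it once per file ({} \;) so that only bare counts are printed. Note that grep -c counts matching lines, not individual matches.

```shell
# Hypothetical helper: total number of matching lines across all
# *.u.clean files under a directory (default: current directory).
sum_counts() {
    # grep -c prints one bare count per invocation; awk adds them up.
    find "${1:-.}" -name '*.u.clean' \
        -exec grep -Ec '([Hh][1-9])|([Hh][1-2][0-9])' {} \; |
        awk '{ total += $1 } END { print total + 0 }'
}
```

Calling sum_counts with no argument searches the current directory; sum_counts some/dir searches that directory instead.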

This assumes there is only one h# per line in the file. If that is not the case, you will need to do a little more work: I would find all the files that have any occurrences and then use egrep -o '([Hh][1-9])|([Hh][1-2][0-9])' | wc -l to get the total for each file.
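Two things may throw the counts off with the sample data in the question. A pattern like [Hh][1-9] also matches the "H5" inside "CH5", which appears on every sample row, and an unanchored "_[hH]"var with var=1 also matches _H10 through _H19. A possible single-pass awk alternative is sketched below. It assumes the files really are tab-separated, that field 19 holds the sample ID, and that the whole run of digits after "_H" or "_h" is the number. The name count_by_h is made up for illustration.

```shell
# Hypothetical helper: for each H number 1-28 print one row
#   "<n> <unique matching lines> <unique field-5 values>"
count_by_h() {
    awk -F'\t' '
        # Assumption: the sample ID in field 19 contains "_H<n>" or "_h<n>".
        match($19, /_[hH][0-9]+/) {
            # Take the full digit run so that H1 is not confused with H12.
            n = substr($19, RSTART + 2, RLENGTH - 2) + 0
            if (n < 1 || n > 28) next
            if (!seen_line[n, $0]++) lines[n]++   # unique whole lines
            if (!seen_f5[n, $5]++)   ids[n]++     # unique field-5 values
        }
        END {
            for (n = 1; n <= 28; n++)
                print n, lines[n] + 0, ids[n] + 0
        }
    ' "$@"
}
```

It could be run as count_by_h */*.u.clean > counts.txt. Because both numbers for each H# end up on the same row, any further arithmetic on them can be done with one more awk step over that output.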
