
bash for loop variable with awk

This is my first time posting on Stack Overflow, after mostly searching for solutions and reading posts. I am trying to run a loop in bash so I can do a string search over a bunch of different files with the extension .u.clean. I want to look through these files for the string "H#" or "h#", with # being 1-28, and output to a file named with the number that was searched for. I am doing two separate searches on two fields ($5 and $0), and I want to output the total number of unique matches to a file "temp"#.txt. After this I want to do some math on the two numbers that are written to the file. So far I have gotten this far:

for i in {1..28}; do
    awk -v var="$i" -F"\t"  ' $19 ~ "_[hH]"var {print $0}' */*.u.clean | \
        sort | uniq | wc -l > 'temp'$i'.txt' | \
        awk -v var="$i" -F"\t"  ' $19 ~ "_[hH]"var {print $5}' */*.u.clean | \
        sort | uniq | wc -l >> 'chris'$i'.txt'
done

The problem is that the numbers are coming out wrong. I am getting a total of 28 "temp"#".txt" files, but their contents are not the correct line counts. I also don't know how to do a mathematical operation once I have the files with the numbers in them. Can someone help me out or point me in the right direction? Thanks for any help.

EDIT:

Here is what some of the input might look like:

112 E 03 294168 FBLN7_rs335586251.5 GG
01/23/2013 2 3 VSD control 130123_CR_CH5_H26 1 A.Conservative

17 D 11 294319 FBLN7_rs335586251.5 GG
06/26/2012 2 3 VSD control
120626_CR_CH5_H3 1 A.Conservative

22 B 01 294703 FBLN7_rs335586251.5 GG
06/26/2012 2 2 VSD control
120626_CR_CH5_H4 1 A.Conservative

103 A 07 295033 FBLN7_rs335586251.5 GG
01/23/2013 2 1 VSD control
130123_CR_CH5_H23 1 A.Conservative

44 G 07 295119 Tbx5_rs61931008.5 GG
07/11/2012 2 5 ASD control
120711_CR_CH5_H12 1 A.Conservative

42 H 12 295201 JAG1_rs1232607.5 GG
07/11/2012 1 2 ASD control
120711_CR_CH5_H12 1 A.Conservative

I am trying to count how many times each H# (with # from 1 to 28) occurs in field 19 (the field with the text Tbx5_rs61931008.5) and output that number to a separate file for each H#. Then, within those matches of H#, I want to know how many unique values of field 5 there are, and output that number to the same file for each H#. I hope this is clear; let me know if it is not. Thanks.

This seems a bit complicated for what you are trying to achieve. I would suggest using find and grep:

find . -name "*.u.clean" -exec egrep -c '([Hh][1-9])|([Hh][1-2][0-9])' {} +

You have to take the output and do the math.
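As a rough sketch of what "doing the math" could look like, the function below adds up the per-file line counts that grep -c prints. It assumes the math wanted here is a plain total across files, which the question does not actually specify. The name sum_counts and the directory argument are made up for illustration, and grep -E is used as the current spelling of egrep. The pattern is copied from the command above unchanged, so it also matches H29 and the leading part of larger numbers such as H30.

```shell
#!/usr/bin/env bash
# Sum the per-file counts printed by grep -c over all *.u.clean files.
# sum_counts is a hypothetical helper name; the directory argument is an assumption.
sum_counts() {
    local dir=${1:-.}
    # /dev/null is passed as an extra file so grep always prints "name:count",
    # even when find hands it a single file. It contributes a count of 0.
    find "$dir" -name '*.u.clean' \
        -exec grep -Ec '([Hh][1-9])|([Hh][1-2][0-9])' /dev/null {} + |
        # Take the last colon-separated field so a colon in a path is harmless.
        awk -F: '{ total += $NF } END { print total + 0 }'
}
```

Calling sum_counts . prints a single number for the current directory tree. Note that grep -c counts matching lines, not individual matches.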

This assumes there is only one h# per line in the file; if that is not the case, you will need to do a little more work. I would find all the files that have any occurrences and then use egrep -o '([Hh][1-9])|([Hh][1-2][0-9])' | wc -l to get the total for each file.
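The two-step approach just described could be sketched as follows: grep -l lists the files that contain at least one match, and grep -o | wc -l then counts every match in each of them. The name per_file_totals and the output layout are assumptions for illustration, and file names containing newlines are not handled.

```shell
#!/usr/bin/env bash
# Print "<file> <total>" for every *.u.clean file with at least one match.
# per_file_totals is a hypothetical helper name.
per_file_totals() {
    local dir=${1:-.}
    local pattern='([Hh][1-9])|([Hh][1-2][0-9])'
    local file total
    # grep -l prints only the names of files that contain a match.
    find "$dir" -name '*.u.clean' -exec grep -El "$pattern" {} + |
    while IFS= read -r file; do
        # grep -o prints each match on its own line, so wc -l counts matches.
        total=$(grep -Eo "$pattern" "$file" | wc -l)
        # $((total)) strips the padding some wc implementations add.
        printf '%s %s\n' "$file" "$((total))"
    done
}
```

Running per_file_totals . prints one line per matching file, which can then be fed to awk or a shell loop for further arithmetic.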
