Segregation of numerical results from a single text file into multiple files in Linux

I have data like this:

#start
#gatherData
*ELEMENT_SHELL
48709       1   50614   50616   50618   50613
48710       1   50613   50618   50608   50609
48711       1   50616   50617   50619   50618
48712       1   50618   50619   50607   50608
48715       1   50589   50590   50620   50615
48716       1   50615   50620   50616   50614
48717       1   50590   50591   50621   50620
48721       1   50623   50625   50626   50622
48722       1   50622   50626   50610   50611
48723       1   50625   50614   50613   50626
*END
$PresentData
$RESULT OF strength
48709  1.0267261e-002
48710  1.0721873e-002
48711  1.1930415e-002
48712  1.2186395e-002
48715  9.7443219e-003
48716  1.0036242e-002
48717  1.1186538e-002
48721  7.9333931e-003
48722  8.6850608e-003
48723  8.9872172e-003

First, I want to check the results under $RESULT OF strength and find which numbers in the second column lie between 0 and 1e-002. Then, based on that, I want to look up the matching number between *ELEMENT_SHELL and *END and send the complete line to a new text file, test1.txt. If the number is between 1e-002 and 1e-003, the line should go to the next text file, test2.txt, so that this single file is segregated into two different files. Text1.txt would contain:

48709       1   50614   50616   50618   50613
48710       1   50613   50618   50608   50609
48711       1   50616   50617   50619   50618
48712       1   50618   50619   50607   50608
48716       1   50615   50620   50616   50614
48717       1   50590   50591   50621   50620

Text2.txt would contain:

48721       1   50623   50625   50626   50622
48722       1   50622   50626   50610   50611
48723       1   50625   50614   50613   50626
48715       1   50589   50590   50620   50615

Can any expert suggest a way to do this with sed or awk? I think the final results could be piped easily, but segregating the data and then finding it again in the same file is the problematic part. Thanks in advance.

As a basic solution, consider the following code:

[hamadhassan $] cat tri.awk
#!/usr/bin/gawk -f 

BEGIN {
    load_state = 1    # 1: reading element lines; 0: reading results
}

# The results header marks the end of the element data.
$0 == "$RESULT OF strength" {
    load_state = 0
}

# Element lines have six fields; store each one in a lookup table keyed by ID.
load_state == 1 && NF == 6 {
    lut[$1] = $0
}

# Result lines have two fields: the element ID and its value.
load_state == 0 && NF == 2 && ($1 in lut) {
    if ($2 > 0.0 && $2 < 1e-2)
        print lut[$1] > "Text2.txt"
    else
        print lut[$1] > "Text1.txt"
}
[hamadhassan $]

which, given your sample input:

[hamadhassan $] cat test.in
#start
#gatherData
*ELEMENT_SHELL
48709       1   50614   50616   50618   50613
48710       1   50613   50618   50608   50609
48711       1   50616   50617   50619   50618
48712       1   50618   50619   50607   50608
48715       1   50589   50590   50620   50615
48716       1   50615   50620   50616   50614
48717       1   50590   50591   50621   50620
48721       1   50623   50625   50626   50622
48722       1   50622   50626   50610   50611
48723       1   50625   50614   50613   50626
*END
$PresentData
$RESULT OF strength
48709  1.0267261e-002
48710  1.0721873e-002
48711  1.1930415e-002
48712  1.2186395e-002
48715  9.7443219e-003
48716  1.0036242e-002
48717  1.1186538e-002
48721  7.9333931e-003
48722  8.6850608e-003
48723  8.9872172e-003[hamadhassan $]

gives:

[hamadhassan $] ./tri.awk test.in
[hamadhassan $] cat Text2.txt
48715       1   50589   50590   50620   50615
48721       1   50623   50625   50626   50622
48722       1   50622   50626   50610   50611
48723       1   50625   50614   50613   50626
[hamadhassan $] cat Text1.txt
48709       1   50614   50616   50618   50613
48710       1   50613   50618   50608   50609
48711       1   50616   50617   50619   50618
48712       1   50618   50619   50607   50608
48716       1   50615   50620   50616   50614
48717       1   50590   50591   50621   50620
[hamadhassan $]

This was on CentOS 6 with awk 3.1.7.
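As a variation on the script above, the same split can be sketched as a two-pass awk command that reads the file twice: the first pass classifies each element ID from the results section, and the second pass prints the element lines in the order they appear in the file. This is only a sketch under the same assumptions as the script (six-field element lines, two-field result lines, an exact `$RESULT OF strength` header). The file names `sample.in`, `low.txt`, and `high.txt` are hypothetical, and the input is abbreviated to three elements from the question's data.

```shell
# Abbreviated sample input (three elements taken from the question's data).
cat > sample.in <<'EOF'
*ELEMENT_SHELL
48709       1   50614   50616   50618   50613
48715       1   50589   50590   50620   50615
48721       1   50623   50625   50626   50622
*END
$RESULT OF strength
48709  1.0267261e-002
48715  9.7443219e-003
48721  7.9333931e-003
EOF

# Pass 1 (NR == FNR): remember which output file each element ID belongs to.
# Pass 2: print the six-field element lines to the file chosen in pass 1.
awk '
    NR == FNR {
        if ($0 == "$RESULT OF strength")
            in_results = 1
        else if (in_results && NF == 2)
            target[$1] = ($2 > 0 && $2 < 1e-2) ? "low.txt" : "high.txt"
        next
    }
    NF == 6 && ($1 in target) {
        out = target[$1]
        print > out
    }
' sample.in sample.in
```

Because the second pass walks the element block itself, the output keeps the element order even when the results are listed in a different order.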

You can try the following commands (assuming that the source file is txt.txt):

grep -F '$RESULT OF strength' -A1000 txt.txt | awk '$2>0.01' | cut -f 1 | xargs -I{} grep {} txt.txt | grep -E "[0-9]+[[:blank:]]+1[[:blank:]]+" > test1.txt

grep -F '$RESULT OF strength' -A1000 txt.txt | awk '$2<0.01' | cut -f 1 | xargs -I{} grep {} txt.txt | grep -E "[0-9]+[[:blank:]]+1[[:blank:]]+" > test2.txt

If the columns are separated by spaces, then it would be:

grep -F '$RESULT OF strength' -A1000 txt.txt | sed 's/[[:blank:]][[:blank:]]*/ /g' | awk '$2>0.01' | cut -f 1 -d' ' | xargs -I{} grep {} txt.txt | grep -E "[0-9]+[[:blank:]]+1[[:blank:]]+" > test1.txt

grep -F '$RESULT OF strength' -A1000 txt.txt | sed 's/[[:blank:]][[:blank:]]*/ /g' | awk '$2<0.01' | cut -f 1 -d' ' | xargs -I{} grep {} txt.txt | grep -E "[0-9]+[[:blank:]]+1[[:blank:]]+" > test2.txt
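The pipelines above start one grep per element ID. A possible variation, sketched here under the same assumptions (the file is called txt.txt and every element line has 1 in its second column), writes the selected IDs to a file and passes them to a single grep with `-f`. The name `ids.txt` is hypothetical, and the input is abbreviated to three elements from the question's data.

```shell
# Abbreviated sample input (three elements taken from the question's data).
cat > txt.txt <<'EOF'
*ELEMENT_SHELL
48709       1   50614   50616   50618   50613
48715       1   50589   50590   50620   50615
48721       1   50623   50625   50626   50622
*END
$RESULT OF strength
48709  1.0267261e-002
48715  9.7443219e-003
48721  7.9333931e-003
EOF

# Step 1: collect the IDs whose result is above 0.01.
# NF == 2 skips the three-word header line that grep also returns.
grep -F -A1000 '$RESULT OF strength' txt.txt |
    awk 'NF == 2 && $2 > 0.01 { print $1 }' > ids.txt

# Step 2: one grep for all IDs (-w whole word, -F fixed strings, -f pattern file),
# then keep only element lines, which start with "<id> 1 ".
grep -w -F -f ids.txt txt.txt |
    grep -E '^[0-9]+[[:blank:]]+1[[:blank:]]+' > test1.txt
```

Like the original pipeline, this matches an ID anywhere on a line, so an element ID that also occurs as a node number in another element line would pull in that line as well. Changing `>` to `<` in the awk condition and writing to test2.txt gives the other file.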
