
AWK: Extract string between two different patterns

I need to extract a string contained in a column of my CSV file.

My file looks like this:

col1;col2;col3;cleavage=10-11;
col1;col2;col3;cleavage=1-2;
col1;col2;col3;cleavage=100-101;
col1;col2;col3;none;

So the delimiter of my file is ";", but in column 4 I want to extract the string between "cleavage=" and "-". What I did was print the 2 characters after "cleavage=", but the number is not always 2 characters long.

I did it this way:

awk -F "\"*;\"*" '{if (match($4,"cleavage=")) print $1";"$2";"$3";"substr($4,RSTART+9,2); else print $1";"$2";"$3";0"}' file

I figured out that the following should be the correct command, but how should I integrate it into the previous one?

awk "/Pattern1/,/Pattern2/ { print }" inputFile

Thanks for the help! :)

EDIT: My actual output is:

col1;col2;col3;10;
col1;col2;col3;1-;
col1;col2;col3;10;
col1;col2;col3;0;

But what I would like is:

col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;

You can use this awk with multiple delimiters as the field separator:

awk -F '[;=-]' -v OFS=';' '{print $1, $2, $3, ($4 == "cleavage") ? $5 : 0, ""}' file
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;

EDIT: If - or = can appear in fields before $4, you can use:

awk -F ';' -v OFS=';' '{split($4, a, /[=-]/);
           print $1, $2, $3, (a[1] == "cleavage") ? a[2] : 0, ""}' file
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
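
The match/substr approach from the question can also be kept. A minimal sketch, assuming the number always follows "cleavage=" directly and consists only of digits: match sets RLENGTH as well as RSTART, so the length of the number does not need to be hard-coded. The sample data is piped in with printf here purely for illustration.

```shell
# Feed the sample lines from the question to awk on stdin.
result=$(printf '%s\n' \
  'col1;col2;col3;cleavage=10-11;' \
  'col1;col2;col3;cleavage=1-2;' \
  'col1;col2;col3;cleavage=100-101;' \
  'col1;col2;col3;none;' |
awk -F ';' -v OFS=';' '{
    # The match covers "cleavage=" (9 characters) plus the digits after it,
    # so skipping 9 characters leaves only the number.
    if (match($4, /cleavage=[0-9]+/))
        n = substr($4, RSTART + 9, RLENGTH - 9)
    else
        n = 0
    print $1, $2, $3, n, ""
}')

printf '%s\n' "$result"
```

The empty last argument to print reproduces the trailing ";" of the input lines.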

The exact format is unclear, but this works for your example and will still work if = and - appear in other fields.

GNU awk (needed for the third argument to match):

awk '{match($0,/(.*);[^-0-9]*([0-9]*)[^;]*;$/,a);print a[1]";"(a[2]+0)";"}' file

col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;

Or with sed:

sed 's/;[^-0-9]*\([0-9]\{1,\}\)[^;]*;$/;\1;/;t;s/[^;]*;$/0;/' file

I came up with this one-liner:

 awk -F';' -v OFS=";" '{sub(/cleavage=/,"",$(NF-1));
                        sub(/-.*/,"",$(NF-1));$(NF-1)+=0}7' file

It gives:

col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
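
In that one-liner the trailing 7 is simply a pattern that is always true, so awk runs the default action and prints the modified record. A spelled-out sketch of the same idea, assuming the value is always in field 4 (instead of relying on $(NF-1) and a trailing ";"), with the sample data piped in via printf for illustration:

```shell
result=$(printf '%s\n' \
  'col1;col2;col3;cleavage=10-11;' \
  'col1;col2;col3;cleavage=1-2;' \
  'col1;col2;col3;cleavage=100-101;' \
  'col1;col2;col3;none;' |
awk -F ';' -v OFS=';' '{
    sub(/cleavage=/, "", $4)   # drop the prefix, if present
    sub(/-.*/, "", $4)         # drop the first "-" and everything after it
    $4 += 0                    # force a number, so "none" becomes 0
    print                      # explicit print instead of the bare 7 pattern
}')

printf '%s\n' "$result"
```

Assigning to $4 makes awk rebuild the record with OFS, which is why OFS has to be set to ";" as well.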
