AWK: Extract string between two different patterns
I need to extract a string contained in one column of my CSV file.
My file looks like this:
col1;col2;col3;cleavage=10-11;
col1;col2;col3;cleavage=1-2;
col1;col2;col3;cleavage=100-101;
col1;col2;col3;none;
The delimiter of my file is ";", but in column 4 I want to extract the string between "cleavage=" and "-". What I did was print the 2 characters after "cleavage=", but the value is not always 2 characters long.
I did it this way:
awk -F "\"*;\"*" '{if (match($4,"cleavage=")) print $1";"$2";"$3";"substr($4,RSTART+9,2); else print $1";"$2";"$3";0"}' file
I figured out that the following should be the correct command, but how do I integrate it into the previous one?
awk "/Pattern1/,/Pattern2/ { print }" inputFile
Thanks for the help! :)
EDIT: My actual output is:
col1;col2;col3;10;
col1;col2;col3;1-;
col1;col2;col3;10;
col1;col2;col3;0;
But what I would like is:
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
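For reference, the range pattern /Pattern1/,/Pattern2/ mentioned in the question selects whole lines, from a line matching the first pattern through a line matching the second. It does not extract text inside a single line, so it does not fit this problem. A minimal sketch, where the function name between_lines and the START/END markers are made up for illustration:

```shell
# between_lines is a hypothetical helper; START and END are example markers.
# A range pattern prints every record from the first match through the second.
between_lines() {
  awk '/START/,/END/'
}

printf '%s\n' a START b END c | between_lines
# prints:
# START
# b
# END
```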
You can use this awk with multiple delimiters as the field separator:
awk -F '[;=-]' -v OFS=';' '{print $1, $2, $3, ($4 == "cleavage") ? $5 : 0, ""}' file
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
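To see why $5 holds the number, it can help to print the fields awk produces when ";", "=" and "-" all act as separators. A small sketch; show_fields is a hypothetical helper name, not part of the answer:

```shell
# show_fields is a hypothetical helper used only for illustration.
# It lists every field awk sees with the bracket-expression separator [;=-].
show_fields() {
  awk -F '[;=-]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

printf '%s\n' 'col1;col2;col3;cleavage=10-11;' | show_fields
# prints:
# 1=[col1]
# 2=[col2]
# 3=[col3]
# 4=[cleavage]
# 5=[10]
# 6=[11]
# 7=[]
```

For the "none" line, $4 is "none" rather than "cleavage", so the ternary falls through to 0.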
EDIT: In case - or = can be present in fields before $4, you can use:
awk -F ';' -v OFS=';' '{split($4, a, /[=-]/);
print $1, $2, $3, (a[1] == "cleavage") ? a[2] : 0, ""}' file
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
The exact format is unclear, but this works for your example and will also work if = and - appear in other fields.
GNU awk (for the third argument to match):
awk '{match($0,/(.*);[^-0-9]*([0-9]*)[^;]*;$/,a);print a[1] ";" (a[2]+0) ";"}' file
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
or sed:
sed 's/;[^-0-9]*\([0-9]\{1,\}\)[^;]*;$/;\1;/;t;s/[^;]*;$/0;/' file
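The three-argument match() is a GNU extension. For other awks, a sketch closer to the asker's original attempt can use RSTART and RLENGTH, which POSIX match() sets, so the digit run may be any length. The wrapper name extract_cleavage is hypothetical, and the sketch assumes the value always sits in field 4:

```shell
# extract_cleavage is a hypothetical wrapper name used only for this example.
# Assumes the "cleavage=" value is always in field 4 of a ";"-separated line.
extract_cleavage() {
  awk -F ';' -v OFS=';' '{
    if (match($4, /^cleavage=[0-9]+/))
      $4 = substr($4, RSTART + 9, RLENGTH - 9)   # 9 = length("cleavage=")
    else
      $4 = 0                                     # no cleavage value on this line
    print
  }'
}

printf '%s\n' 'col1;col2;col3;cleavage=100-101;' 'col1;col2;col3;none;' | extract_cleavage
# prints:
# col1;col2;col3;100;
# col1;col2;col3;0;
```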
I came up with this one-liner:
awk -F';' -v OFS=";" '{sub(/cleavage=/,"",$(NF-1));
sub(/-.*/,"",$(NF-1));$(NF-1)+=0}7' file
It gives:
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
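The one-liner is compact, so here is the same logic spelled out step by step as a sketch. The function name strip_cleavage and the variable f are hypothetical; the trailing 7 in the original is simply an always-true pattern that triggers the default print action:

```shell
# strip_cleavage is a hypothetical name; the body mirrors the one-liner above.
strip_cleavage() {
  awk -F ';' -v OFS=';' '{
    f = NF - 1                 # last real field; the trailing ";" leaves an empty $NF
    sub(/cleavage=/, "", $f)   # drop the prefix
    sub(/-.*/, "", $f)         # drop the "-" and everything after it
    $f += 0                    # force a number, so "none" becomes 0
    print                      # same effect as the always-true pattern 7
  }'
}

printf '%s\n' 'col1;col2;col3;cleavage=1-2;' 'col1;col2;col3;none;' | strip_cleavage
# prints:
# col1;col2;col3;1;
# col1;col2;col3;0;
```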