Match regexp at the end of the string with AWK

I am trying to match two different regexps against long strings with awk, removing the part of the string that matches within a 35-character window. The problem is that the same code works for the first regexp (which matches at the beginning) but fails to match the second one (at the end of the string).

Input:

Regexp1(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)Regexp2

Desired output:

(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)

So far I have used this code, which extracts Regexp1 correctly but unfortunately cannot also extract Regexp2, since the values of RSTART and RLENGTH for Regexp2 are incorrect.

Code for extracting Regexp1 (correct output):

awk -v F="Regexp1" '{if (match(substr($1,1,35),F)) print substr($1,RSTART,RLENGTH)}' file

Code for extracting Regexp2 (wrong output):

awk -v F="Regexp2" '{if (match(substr($1,length($1)-35,35),F)) print substr($1,RSTART,RLENGTH)}' file

Although the indexes for Regexp1 are correct, those for Regexp2 are wrong (RSTART=13). I cannot figure out how to extract the second regexp.

Assuming your actual Input_file matches the samples shown, could you please try the following (a recent version of awk is preferable, since older versions may not support interval expressions such as {5} in a regex):

awk '
match($0,/(\([0-9]+\)){5}.*(\([0-9]+\)){4}/){
  print substr($0,RSTART,RLENGTH)
}' Input_file

If the number of parenthesised values is not fixed, you could do the following:

awk '
match($0,/(\([0-9]+\)){1,}.*(\([0-9]+\)){1,}/){
  print substr($0,RSTART,RLENGTH)
}' Input_file
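
For an awk without interval support, the same idea can probably be written with `+`, which means the same as `{1,}`. The following is only a sketch under that assumption: `extract_parens` is a hypothetical helper name, and the sample line is the one from the question. The group `(\([0-9]+\))` wraps one parenthesised number so that the repetition applies to the whole unit rather than to the closing parenthesis alone.

```shell
#!/bin/sh
# Hypothetical helper: print the span from the first run of "(N)" groups
# to the last one, using only "+" (no {n,m} interval expressions).
extract_parens() {
  awk 'match($0, /(\([0-9]+\))+.*(\([0-9]+\))+/) {
    print substr($0, RSTART, RLENGTH)
  }'
}

# Sample line taken from the question.
printf '%s\n' 'Regexp1(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)Regexp2' | extract_parens
```

On the sample line this should print the desired output, since the match starts at the first `(` and extends to the last `)`.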

If this isn't all you need:

$ sed 's/Regexp1\(.*\)Regexp2/\1/' file
(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)

or using GNU awk for gensub():

$ awk '{print gensub(/Regexp1(.*)Regexp2/,"\\1",1)}' file
(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)

then edit your question to make your requirements and example far clearer.
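
As for why the question's second command reports the wrong indexes: match() sets RSTART relative to the string it was given, which here is the substr() window, not $1. For the first regexp the window starts at character 1, so the two coincide; for the trailing window, RSTART has to be shifted by the window's start position. Note also that the last 35 characters begin at length($1)-34, not length($1)-35. A minimal sketch of the windowed approach with that correction, assuming the sample line from the question (`strip_ends` and the variables A, B and W are names made up for this illustration):

```shell
#!/bin/sh
# Hypothetical helper: remove regexp A if it matches within the first W
# characters, and regexp B if it matches within the last W characters.
strip_ends() {
  awk -v A="$1" -v B="$2" -v W="${3:-35}" '{
    s = $1

    # Leading window: it starts at position 1, so RSTART is already
    # an offset into s.
    if (match(substr(s, 1, W), A)) {
      s = substr(s, 1, RSTART - 1) substr(s, RSTART + RLENGTH)
    }

    # Trailing window: RSTART is relative to the window, so add the
    # window start to get the position in s.
    start = length(s) - W + 1
    if (start < 1) start = 1          # string shorter than the window
    if (match(substr(s, start), B)) {
      pos = start + RSTART - 1
      s = substr(s, 1, pos - 1) substr(s, pos + RLENGTH)
    }

    print s
  }'
}

# Sample line taken from the question.
printf '%s\n' 'Regexp1(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)Regexp2' |
  strip_ends 'Regexp1' 'Regexp2'
```

Unlike the sed and gensub() versions, this keeps the 35-character restriction: a match that lies outside either window leaves the string untouched.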
