
Perl regular expression

I have some URLs in this format. Some URLs contain &abc=4 and some do not.

xxxxxxxxxxxxxxxxxxxxxxxxxxx&abc=4
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&abc=4
xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Here xxxxxxxxxxxxxxxxxxxxx stands for an arbitrary string.

I want to match URLs which have xxxxxxxxxxxxxxxxx only and not &abc=4 (meaning I want to get these types of URLs: only xxxxxxxxxxxxxx, xxxxxxxxxxxxxx, xxx).

I know how to write a regular expression which matches the entire URL. For example: /x.*abc=4/

But how do I write a regular expression that matches only xxxxxxxxxx and not &abc=4?

I would use a negative lookahead assertion, which specifies what is not allowed to follow at that point in the pattern:

^(?!.*&abc=4$).*$

This pattern will match any string that does not end with &abc=4.
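As a minimal sketch of how that pattern could be used to filter a list in Perl (the example.com URLs below are made-up sample data, not the asker's real input):

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the URLs in the question.
my @urls = (
    'http://example.com/a?x=1&abc=4',
    'http://example.com/b?y=2&abc=4',
    'http://example.com/c?x=1',
    'http://example.com/d',
);

# Keep only the URLs that do not end with "&abc=4".
# The negative lookahead is checked once, at the start of the string.
my @wanted = grep { /^(?!.*&abc=4$).*$/ } @urls;

print "$_\n" for @wanted;
```

Note that a URL with &abc=4 in the middle (followed by other parameters) would still be kept, because the lookahead only rejects strings that end with it.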

You can verify it online here: http://www.rubular.com/

Use a negative lookbehind assertion. The form is:

(?<![&?]abc=4)

(This will also exclude ?abc=4.)
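A possible way to apply the lookbehind is to place it directly before the end-of-string anchor. The sketch below assumes that placement, uses made-up example.com URLs, and assumes the strings have already been chomped (a trailing newline would sit between the parameter and the end of the string and defeat the check):

```perl
use strict;
use warnings;

# Hypothetical sample URLs; only the trailing parameter matters here.
# The strings are assumed to carry no trailing newline.
my @urls = (
    'http://example.com/a?x=1&abc=4',
    'http://example.com/b?abc=4',
    'http://example.com/c?x=1',
    'http://example.com/d',
);

# (?<![&?]abc=4)$ succeeds at the end of the string only when the six
# characters before it are neither "&abc=4" nor "?abc=4".
my @kept = grep { /(?<![&?]abc=4)$/ } @urls;

print "$_\n" for @kept;
```

Because the lookbehind has a fixed width of six characters, it works in Perl versions that do not support variable-length lookbehind.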

Assuming your URLs are one per line, you can use:

^([^&]+)

This will basically match anything up to the first instance of &.
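As a rough illustration, the capture could be wrapped in a small helper. The sub name before_first_amp and the sample URLs are hypothetical, and the sketch uses the anchored, greedy form ^([^&]+), since a lazy quantifier with nothing after it would stop after a single character:

```perl
use strict;
use warnings;

# Hypothetical helper: return the part of a URL before the first "&",
# or undef when the string is empty or starts with "&".
sub before_first_amp {
    my ($url) = @_;
    return $url =~ /^([^&]+)/ ? $1 : undef;
}

print before_first_amp('http://example.com/a?x=1&abc=4'), "\n";
print before_first_amp('http://example.com/d'), "\n";
```

This extracts the leading part of every URL rather than filtering out the ones that carry &abc=4, so it answers a slightly different question than the lookaround patterns.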

As @Benoit said, you can do this using a zero-width assertion to keep the query string out of the capture, but you would want a positive lookahead rather than a negative lookbehind. A syntax example is below:

(?=(&[^=]+=\d+)+)

As you can see, though, this would complicate the expression a touch.
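One way to read the lookahead suggestion is sketched below. It is an interpretation, not necessarily what the answer's author had in mind: the helper name prefix_before_params is hypothetical, the parameter pattern is written with an explicit = between name and digits, and a $ anchor is added so the parameters must run to the end of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text before a trailing run of
# "&name=digits" parameters, or undef when no such run follows.
sub prefix_before_params {
    my ($url) = @_;
    # The lazy .+? grows one character at a time until the lookahead
    # sees only "&name=digits" groups between here and the end.
    return $url =~ /^(.+?)(?=(?:&[^=&]+=\d+)+$)/ ? $1 : undef;
}

my $prefix = prefix_before_params('http://example.com/a?x=1&abc=4');
print defined $prefix ? "$prefix\n" : "no trailing parameters\n";
```

The lookahead checks the parameters without consuming them, which is why only the prefix ends up in the capture group.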

Hope this helps.
