
perl find and replace a part of the matching string (regex issue)

Suppose I have a huge XML file that contains a bunch of information, including email addresses. All of the email addresses look something like the following:

user@gmail.com

The issue I'm running into involves regular expressions. How do I match on the email address but replace only the user portion? I tried using a look-ahead with no luck (it ends up replacing EVERYTHING before the @gmail.com). Is there a way to use look-ahead but only back to the whitespace before user? Or is there a simpler solution? Right now I have something like the following:

perl 's/(?=@gmail.com)/replacement$&/ge' file.xml

which obviously doesn't work. Any help is much appreciated!

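Use a character class that matches everything except whitespace and @ (the @ before gmail is escaped so that Perl does not interpolate an array named @gmail into the pattern):

s/[^\s@]+(?=\@gmail\.com)/replacement/g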
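A minimal sketch of how this substitution behaves on a few lines of text. The sample data and the helper name `replace_user` are made up for illustration and are not taken from the asker's file.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the user part of any @gmail.com address.
sub replace_user {
    my ($text, $replacement) = @_;
    # [^\s@]+ : one or more characters that are neither whitespace nor '@'
    # (?=...) : look-ahead, asserts that @gmail.com follows without consuming it
    # \@      : escaped so Perl does not interpolate an array named @gmail
    $text =~ s/[^\s@]+(?=\@gmail\.com)/$replacement/g;
    return $text;
}

# Made-up sample lines (single-quoted so that '@' stays literal).
my @samples = (
    'contact: user@gmail.com',
    'two: alice@gmail.com bob@gmail.com',
    'other: carol@example.org',
);

print replace_user($_, 'replacement'), "\n" for @samples;
```

Because the class only excludes whitespace and @, an address that directly follows a tag, as in `<email>user@gmail.com</email>`, would have the tag replaced as well; adding `<`, `>` and `"` to the excluded characters is one way to guard against that.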

You could always just use the HTML5 email validator pattern to get the user name.
http://www.w3.org/TR/html5/forms.html#valid-e-mail-address

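$string =~ s/[a-zA-Z0-9.!#\$%&'*+\/=?^_`{|}~-]+(\@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*)/$1/g;

As written, the substitution keeps only the domain captured in $1; put the new user name in front of $1 to replace the user part. The $ and @ are escaped so that Perl does not try to interpolate variables inside the pattern.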

Expanded:

 [a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+ 
 (                                      # (1 start)
      @
      [a-zA-Z0-9] 
      (?:
           [a-zA-Z0-9-]{0,61} 
           [a-zA-Z0-9] 
      )?
      (?:
           \. 
           [a-zA-Z0-9] 
           (?:
                [a-zA-Z0-9-]{0,61} 
                [a-zA-Z0-9] 
           )?
      )*
 )                                      # (1 end)
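
The expanded pattern can be assembled from smaller pieces with `qr//`, which makes it easier to read and reuse. The following is only a sketch: the names `$local`, `$label`, `$domain` and `replace_local_part` are hypothetical, and the pattern matches addresses at any domain, not only gmail.com.

```perl
use strict;
use warnings;

# Pieces modelled on the HTML5 "valid e-mail address" pattern.
# '#', '$' and '@' are escaped so that Perl neither starts a comment
# nor tries to interpolate a variable inside the pattern.
my $local  = qr/[a-zA-Z0-9.!\#\$%&'*+\/=?^_`{|}~-]+/;
my $label  = qr/[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?/;
my $domain = qr/\@$label(?:\.$label)*/;

# Hypothetical helper: swap the local part and keep the domain.
sub replace_local_part {
    my ($text, $replacement) = @_;
    # The domain is captured in $1 and written back after the new user name.
    $text =~ s/$local($domain)/$replacement$1/g;
    return $text;
}

print replace_local_part('<email>user@gmail.com</email>', 'someone'), "\n";
# prints <email>someone@gmail.com</email>
```

Since `<`, `>` and `"` are not part of the local-part character class, the match starts right after the opening tag or quote, so surrounding markup is left alone.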

s/(\S+)\@gmail\.com/replacement string/g;

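I think this will resolve your problem for this scenario (the pattern consumes @gmail.com as well, so the replacement string has to include the domain if you want to keep it):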

<email>this is user@gmail.com</email>

This regex

s/([^>]+)\@gmail\.com/replacement string/g

will resolve this scenario

<email>user@gmail.com</email>

And this

s/([^"]+)\@gmail\.com/replacement string/g

will resolve this

<person email="user@gmail.com"></person>

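So combined, we have a single character class that excludes whitespace, > and " (an alternation of the three patterns would let one branch run across the delimiter that another branch is meant to stop at):

s/([^\s>"]+)\@gmail\.com/replacement string/g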
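
To check a combined pattern against the three scenarios above, a small script helps. The sketch below uses a single negated character class `[^\s>"]` and writes the domain back in the replacement; both choices, like the helper name `replace_user_in_xml` and the sample user name, are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the user part in all three scenarios.
sub replace_user_in_xml {
    my ($xml, $replacement) = @_;
    # [^\s>"]+ stops at whitespace, at the end of a tag ('>') and at a quote,
    # so the match cannot swallow the markup in front of the address.
    # The pattern consumes @gmail.com, so the replacement writes it back.
    $xml =~ s/[^\s>"]+\@gmail\.com/$replacement\@gmail.com/g;
    return $xml;
}

# The three scenarios from the answer.
my @scenarios = (
    '<email>this is user@gmail.com</email>',
    '<email>user@gmail.com</email>',
    '<person email="user@gmail.com"></person>',
);

print replace_user_in_xml($_, 'new'), "\n" for @scenarios;
```

In each scenario only `user` changes to `new`; the tags, the attribute quotes and the domain are left as they were.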
