Replace characters using re.sub - keep one character

I'm trying to repair broken email records in a table. Some records contain several emails run together, for example 'google@google.comyahoo@yahoo.com', while others hold a single email such as 'google@google.com'. In my opinion the best way to fix this is re.sub, but there is a small problem. Suppose there is a record:

email = 'google@google.comyahoo@yahoo.com'

I can't simply call replace('.com', '.com, ') because that affects both '.com' substrings. So I want to use re.sub('.com\\w', '.com, \\w', email), which replaces only the '.com' substrings that are not at the end of the record. The problem is that I want to keep the character matched by \w in the result.

print re.sub('.com\\w', '.com, \\w',email)

>>> google@google.com, \wahoo@yahoo.com

instead of

>>> google@google.com, yahoo@yahoo.com

Can anybody advise me how to make this work? (I want to separate the emails with a comma and a space.)

Use a capturing group and refer back to it with a backreference in the replacement string:

>>> import re
>>> email = 'google@google.comyahoo@yahoo.com'
>>> re.sub(r'\.com(\w)', '.com, \\1', email)
'google@google.com, yahoo@yahoo.com'

Backreferences recall what was matched by a capturing group. A backreference is written as a backslash (\) followed by a digit indicating the number of the group to be recalled.
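As a rough sketch of the same idea wrapped in a function: the helper name separate_emails is made up for illustration, and \g<1> is just the more explicit spelling of \1 that Python's re module accepts in replacement strings. It assumes every address ends in '.com' and that no domain contains '.com' followed by more letters (something like 'mail.company.com' would be split incorrectly).

```python
import re

def separate_emails(record):
    """Insert ', ' after every '.com' that is directly followed by a word character.

    Hypothetical helper, not part of the original answer.
    """
    # (\w) captures the first character of the next address;
    # \g<1> puts that captured character back into the result.
    return re.sub(r"\.com(\w)", r".com, \g<1>", record)

print(separate_emails("google@google.comyahoo@yahoo.com"))
print(separate_emails("google@google.com"))
```

A record with a single email has no '.com' followed by a word character, so it comes back unchanged.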

import re
x = "google@google.comyahoo@yahoo.com"
print(re.sub(r"(?<=\.com)(?=\w)", ", ", x))

Output: google@google.com, yahoo@yahoo.com

Use lookarounds. See the demo.

https://regex101.com/r/sJ9gM7/48

Lookarounds don't consume any of the string; they are just assertions. When you use them, you don't need to put the consumed character back in the replacement, as the answer above does.
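To see what "don't consume" means in practice, here is a small sketch that inspects the match itself. The variable names email and boundary are only for illustration. The pattern matches an empty string at the position between the two addresses, so re.sub inserts ", " there without removing anything.

```python
import re

email = "google@google.comyahoo@yahoo.com"

# Lookbehind: '.com' is just before this position.
# Lookahead: a word character is just after this position.
boundary = re.compile(r"(?<=\.com)(?=\w)")

m = boundary.search(email)
print(m.span())         # start and end are equal: a zero-width match
print(repr(m.group()))  # the matched text is the empty string

# Substituting at a zero-width match only inserts text.
print(boundary.sub(", ", email))
```

Since nothing is consumed, the replacement string needs no group reference at all.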
