
Regex Python: adding a char before any word followed by the special char ':'

I'm trying to find the correct regex lookaround for this type of string:

cat: monkey, ab4 / 1997 / little: cat, 1954/ afgt22 /dog: monkey, 173 / pine-apple: duer, 129378s. / 12

The rule I'm trying to implement is:

Insert the char '|' before any 'word' followed by ':', where a 'word' is any word made up only of letters, with no digits.

The issue:

I can't find a way to handle words at the beginning of the string, words containing '-', or words preceded by a special char such as '/' rather than a space, as in this example:

https://regex101.com/r/gX7wY0/5

As you can see, only one of them has worked so far, and even there the '|' char is followed by a space before the word and its ':'.

The result I'm trying to get is:

|cat: monkey, ab4 / 1997 / |little: cat, 1954/ afgt22 /|dog: monkey, 173 / |pine-apple: duer, 129378s. / 12

So far, '-' is the only special char that appears as part of a word before ':'.

Thanks in advance, I'm still learning how to use regex with Python. Any tips are welcome!

You can use r'\b' to match word boundaries. For your case you are looking for

  • substrings that match: [A-Za-z\-]+
  • and are surrounded by word boundaries: \b[A-Za-z\-]+\b
  • and are followed by a colon: \b[A-Za-z\-]+\b:
  • You can capture the word using parentheses: \b([A-Za-z\-]+)\b:
  • and recover it in the substitution using \1
import re

s = 'cat: monkey, ab4 / 1997 / little: cat, 1954/ afgt22 /dog: monkey, 173 / pine-apple: duer, 129378s. / 12'

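# Capture each letters-and-hyphens word that is followed by ':' and re-emit it with '|' in front.
result = re.sub(r'(\b[A-Za-z\-]+\b):', r'|\1:', s)
print(result)
# prints:
# |cat: monkey, ab4 / 1997 / |little: cat, 1954/ afgt22 /|dog: monkey, 173 / |pine-apple: duer, 129378s. / 12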
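Since the question asks specifically about lookarounds, here is a minimal sketch of an alternative that matches only a zero-width position instead of capturing the word. It is not part of the original answer: the names LABEL_START and mark_labels are hypothetical, and it assumes a word should not start right after a digit, letter, underscore, or '-'.

```python
import re

# Zero-width position that is
#   - not preceded by a word character or '-'  (negative lookbehind), and
#   - followed by letters/hyphens and then ':' (positive lookahead).
# LABEL_START and mark_labels are illustrative names, not from the original answer.
LABEL_START = re.compile(r'(?<![\w-])(?=[A-Za-z-]+:)')


def mark_labels(text, marker='|'):
    """Insert `marker` before each letters-and-hyphens word followed by ':'."""
    # A callable replacement avoids backslash processing of `marker`.
    return LABEL_START.sub(lambda m: marker, text)


s = ('cat: monkey, ab4 / 1997 / little: cat, 1954/ afgt22 /dog: monkey, 173 / '
     'pine-apple: duer, 129378s. / 12')
print(mark_labels(s))
# prints:
# |cat: monkey, ab4 / 1997 / |little: cat, 1954/ afgt22 /|dog: monkey, 173 / |pine-apple: duer, 129378s. / 12
```

On the sample string both approaches give the same result. They differ on inputs such as '4-ab: x': the \b version yields '4|-ab: x', because there is a word boundary between '4' and '-', while the lookbehind version leaves that string unchanged. Which behavior is right depends on what should count as a word in your data.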

