
Split a line with matching regex special characters

I want to split a line into the parts matched by each special (non-literal) expression in a regex. It is hard to explain, so here are a few examples:

('.+abcd[0-9]+\\.mp3', 'Aabcd09.mp3') -> [ 'A', '09' ]

  • .+ is a special expression in the regex, and I want what it matches
  • [0-9]+ is another special expression, and I want what it matches too

('.+\\..+_[0-9]+\\.mp3', 'A.abcd_09.mp3') -> [ 'A', 'abcd', '09' ]

  • .+ is the first special expression of the regex; it matches A
  • .+ is the second special expression of the regex; it matches abcd
  • [0-9]+ is the third special expression of the regex; it matches 09
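If you are free to edit the patterns, wrapping each special part in a capturing group by hand already gives the desired lists: re.fullmatch(...).groups() returns the captured substrings from left to right. A minimal sketch, assuming the patterns can be rewritten this way:

```python
import re

# The patterns from the examples above, with each special part wrapped in (...)
examples = [
    (r'(.+)abcd([0-9]+)\.mp3', 'Aabcd09.mp3'),
    (r'(.+)\.(.+)_([0-9]+)\.mp3', 'A.abcd_09.mp3'),
]

results = []
for pattern, name in examples:
    m = re.fullmatch(pattern, name)
    # groups() returns the captured substrings in left-to-right order
    results.append(list(m.groups()) if m else None)

print(results)  # [['A', '09'], ['A', 'abcd', '09']]
```

When the input is ambiguous, the greedy first .+ takes as much as it can: for 'A.b.c_09.mp3' the second pattern yields 'A.b', 'c', '09'.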

Do you know how to achieve this? I couldn't find anything.

It looks like you need a so-called tokenizer/lexer to parse the regular expression first. It lets you split the base regex into sub-expressions. Then apply these sub-expressions to the original string and print out what each one matched. Applying each piece on its own would not work (a lone .+ matches the whole string), so one way is to wrap every special sub-expression in a capturing group and match the rebuilt pattern against the whole string.
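A minimal sketch of that tokenizer idea, assuming a deliberately small regex subset: literals, escapes, '.', [...] character classes and quantifiers, with no groups, alternation or anchors. The helper names tokenize and split_by_special are hypothetical. Each special atom is wrapped in a capturing group, the rebuilt pattern is matched against the whole string with re.fullmatch, and the groups are returned in order:

```python
import re

# Hypothetical set: escapes such as \d, \w, \s stand for character classes
CLASS_ESCAPES = set('dDwWsS')


def tokenize(pattern):
    """Split a simple regex into (text, is_special) atoms.

    Assumes no groups, alternation or anchors: only literals, escapes,
    '.', character classes [...] and quantifiers.
    """
    atoms = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == '\\':
            # \d, \w, ... are special; \. and other escapes are literals
            text = pattern[i:i + 2]
            special = pattern[i + 1] in CLASS_ESCAPES
            i += 2
        elif c == '[':
            j = i + 1
            if j < len(pattern) and pattern[j] == '^':
                j += 1
            if j < len(pattern) and pattern[j] == ']':
                j += 1  # a leading ']' is part of the class
            while pattern[j] != ']':
                j += 2 if pattern[j] == '\\' else 1
            text, special = pattern[i:j + 1], True
            i = j + 1
        else:
            text, special = c, c == '.'
            i += 1
        # A quantifier turns any atom into a variable-length (special) one
        if i < len(pattern) and pattern[i] in '*+?{':
            j = pattern.index('}', i) + 1 if pattern[i] == '{' else i + 1
            if j < len(pattern) and pattern[j] == '?':
                j += 1  # lazy modifier, e.g. '.+?'
            text, special = text + pattern[i:j], True
            i = j
        atoms.append((text, special))
    return atoms


def split_by_special(pattern, text):
    # Wrap each special atom in a capturing group; keep literals verbatim
    rebuilt = ''.join(f'({t})' if special else t for t, special in tokenize(pattern))
    m = re.fullmatch(rebuilt, text)
    return list(m.groups()) if m else None


print(split_by_special(r'.+abcd[0-9]+\.mp3', 'Aabcd09.mp3'))     # ['A', '09']
print(split_by_special(r'.+\..+_[0-9]+\.mp3', 'A.abcd_09.mp3'))  # ['A', 'abcd', '09']
```

Matching the rebuilt pattern as a whole lets the regex engine's backtracking decide where each .+ ends. Patterns with their own parentheses or | would need a real regex parser rather than this small tokenizer.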

You can try this:

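import re
s = ['Aabcd09.mp3', 'A.abcd_09.mp3']
# Raw string avoids invalid-escape warnings; alternatives: leading letter | letters between '.' and '_' | digits before '.mp3'
new_s = [re.findall(r'^[a-zA-Z]|(?<=\.)[a-zA-Z]+(?=_)|\d+(?=\.mp3)', i) for i in s]
print(new_s)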

Output:

[['A', '09'], ['A', 'abcd', '09']]
