
Regex expression for a given string

I have run into a small issue. I need a regular expression that splits a given string into its numbers, the chunks of characters inside square brackets, and any remaining runs of plain characters, each as a separate item.

For example, if I have a string like

s = 2[abc]3[cd]ef 

I need a list like lst = ['2','abc','3','cd','ef']

This is the code I have so far:

import re
s = "2[abc]3[cd]ef"
s_final = ""
res = re.findall(r"(\d+)\[([^[\]]*)\]", s)
print(res)

This outputs a list of tuples that looks like this:

[('2', 'abc'), ('3', 'cd')]

I am very new to regular expressions and still learning. Sorry if this is an easy one.

Thanks!

The immediate fix is to get rid of the capturing groups and use alternation to match either digits or runs of characters other than square brackets:

import re
s = "2[abc]3[cd]ef"
res = re.findall(r"\d+|[^][]+", s)
print(res)
# => ['2', 'abc', '3', 'cd', 'ef']

Details:

  • \d+ - one or more digits
  • | - or
  • [^][]+ - one or more chars other than [ and ]
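To make the effect of dropping the capturing groups concrete, here is a small sketch that runs both patterns from this thread on the same input. The variable names grouped and flat are only illustrative.

```python
import re

s = "2[abc]3[cd]ef"

# With two capturing groups, findall returns one tuple per match,
# and the trailing "ef" is skipped because it is not inside brackets.
grouped = re.findall(r"(\d+)\[([^[\]]*)\]", s)

# With no groups, findall returns each whole match as a plain string.
flat = re.findall(r"\d+|[^][]+", s)

print(grouped)  # [('2', 'abc'), ('3', 'cd')]
print(flat)     # ['2', 'abc', '3', 'cd', 'ef']
```

When a pattern has capturing groups, re.findall reports the groups rather than the whole match, which is where the tuples in the question come from.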

Other solutions that might help are:

re.findall(r'\w+', s)
re.findall(r'\d+|[^\W\d_]+', s)

where \w+ matches one or more word characters (for Python 3 str patterns: Unicode letters and digits plus the underscore) and [^\W\d_]+ matches one or more Unicode letters.
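The two alternatives only differ on inputs where digits, letters and underscores sit next to each other. The following sketch uses a made-up string (not from the question) to show that difference; which behaviour is preferable depends on the data.

```python
import re

# Hypothetical input where digits, letters and an underscore touch.
s = "2abc3[c_d]ef"

# \w+ treats digits, letters and "_" as one class, so adjacent runs merge.
merged = re.findall(r"\w+", s)

# \d+|[^\W\d_]+ keeps digit runs and letter runs apart and skips "_".
separated = re.findall(r"\d+|[^\W\d_]+", s)

print(merged)     # ['2abc3', 'c_d', 'ef']
print(separated)  # ['2', 'abc', '3', 'c', 'd', 'ef']
```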


Don't try to write one regex that matches the whole string at once; write a regex that matches a single block and collect all of its matches. \w (equivalent to [a-zA-Z0-9_] for ASCII input) fits well:

import re
s = "2[abc]3[cd]ef"
print(re.findall(r"\w+", s))  # ['2', 'abc', '3', 'cd', 'ef']

Or split on the brackets:

print(re.split(r"[\[\]]", s))  # ['2', 'abc', '3', 'cd', 'ef']
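One caveat with the split approach: re.split produces empty strings when two brackets are adjacent or when the string ends with a bracket. A minimal sketch of filtering them out follows; split_on_brackets is a hypothetical helper name, and the second input is made up for illustration.

```python
import re

def split_on_brackets(s):
    """Split on '[' or ']' and drop the empty strings split() can leave."""
    return [part for part in re.split(r"[\[\]]", s) if part]

raw = re.split(r"[\[\]]", "2[abc][cd]")
print(raw)                                 # ['2', 'abc', '', 'cd', '']
print(split_on_brackets("2[abc][cd]"))     # ['2', 'abc', 'cd']
print(split_on_brackets("2[abc]3[cd]ef"))  # ['2', 'abc', '3', 'cd', 'ef']
```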

Regex is meant for regular patterns, and your string is irregular. Regular expressions are mostly used to find a specific pattern in a long text, to validate text, or to extract things from text.

For example, to find a phone number in a string I would use a regex, but to build a calculator that needs to extract operators and digits I would not; I would rather write plain Python code for that.
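As a rough sketch of what such plain Python code might look like for the string in the question, the function below walks the characters once and needs no regex. The name tokenize and its splitting rule are assumptions made for illustration: it starts a new token at every bracket and at every switch between digits and non-digits, which is slightly stricter than the \d+|[^][]+ pattern.

```python
def tokenize(s):
    """Split s into digit runs and non-digit runs, using brackets as separators."""
    tokens = []
    current = ""
    for ch in s:
        if ch in "[]":
            # A bracket ends whatever run is in progress.
            if current:
                tokens.append(current)
            current = ""
        elif current and ch.isdigit() != current[-1].isdigit():
            # Switching between digits and non-digits starts a new token.
            tokens.append(current)
            current = ch
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens

print(tokenize("2[abc]3[cd]ef"))  # ['2', 'abc', '3', 'cd', 'ef']
```

For input this simple the regex answers are shorter; a hand-written loop mainly pays off once nesting or validation rules have to be handled.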
