Regex to match motorcycle names and extract all letters and numbers separately

Question

(\w{1,4})(?:\s{0,1})(\d{1,4})(?:\s{0,1})(\w{1,4})\s

Apologies if this regex is really ugly; I am not fluent in regex at all.

I need a regex to extract all possible combinations from motorcycle names. For instance:

From a Honda CBR500R I would need to get CBR, 500 and R. I am not sure whether a regex could also give me CBR500 and 500R, but that would be really sweet!

Some examples of bike names:

Honda CBR500R
CBR 500 R
CBR 500R
CBR500 R
GS1000 S
XYZT 1000P
500ztx
KLR250 Honda
FZR 600 Suzuki
SV650
Text here XXXX 9999 XXXX 9999 XXXXX more text here

Is there a way to improve my regex, making it simpler and smarter?

Answer 1

I came up with the following pattern. I am not sure whether it is what you expected (duplicates are not removed):

import re

txt = """
Honda CBR500R
CBR 500 R
CBR 500R
CBR500 R
GS1000 S
XYZT 1000P
500ztx
KLR250 Honda
FZR 600 Suzuki
SV650
Text here XXXX 9999 XXXX 9999 XXXXX more text here
"""

pattern = r'[A-Z]+\d+|\d+[A-Z]|[A-Z]+(?![a-z])|\d+[a-z]+|\d+'
print(re.findall(pattern, txt))

Output is:

['CBR500', 'R', 'CBR', '500', 'R', 'CBR', '500R', 'CBR500', 'R', 'GS1000', 'S', 'XYZT', '1000P', '500ztx', 'KLR250', 'FZR', '600', 'SV650', 'XXXX', '9999', 'XXXX', '9999', 'XXXXX']

If you also want to capture '500R' from 'CBR500R':

p1 = r'[A-Z]+\d+|(?<!\d)[A-Z]+(?![a-z])|\d+[a-z]+|\d+(?![0-9A-Z])'
p2 = r'\d+[A-Z]'
print(re.findall(p1, txt) + re.findall(p2, txt))

Output is:

['CBR500', 'CBR', '500', 'R', 'CBR', 'CBR500', 'R', 'GS1000', 'S', 'XYZT', '500ztx', 'KLR250', 'FZR', '600', 'SV650', 'XXXX', '9999', 'XXXX', '9999', 'XXXXX', '500R', '500R', '1000P']
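
Since the answer notes that duplicates are not removed, here is a minimal sketch of one way to drop them while keeping the order of first appearance. The shortened `txt` sample and the `unique_tokens` name are illustrative assumptions, not part of the original answer; `p1` and `p2` are the patterns shown above.

```python
import re

# Shortened sample built from the first four names in the question (assumption).
txt = "Honda CBR500R\nCBR 500 R\nCBR 500R\nCBR500 R"

# Patterns copied from the answer above.
p1 = r'[A-Z]+\d+|(?<!\d)[A-Z]+(?![a-z])|\d+[a-z]+|\d+(?![0-9A-Z])'
p2 = r'\d+[A-Z]'

tokens = re.findall(p1, txt) + re.findall(p2, txt)

# dict.fromkeys keeps only the first occurrence of each token and,
# on Python 3.7+, preserves insertion order.
unique_tokens = list(dict.fromkeys(tokens))
print(unique_tokens)  # ['CBR500', 'CBR', '500', 'R', '500R']
```

A `set` would also remove duplicates, but it would lose the order in which the tokens were found.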

Answer 2

You can use

([A-Z]{2,})?[\s-]*(\d+)([a-z]+)?[\s-]*([A-Z]*\b)

See the regex demo.

The regex matches:

([A-Z]{2,})? - Group 1: an optional sequence of two or more uppercase ASCII letters
[\s-]* - zero or more hyphens or whitespace characters
(\d+) - Group 2: one or more digits
([a-z]+)? - Group 3: an optional sequence of one or more lowercase ASCII letters
[\s-]* - zero or more hyphens or whitespace characters
([A-Z]*\b) - Group 4: zero or more uppercase ASCII letters followed by a word boundary
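
The breakdown above can also be kept next to the pattern itself by compiling it with `re.VERBOSE`. This is only a sketch of the same regex with inline comments; the sample names in the loop are taken from the question, and the variable names are illustrative.

```python
import re

# Same pattern as above, written with re.VERBOSE so each part carries a comment.
pattern = re.compile(r"""
    ([A-Z]{2,})?   # Group 1: optional run of two or more uppercase letters
    [\s-]*         # zero or more hyphens or whitespace characters
    (\d+)          # Group 2: one or more digits
    ([a-z]+)?      # Group 3: optional run of lowercase letters
    [\s-]*         # zero or more hyphens or whitespace characters
    ([A-Z]*\b)     # Group 4: zero or more uppercase letters, then a word boundary
""", re.VERBOSE)

for name in ["CBR500R", "GS1000 S", "500ztx"]:
    m = pattern.search(name)
    # Optional groups that did not take part in the match come back as None.
    print(name, "->", m.groups())
```

Note that Group 4 can match the empty string, so a name such as 500ztx yields an empty fourth group rather than `None`.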

Here is sample extraction code in Python:

import re
p = re.compile(r'([A-Z]{2,})?[\s-]*(\d+)([a-z]+)?[\s-]*([A-Z]*\b)')
test_str = "Honda CBR500R\nCBR 500 R\nCBR 500R\nCBR500 R\nGS1000 S\nXYZT 1000P\n500ztx\nKLR250 Honda\nFZR 600 Suzuki\nText here XXXX 9999 XXXX 9999 XXXXX more text here"
for s in p.findall(test_str):
    print("New Entry:")
    for r in s:
        if r:
            print(r)

Output:

New Entry:
CBR
500
R
New Entry:
CBR
500
R
New Entry:
CBR
500
R
New Entry:
CBR
500
R
New Entry:
GS
1000
S
New Entry:
XYZT
1000
P
New Entry:
500
ztx
New Entry:
KLR
250
New Entry:
FZR
600
New Entry:
XXXX
9999
XXXX
New Entry:
9999
XXXXX
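
The question also asks for joined forms such as CBR500 and 500R. One possible way to get them, sketched here under the assumption that joining adjacent captured groups is what is wanted, is to concatenate neighbouring parts of a match. The helper name `name_parts` is hypothetical and not part of the original answer.

```python
import re

# Pattern copied from the answer above.
p = re.compile(r'([A-Z]{2,})?[\s-]*(\d+)([a-z]+)?[\s-]*([A-Z]*\b)')

def name_parts(text):
    """Hypothetical helper: parts of the first match, plus adjacent parts joined."""
    m = p.search(text)
    if m is None:
        return []
    # Keep only the groups that actually captured something.
    parts = [g for g in m.groups() if g]
    # Join each part with its right-hand neighbour, e.g. CBR + 500, 500 + R.
    pairs = [a + b for a, b in zip(parts, parts[1:])]
    return parts + pairs

print(name_parts("Honda CBR500R"))  # ['CBR', '500', 'R', 'CBR500', '500R']
print(name_parts("CBR 500 R"))      # ['CBR', '500', 'R', 'CBR500', '500R']
```

Because the whitespace between parts is not captured, spaced and unspaced spellings of the same name produce the same list.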


Answer 1
1 2016-02-23 04:01:39

Answer 2
1 Accepted 2016-02-23 13:45:44
