
Match a regex against unordered strings of letters and digits

I have product names from which I need to extract the model numbers. For example:

KIPOR KDE38SS3 DIESEL 400V AGGREGAATTI # Result --> KDE38SS3 
KIPOR KDE28SS3 DIESEL 400V AGGREGAATTI # Result --> KDE28SS3 
KIPOR KDE19STA3  19 KW GENERAATTORI 400V # Result --> KDE19STA3  
KRÄNZLE C895-1 KUUMAVESIPESURI KELALLA # Result --> C895-1
KRÄNZLE 1165-1 KUUMAVESIPESURI KELALLA # Result --> 1165-1
NILFISK MH 4M-200/960 FA KUUMAVESIPESURI # Result --> MH 4M-200/960 FA
WALLIUS LMP-452i MIG HITSAUSKONE # Result --> LMP-452i
KRÄNZLE C15/150 KUUMAVESIPESURI KELALLA # Result --> C15/150

My current code is simple and works in some cases, but I am looking for a more efficient way to do this.

import re

# productnames is the list of product name strings shown above
for name in productnames[:10]:
    modelnum = re.findall(r'\w+\d+\w+', name)
    print(modelnum)

Results:

['KDE38SS3', '400V']
['KDE28SS3', '400V']
['KDE19STA3Â', '400V']
['C895']
['1165']
['200', '960']
['452i']
['C15', '150']

Is there a way to extract only the model number? The results also include 400V, which is not a model number, and some model numbers are split into two elements.

If you don't mind using a capturing group, and the model number is always the first match in the line, then you could do something like this:

import re

# Lazily skip ahead to the first match in the line and capture only that match
for name in productnames[:10]:
    modelnum = re.findall(r'^.*?(\w+\d+\w+)', name)
    print(modelnum)
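
The pattern above still stops at "-" and "/", so C895-1 would come back as C895. A minimal sketch of one possible variation follows, assuming that the model number is the first whitespace-delimited token that contains a digit and consists only of word characters, "-" and "/". The names MODEL_RE and first_model_token are made up for this illustration and are not part of the original answer.

```python
import re

# Assumption: the model number is the first whitespace-delimited token that
# contains at least one digit and uses only word characters, "-" and "/".
MODEL_RE = re.compile(r'(?<!\S)(?=\S*\d)[\w/-]+(?!\S)')

def first_model_token(name):
    """Return the first token that looks like a model number, or None."""
    match = MODEL_RE.search(name)
    return match.group(0) if match else None

samples = [
    'KIPOR KDE38SS3 DIESEL 400V AGGREGAATTI',
    'KRÄNZLE C895-1 KUUMAVESIPESURI KELALLA',
    'WALLIUS LMP-452i MIG HITSAUSKONE',
    'KRÄNZLE C15/150 KUUMAVESIPESURI KELALLA',
]

for sample in samples:
    print(first_model_token(sample))
```

For these four samples it prints KDE38SS3, C895-1, LMP-452i and C15/150. It does not handle model names that span several tokens: for the NILFISK line it would return only 4M-200/960, not MH 4M-200/960 FA. It also relies on the same "first match in the line" assumption, so a token such as 400V placed before the model number would be picked up instead.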


 