
Python regex: find/match one or more occurrences in a string

I almost can't see anymore from searching Google and this site for solutions to my problem.

I want to pick out one or more occurrences of two different substrings from a string:

e.g. 'aSATMPA23.37aSAAWAKE----aSABATT2.05-aSASLEEPING-'

So I'd like to be able to pick out the 'aSATMPA23.37' and, if it's there, also the 'aSABATT2.05'.

I've tried the following:

import re
serialdata = 'aSATMPA18.5-----aSBBATT2.97-aSBSLEEPING-'
def regex_serialdata(data):                                   
    GrandRegex = re.compile(r'(aS(.)(TMPA)(\d+\.\d+))|(aS(.)(BATT)(\d+\.\d+))')
    match = GrandRegex.match(data)

But this stops after the first match, 'aSATMPA18.5'.

Next I tried the 'findall' method:

def regex_serialdata(data):                                   
    GrandRegex = re.compile(r'(aS(.)(TMPA)(\d+\.\d+))|(aS(.)(BATT)(\d+\.\d+))')      
    match = GrandRegex.findall(data)
    print(match)

Which resulted in: [('aSATMPA18.5', 'A', 'TMPA', '18.5', '', '', '', ''), ('', '', '', '', 'aSBBATT2.97', 'B', 'BATT', '2.97')]

Is there a better way to do this?

Can I access the values within the list of tuples easily?

Please note, I have spent hours on this and don't ask for help lightly.

Much appreciated,

Paul

>>> import re
>>> a = 'aSATMPA23.37aSAAWAKE----aSATMPA15.14-aSASLEEPING-'
>>> re.findall(r'aSATMPA\d+\.\d+', a)
['aSATMPA23.37', 'aSATMPA15.14']
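For context on why 'match' in the question returned only one result: re.match tries the pattern at the start of the string only and returns at most one match object, whereas findall scans the whole string. A minimal sketch of the difference follows; the variable names and the '--' prefix are made up for illustration.

```python
import re

pattern = re.compile(r'aS.(?:TMPA|BATT)\d+\.\d+')
data = 'aSATMPA18.5-----aSBBATT2.97-aSBSLEEPING-'

# match() is anchored at index 0 and yields one match object at most.
first = pattern.match(data)

# With two extra leading characters the string no longer starts with a
# reading, so match() returns None.
shifted = pattern.match('--' + data)

# findall() keeps scanning and collects every non-overlapping match.
everything = pattern.findall(data)

print(first.group(0))
print(shifted)
print(everything)
```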

If you place the parentheses as shown below, you get a list of tuples containing the values you want from every match:

>>> a
'aSATMPA23.37aSAAWAKE----aSBBATT2.05-aSASLEEPING-'
>>> b = re.findall(r'(aS)(ATMPA|BBATT)(\d+\.\d+)', a)
>>> b
[('aS', 'ATMPA', '23.37'), ('aS', 'BBATT', '2.05')]
>>> b[0][0]
'aS'
>>> b[0][1]
'ATMPA'
>>> b[0][2]
'23.37'
>>> b[1][0]
'aS'
>>> b[1][1]
'BBATT'
>>> b[1][2]
'2.05'
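If indexing by position (b[0][1], b[1][2]) gets hard to read, named groups are one possible alternative. The sketch below is only an illustration: the group names 'device', 'kind' and 'value' are invented here and do not come from the question.

```python
import re

# Hypothetical group names chosen for readability.
READING = re.compile(
    r'aS(?P<device>.)(?P<kind>TMPA|BATT)(?P<value>\d+\.\d+)')

a = 'aSATMPA23.37aSAAWAKE----aSBBATT2.05-aSASLEEPING-'

# finditer() yields match objects; groupdict() maps each name to its text.
readings = [m.groupdict() for m in READING.finditer(a)]

for r in readings:
    print(r['device'], r['kind'], r['value'])
```

Each element of 'readings' is a plain dict, so a field can be looked up by name instead of by tuple position.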

Is there a better way to do this?

Yes. Get rid of all of your parentheses:

import re
serialdata = 'aSATMPA18.5-----aSBBATT2.97-aSBSLEEPING-'
def regex_serialdata(data):
    GrandRegex = re.compile(r'aS.TMPA\d+\.\d+|aS.BATT\d+\.\d+')
    match = GrandRegex.findall(data)
    print(match)

regex_serialdata(serialdata)

Can I access the values within the list of tuples easily?

Yes. From your second example, try print(match[0][0], match[1][4]), which prints the two full matches.
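The reason the two versions return different shapes is that findall returns whole matched strings when the pattern has no capturing groups, and one tuple per match (one slot per group) when it has several. A rough sketch using the question's data; the variable names 'whole', 'grouped' and 'compact' are made up for illustration.

```python
import re

serialdata = 'aSATMPA18.5-----aSBBATT2.97-aSBSLEEPING-'

# No capturing groups: findall returns the full matched strings.
whole = re.findall(r'aS.TMPA\d+\.\d+|aS.BATT\d+\.\d+', serialdata)

# The question's pattern has eight groups, so each result is an 8-tuple,
# with '' in the slots of the alternative that did not take part.
grouped = re.findall(
    r'(aS(.)(TMPA)(\d+\.\d+))|(aS(.)(BATT)(\d+\.\d+))', serialdata)

# Dropping the empty strings keeps only the fields that matched.
compact = [tuple(field for field in row if field) for row in grouped]

print(whole)
print(compact)
```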

Try the following regex:

r'(aSA(?:TMPA|BATT))(\d+(?:\.\d+)?)'

Full code:

import re

# The pattern has no unescaped '.', so the re.DOTALL flag is not needed.
p = re.compile(r'(aSA(?:TMPA|BATT))(\d+(?:\.\d+)?)')

test_str = """
aSATMPA23.37aSAAWAKE----aSABATT2.05-aSASLEEPING-aSATMPA23.37aSAAWAKE--
--aSABATT2.05-aSASLEEPING-aSATMPA23.37aSAAWAKE---
-aSABATT2.05-aSASLEEPING-aSATMPA23.37aSAAWAKE-
"""

# Group 1 is the tag (e.g. aSATMPA), group 2 is the numeric value.
for m in p.finditer(test_str):
    print('{0:<15}{1}'.format(m.group(1), m.group(2)))

It will print:

aSATMPA        23.37
aSABATT        2.05
aSATMPA        23.37
aSABATT        2.05
aSATMPA        23.37
aSABATT        2.05
aSATMPA        23.37


Based on your input, it will capture

  • aSATMPA23.37
  • aSABATT2.05
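One detail of this regex worth spelling out: the optional group (?:\.\d+)? also accepts readings without a fractional part, which a pattern requiring \d+\.\d+ would skip. The sketch below compares the two on a made-up input; the sample string and the names 'strict' and 'lenient' are assumptions for illustration only.

```python
import re

strict = re.compile(r'aSA(?:TMPA|BATT)\d+\.\d+')
lenient = re.compile(r'aSA(?:TMPA|BATT)\d+(?:\.\d+)?')

# Hypothetical input: the first reading has no fractional part.
sample = 'aSATMPA23-----aSABATT2.05-'

# Only non-capturing groups are used, so findall returns whole matches.
strict_hits = strict.findall(sample)
lenient_hits = lenient.findall(sample)

print(strict_hits)
print(lenient_hits)
```

Whether integer readings can actually occur depends on the device sending the data, which the question does not say.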

Thanks to everyone who replied and contributed. With your help I've come up with the following:

import re

serialdata = 'aSATMPA18.5-----aSBBATT2.97-aSBSLEEPING-'

def regex_serialdata(data):
    # Groups: device ID, reading type, numeric value.
    GrandRegex = re.compile(r'aS(.)(TMPA|BATT)(\d+\.\d+)')
    match = GrandRegex.findall(data)
    print(match)
    for x, y, z in match:
        if y == 'TMPA':
            print('Temp is %s' % z)
        elif y == 'BATT':
            print('Battery is %sv' % z)

regex_serialdata(serialdata)

This produced the following output, which is exactly what I want:

[('A', 'TMPA', '18.5'), ('B', 'BATT', '2.97')]
Temp is 18.5
Battery is 2.97v
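If the readings are needed as numbers rather than printed text, one possible variant returns them instead. This is only a sketch: the function name 'parse_readings' is hypothetical, and converting with float() assumes every value has the digits-dot-digits form the regex requires.

```python
import re

READING = re.compile(r'aS(.)(TMPA|BATT)(\d+\.\d+)')

def parse_readings(data):
    """Return (device, kind, value) tuples with value converted to float."""
    return [(dev, kind, float(val))
            for dev, kind, val in READING.findall(data)]

readings = parse_readings('aSATMPA18.5-----aSBBATT2.97-aSBSLEEPING-')
print(readings)
```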

I'm delighted, it even looks pretty :)

Many thanks,

Paul
