简体   繁体   中英

Python Regex to match a string as a pattern and return number

I have some lines that represent some data in a text file. They are all of the following format:

s = 'TheBears      SUCCESS Number of wins : 14'

They all begin with the name then whitespace and the text 'SUCCESS Number of wins : ' and finally the number of wins, n1. There are multiple strings each with a different name and value. I am trying to write a program that can parse any of these strings and return the name of the dataset and the numerical value at the end of the string. I am trying to use regular expressions to do this and I have come up with the following:

import re
def winnumbers(s):
    pattern = re.compile(r"""(?P<name>.*?)     #starting name
                             \s*SUCCESS        #whitespace and success
                             \s*Number\s*of\s*wins  #whitespace and strings
                             \s*\:\s*(?P<n1>.*?)""",re.VERBOSE)
    match = pattern.match(s)

    name = match.group("name")
    n1 = match.group("n1")

    return (name, n1)

So far, my program can return the name, but the trouble comes after that. They all have the text "SUCCESS Number of wins : " so my thinking was to find a way to match this text. But I realize that my method of matching an exact substring isn't correct right now. Is there any way to match a whole substring as part of the pattern? I have been reading quite a bit on regular expressions lately but haven't found anything like this. I'm still really new to programming and I appreciate any assistance.

Eventually, I will use float() to return n1 as a number, but I left that out because it doesn't properly find the number in the first place right now and would only return an error.

Try this one out:

((\S+)\s+SUCCESS Number of wins : (\d+))

These are the results:

>>> regex = re.compile("((\S+)\s+SUCCESS Number of wins : (\d+))")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0xc827cf478a56b350>
>>> regex.match(string)
<_sre.SRE_Match object at 0xc827cf478a56b228>

# List the groups found
>>> r.groups()
(u'TheBears SUCCESS Number of wins : 14', u'TheBears', u'14')

# List the named dictionary objects found
>>> r.groupdict()
{}

# Run findall
>>> regex.findall(string)
[(u'TheBears SUCCESS Number of wins : 14', u'TheBears', u'14')]
# So you can do this for the name and number:
>>> fullstring, name, number = r.groups()

If you don't need the full string just remove the surround parenthesis.

I believe that there is no actual need to use a regex here. So you can use the following code if it acceptable for you(note that i have posted it so you will have ability to have another one option):

dict((line[:line.lower().index('success')+1], line[line.lower().index('wins:') + 6:]) for line in text.split('\n') if 'success' in line.lower())

OR in case of you are sure that all words are splitted by single spaces:

output={}
for line in text:
    if 'success' in line.lower():
        words = line.strip().split(' ')
        output[words[0]] = words[-1]

If the text in the middle is always constant, there is no need for a regular expression. The inbuilt string processing functions will be more efficient and easier to develop, debug and maintain. In this case, you can just use the inbuilt split() function to get the pieces, and then clean the two pieces as appropriate:

>>> def winnumber(s):
...     parts = s.split('SUCCESS Number of wins : ')
...     return (parts[0].strip(), int(parts[1]))
... 
>>> winnumber('TheBears      SUCCESS Number of wins : 14')
('TheBears', 14)

Note that I have output the number of wins as an integer (as presumably this will always be a whole number), but you can easily substitute float() - or any other conversion function - for int() if you desire.

Edit : Obviously this will only work for single lines - if you call the function with several lines it will give you errors. To process an entire file, I'd use map() :

>>> map(winnumber, open(filename, 'r'))
[('TheBears', 14), ('OtherTeam', 6)]

Also, I'm not sure of your end use for this code, but you might find it easier to work with the outputs as a dictionary:

>>> dict(map(winnumber, open(filename, 'r')))
{'OtherTeam': 6, 'TheBears': 14}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM