Why doesn't the following regular expression work in Python?

I have the following code:

regularexpression = r'([-\w]*\w)? ?: ?([-"\#\w\s_]*\w?);'
outputfr = re.findall(regularexpression, inputdata, re.IGNORECASE)
return data

It's supposed to catch words, hyphens and other characters, ending in ";". So:

(hello-nine: hello, six, seven; hello-five: six eight) would output as [('hello-nine', 'hello, six, seven'), ('hello-five', 'six eight')]

If final-number: "seventy", "sixty", "fifty", forty is part of the user input (inputdata), regularexpression doesn't catch it. I'd want it to output as [('final-number', '"seventy", "sixty", "fifty", "forty")]

Why is this?

In your regular expression, the second group:

([-"\#\w\s_]*\w?)

needs to be changed so that it will match commas:

([-"\#\w\s_,]*\w?)
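As a rough check of this change, the sketch below runs both patterns against a sample string. The sample value is an assumption based on the question's example, with a trailing ";" appended because both patterns require one after the value. The names original, with_comma and sample are illustrative only.

```python
import re

# Pattern from the question, and the same pattern with "," added to the
# second character class.
original = r'([-\w]*\w)? ?: ?([-"\#\w\s_]*\w?);'
with_comma = r'([-\w]*\w)? ?: ?([-"\#\w\s_,]*\w?);'

# Assumed sample input: the question's example plus a trailing ";",
# since both patterns only match a value that is followed by ";".
sample = 'final-number: "seventy", "sixty", "fifty", forty;'

# The original pattern stops at the first comma and never reaches ";".
print(re.findall(original, sample, re.IGNORECASE))
# prints []

# With "," allowed in the value, the whole list is captured as one string.
print(re.findall(with_comma, sample, re.IGNORECASE))
# prints [('final-number', '"seventy", "sixty", "fifty", forty')]
```

Note that the trailing ";" still matters: on the question's first example, where the last pair ends in ")" rather than ";", the amended pattern only returns the 'hello-nine' pair.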

Your example inputs and outputs are not consistent. In the first case the comma-separated items are kept together, but in the second they appear as separate list elements. Do you also want to strip the parentheses and the quote marks? Please clarify by giving actual values for inputdata and showing exactly what you want returned (including whether quote marks and parentheses are stripped). Also, the data variable is never assigned.

Using .split(";") might be a better starting point...

inputdata = "(hello-nine: hello, six, seven; hello-five: six eight)"
mylist = inputdata.split(";")
# here either use regexp or another split, depending on what you want...
subset = [x.split(":") for x in mylist]
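One way the split-based idea could be carried through is sketched below. It assumes that the surrounding parentheses and the whitespace around each piece should be dropped, and that quote marks should be kept; the question does not settle either point. parse_pairs is a hypothetical helper name, not something from the original code.

```python
def parse_pairs(inputdata):
    """Split 'key: value; key: value' text into (key, value) tuples."""
    # Assumption: outer parentheses are not wanted in the result.
    cleaned = inputdata.strip().strip("()")
    pairs = []
    for chunk in cleaned.split(";"):
        # Skip empty pieces, e.g. after a trailing ";".
        if ":" not in chunk:
            continue
        # partition splits on the first ":" only, so a value may
        # itself contain further colons.
        key, _, value = chunk.partition(":")
        pairs.append((key.strip(), value.strip()))
    return pairs


print(parse_pairs("(hello-nine: hello, six, seven; hello-five: six eight)"))
# prints [('hello-nine', 'hello, six, seven'), ('hello-five', 'six eight')]
```

Unlike the regular expression, this does not need a ";" after the last pair, and commas or quote marks in the value need no special handling.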
