
Extract an alphanumeric string between two special characters

I'm trying to match strings in the lines of a file and write each match minus its first and last characters:

import os, re

infile=open("~/infile", "r")
out=open("~/out", "w")
pattern=re.compile("=[A-Z0-9]*>")
for line in infile:
    out.write( pattern.search(line)[1:-1] + '\n' )

The problem is that Python says the Match object is not subscriptable. When I add .group(), it says 'NoneType' object has no attribute 'group', and .groups() fails with an error about a tuple.

Any idea how to get .search to return a string?

The re.search function returns a Match object.

If the match fails, the re.search function will return None. To extract the matching text, use the Match.group method.

>>> import re
>>> match = re.search("a.", "abc")
>>> if match is not None:
...     print(match.group(0))
...
ab
>>> print(re.search("a.", "a"))
None

That said, it's probably a better idea to use groups to find the required section of the match:

>>> match = re.search("=([A-Z0-9]*)>", "=ABC>")  # Note the parentheses
>>> match.group(0)
'=ABC>'
>>> match.group(1)
'ABC'
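
Applied to a single line of the file, the group-based approach might look like the sketch below. The helper name extract_first is hypothetical and not from the original post; it returns None when a line has no match, so the caller can decide to skip that line.

```python
import re
from typing import Optional

# The capturing group keeps only the text between "=" and ">".
PATTERN = re.compile(r"=([A-Z0-9]*)>")


def extract_first(line: str) -> Optional[str]:
    """Return the first captured token in line, or None if nothing matches."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group(1)


print(extract_first("key=ABC123> rest"))  # ABC123
print(extract_first("no markers here"))   # None
```

Checking for None before calling .group() is what avoids the AttributeError described in the question.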

This regex can then be used with findall as @WiktorStribiżew suggests.

You seem to need only the part of each string between = and >. In this case, it is much easier to put a capturing group around the alphanumeric pattern and use it with re.findall, which never returns None: it returns an empty list when nothing matches, or a list of the captured texts otherwise. Also, I doubt you need empty matches, so use + instead of *:

pattern=re.compile(r"=([A-Z0-9]+)>")
                      ^         ^

and then

"\n".join(pattern.findall(line))

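Put together with the loop from the question, the findall approach could be sketched as follows. The function name write_matches and the sample input are assumptions made for illustration, and in-memory io.StringIO objects stand in for the real files.

```python
import io
import re

# "+" instead of "*" so that an empty "=>" is not reported as a match.
PATTERN = re.compile(r"=([A-Z0-9]+)>")


def write_matches(infile, out):
    """Write every captured token from infile to out, one per line."""
    for line in infile:
        tokens = PATTERN.findall(line)  # [] when the line has no match
        if tokens:
            out.write("\n".join(tokens) + "\n")


# In-memory stand-ins for the real files, for illustration only.
source = io.StringIO("a=AB1> b=CD2>\nnothing here\nx=> y=EF3>\n")
sink = io.StringIO()
write_matches(source, sink)
print(sink.getvalue())
```

With real files, note that open() does not expand "~", so a path such as "~/infile" would need os.path.expanduser("~/infile") first.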

 