
AttributeError: 'NoneType' object has no attribute 'group' while using re.match

I need to compare the first element of two different files after a certain phrase. So far I have this:

import re

data1 = ""
data2 = ""
first = re.match(r".*Ignore until after this:(?P<data1>.*)", firstlist[0])
second = re.match(r".*Ignore until after this:(?P<data2>.*)", secondarray[0])
data1 = first.group('data1')
data2 = second.group('data2')

if data1 == data2:
  #rest of the code...

I want to ignore everything up to a certain point, and then save the rest into the variable. I do something almost identical to this earlier in the script and it works. However, when I run this, I get this error:

File "myfile.py", line [whatever line it is], in <module>  
data1 = first.group('data1')  
AttributeError: 'NoneType' object has no attribute 'group'

Why isn't re.match working properly with first and second?

EDIT

As per suggestion, I've changed [\\s\\S]* to .* .

EDIT 2: This is what the input looks like (NOT like in the comment below):

Random text

More random text

Even more random text

Ignore until after this:

Meaningful text, keep this

...and everything else...

...until the end of the file here

That's really all it is: a string of text that needs to be saved from a certain point onward.

You're probably running into trouble because of the newlines in your file. As Martijn Pieters pointed out in the comments on your question, you can pass the re.DOTALL flag so that '.' also matches newlines. So with a file like this (named tmp in this example):

Random text

More random text

Even more random text

Ignore until after this:

Meaningful text, keep this

...and everything else...

...until the end of the file here

You could do something like this:

import re

with open('tmp') as f:
    # re.DOTALL lets '.' match newlines, so '.*' can span the whole file
    first = re.match(r'.*Ignore until after this:(?P<data1>.*)', f.read(), re.DOTALL)
    print(first.group('data1'))

which gives

Meaningful text, keep this

...and everything else...

...until the end of the file here
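Since re.match returns None when nothing matches, it may also be worth checking the result before calling .group(). Here is a minimal sketch of that idea. The helper name text_after_marker and the sample string are made up for illustration, and it uses re.search with an escaped marker, so unlike the greedy '.*' prefix above it keeps everything after the first occurrence of the phrase rather than the last:

```python
import re

MARKER = "Ignore until after this:"  # phrase taken from the question


def text_after_marker(text, marker=MARKER):
    """Return everything after the first occurrence of marker, or None if absent."""
    # re.escape guards against regex metacharacters in the marker;
    # re.DOTALL lets '.*' run across newlines to the end of the string.
    match = re.search(re.escape(marker) + r"(?P<rest>.*)", text, re.DOTALL)
    if match is None:
        return None  # no marker found, so there is nothing to call .group() on
    return match.group("rest")


sample = "Random text\n\nIgnore until after this:\n\nMeaningful text, keep this\n"
result = text_after_marker(sample)
missing = text_after_marker("no marker here")

print(repr(result))
print(missing)
```

The caller can then test for None explicitly instead of hitting the AttributeError.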

By default, the dot '.' in a regular expression matches any character except a newline, and re.match only matches at the start of the string. So if your entire file is a single string, '.*' can consume at most the first line, and the regular expression then tries to match your phrase right there, where it finds a newline instead. The match fails and re.match returns None, which is why calling .group() on the result raises the AttributeError.
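To see the effect of the newline on '.', here is a small comparison of the same pattern with and without re.DOTALL. The three-line text string is a shortened stand-in for the file contents, not the real input:

```python
import re

# Shortened stand-in for the file contents (an assumption for illustration)
text = "Random text\nIgnore until after this:\nMeaningful text, keep this"
pattern = r".*Ignore until after this:(?P<data1>.*)"

# Without the flag, '.*' stops at the first newline, so the match fails
no_flag = re.match(pattern, text)

# With re.DOTALL, '.' also matches newlines and the phrase is found
with_flag = re.match(pattern, text, re.DOTALL)

print(no_flag)
print(repr(with_flag.group("data1")))
```

The first call returns None and the second captures everything after the phrase, including the newline that follows the colon.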


