
re.search Multiple lines Python

re.search with \s or '\n' is not finding the multi-line text I'm trying to match.

Portion of Source:

Date/Time:
2013-08-27 17:05:36 

----- BEGIN SEARCH -----

GENERAL DATA:
NAME:   AB12
SECTOR: 
999,999
CONTROLLED BY:  Player
ALLIANCE:   Aliance
ONLINE: 1 seconds ago
SIZE:   Large
HOMEWORLD:  NO
APPROVAL RATING:    100%
PRODUCTION RATE:    100%

RESOURCE DATA:
POWER:  0 / 0
BUILDINGS:  0 / 20
ORE:    80,000 / 80,000
CRYSTAL:    80,000 / 80,000
POPULATION: 40,000 / 40,000

BUILDING DATA:
N/A

UNIT DATA:
WYVERN(S):  100

----- END SEARCH -----

Looking at it in Notepad++ I see "BUILDING DATA:(LF)"

Full Code

lines = open('scan.txt','r').readlines()
for a in lines:
    if re.search(r"\A\d", a):
        digits = a
        if re.search(r"2013", digits):
            date.append(digits[:19])
            count +=1
        elif re.search(r",", digits):
            clean = digits.rstrip()
            sector = clean.split(',')
            x.append(sector[0])
            y.append(sector[1])
    elif re.search(r"CONTROLLED BY:", a):
        player.append(a[15:].rstrip())
    elif re.search(r"ALLIANCE:", a):
        alliance.append(a[10:].rstrip())
    elif re.search(r"SIZE:", a):
        size.append(a[6:].rstrip())
    elif re.findall('BUILDING DATA:\sN/A', a, re.M):
        def_grid = ''
        print "Didn't find it"
        defense.append(def_grid)
        defense_count +=1
    elif re.search(r"DEFENSE GRID", a):
        def_grid = a[16:].rstrip()
        print "defense found"
        defense_count +=1

But nothing is returned.

I need to put an empty spacer in when "DEFENSE GRID" doesn't exist after "BUILDING DATA:".

I know I'm missing something, and I've tried reading up on re.search, but I can't find any thorough examples that explain how multiline matching works.

re.findall("BUILDING DATA:\nN/A",a,re.MULTILINE)

You can do just what you did, but using re.findall instead of re.search:

re.findall('BUILDING DATA:\nN/A', a, re.M)
#['BUILDING DATA:\nN/A']

EDIT:

The problem is that you are currently reading line by line. To detect a pattern that spans two or more lines, you have to treat the text as a single string, for example:

s = ''.join(lines)

This is fine as long as lines is not too large; you can then use s for your multi-line searches.
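To make the difference concrete, here is a minimal sketch in Python 3. The sample string is a hypothetical stand-in for a few lines of scan.txt (assuming LF line endings, as Notepad++ showed); it compares searching each line separately with searching the joined text.

```python
import re

# Hypothetical stand-in for part of scan.txt (assumed LF line endings).
sample = "SIZE:   Large\nBUILDING DATA:\nN/A\n\nUNIT DATA:\nWYVERN(S):  100\n"

pattern = re.compile(r"BUILDING DATA:\nN/A")

# Roughly what readlines() returns: each element ends at its own newline.
lines = sample.splitlines(keepends=True)

# Line by line: no single element contains both "BUILDING DATA:" and "N/A".
per_line_hits = [line for line in lines if pattern.search(line)]

# Whole text: the newline between the two lines is now inside one string.
s = "".join(lines)
whole_text_hit = pattern.search(s)

print(per_line_hits)                  # []
print(repr(whole_text_hit.group(0)))  # 'BUILDING DATA:\nN/A'
```

The per-line search can never succeed here, because the text after the newline lives in the next list element.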

I wonder why nothing is returned. If your file looks like this:

BUILDING DATA:
N/A

then running

import re
f = open('test.txt','r')
a = f.read(20)
re.search('BUILDING DATA:\nN/A', a, re.M)

gives me a match object as output:

<_sre.SRE_Match object at 0x1004fc8b8>

If I test re.search with a string that is not in the file, as in this code:

import re
f = open('test.txt','r')
a = f.read(20)
re.search('BUILDING BATA:\nN/A', a, re.M)

there is no output, as expected.

EDIT:

As Saullo Castro pointed out, the problem is the line-by-line reading. Why not use something like this?

a = open('scan.txt','r').read()
if re.findall('BUILDING DATA:\nN/A', a, re.M):
     print('found!')
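On the question of how the multiline flag works: as far as the re documentation describes it, re.M (re.MULTILINE) only changes where ^ and $ can match. It does not let a search reach across separate strings, and a literal \n or \s in the pattern matches a line break with or without it. A small sketch in Python 3, using a made-up text value rather than the real file:

```python
import re

# Made-up text standing in for the relevant part of the file.
text = "BUILDING DATA:\nN/A\n"

# \n and \s both match the line break, with or without re.M.
without_flag = re.search(r"BUILDING DATA:\nN/A", text)
with_flag = re.search(r"BUILDING DATA:\nN/A", text, re.M)
with_whitespace_class = re.search(r"BUILDING DATA:\sN/A", text)

# re.M matters once the pattern uses ^ or $: here ^ has to match at the
# start of the second line, which only happens in multiline mode.
anchored_default = re.search(r"^N/A$", text)
anchored_multiline = re.search(r"^N/A$", text, re.M)

print(without_flag is not None, with_flag is not None)  # True True
print(with_whitespace_class is not None)                # True
print(anchored_default)                                 # None
print(anchored_multiline.group(0))                      # N/A
```

So the flag in the snippets above is harmless but probably not what makes them work; reading the whole file into one string is.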

Third try (keeping the line-by-line loop and remembering the previous line with a flag):

tmp = False
...
elif re.findall('BUILDING DATA:', a, re.M):
    tmp = True
elif tmp and re.findall('N/A', a, re.M):
    def_grid = ''
    print "Didn't find it"
    defense.append(def_grid)
    defense_count +=1
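A self-contained version of this flag idea might look like the sketch below (Python 3). The sample_lines data and the after_building_header name are made up for illustration, and the format of the "DEFENSE GRID" line is an assumption, since the question does not show one. Unlike the fragment above, this version resets the flag after one line so that a later "N/A" elsewhere in the file is not mistaken for an empty building list.

```python
import re

# Hypothetical records: one with no buildings, one with a defense grid.
# The "DEFENSE GRID" line format is assumed, not taken from the real file.
sample_lines = [
    "BUILDING DATA:\n",
    "N/A\n",
    "\n",
    "BUILDING DATA:\n",
    "DEFENSE GRID:   5\n",
]

defense = []
after_building_header = False  # True only for the line right after the header

for a in sample_lines:
    if re.search(r"BUILDING DATA:", a):
        after_building_header = True
        continue
    if after_building_header and re.search(r"N/A", a):
        defense.append("")  # empty spacer: no defense grid for this record
    elif re.search(r"DEFENSE GRID", a):
        # Take everything after the first colon instead of a fixed offset.
        defense.append(a.split(":", 1)[1].strip())
    after_building_header = False

print(defense)  # ['', '5']
```

Splitting on the colon is a design choice for the sketch; the fixed slice a[16:] from the question works too if the column layout never changes.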

Replace

re.findall('BUILDING DATA:\sN/A', a, re.M):

with

re.findall('BUILDING DATA:\nN/A', a, re.M):

or

re.search(r'BUILDING DATA:\nN/A', a, re.M):

and it should work.

(Notice that in your code there's \s instead of \n.)
