
How to print matching strings in Python with regex?

I am working on a Python script that goes through a directory of files and extracts the strings that match a certain pattern. More specifically, I'm trying to extract the values of a serial number and a max-limit, and the lines look something like this:

#serial number = 642E0523D775

max-limit=50M/50M

I've got the script to go through the files, but it doesn't print the values I want. Instead, I just get the 'Nothing found' output.

I think it probably has something to do with the regex I'm using, but I can't for the life of me figure out how to formulate it.

The script I've come up with so far:

import os
import re

#Where I'm searching

user_input = "/path/to/files/"
directory = os.listdir(user_input)

#What I'm looking for

searchstring = ['serial number', 'max-limit']
re_first = re.compile ('serial.\w.*')
re_second = re.compile ('max-limit=\w*.\w*')

#Regex combine
regex_list = [re_first, re_second]

#Looking

for fname in directory:
    if os.path.isfile(user_input + os.sep + fname):
        # Full path
        f = open(user_input + os.sep + fname, 'r')
        f_contents = f.read()
        content = fname + f_contents
        files = os.listdir(user_input)
        lines_seen = set()

        for f in files:
         print(f)
         if f not in lines_seen:  # not a duplicate

          for regex in regex_list:
              matches = re.findall(regex, content)

              if matches != None:
                for match in matches:
                  print(match)
              else:
                  print('Nema')
        f.close()

Per the documentation, the re module's match() searches for "characters at the beginning of a string [that] match the regular expression pattern". Since you are prepending your file contents with the file name in the line:

content=fname + f_contents

and then matching your pattern against content in the line:

result=re.match(regex, content)

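match() only ever tests the start of the string, which is the file name, so there will never be a match (unless the file name itself happens to begin with matching text).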

Since you want to locate a match anywhere in the string, use search() instead.
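
To make the difference concrete, here is a minimal sketch. The file name and file contents are hypothetical stand-ins for the question's fname and f_contents, and the pattern is the question's second regex written as a raw string:

```python
import re

# Hypothetical stand-ins for the question's variables.
fname = "router1.cfg"
f_contents = "#serial number = 642E0523D775\nmax-limit=50M/50M\n"
content = fname + f_contents

pattern = re.compile(r"max-limit=\w*.\w*")

# match() only tries position 0, where the file name sits.
at_start = pattern.match(content)

# search() scans the whole string for the first occurrence.
anywhere = pattern.search(content)

print(at_start)           # None
print(anywhere.group(0))  # max-limit=50M/50M
```

Both calls return None when nothing matches, so a result should be checked before calling group() on it.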

See also: "search() vs. match()" in the re module documentation.
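
Applied to the task in the question, one possible shape for the script is sketched below. It is not the asker's code: the function names extract_values and scan_directory are made up for illustration, and the two patterns assume the line formats shown in the question's sample lines.

```python
import re
from pathlib import Path

# Assumed line formats, taken from the two sample lines in the question.
SERIAL_RE = re.compile(r"serial number\s*=\s*(\w+)")
LIMIT_RE = re.compile(r"max-limit=(\S+)")


def extract_values(text):
    """Return (serial, max_limit); each is None when not found."""
    serial = SERIAL_RE.search(text)
    limit = LIMIT_RE.search(text)
    return (
        serial.group(1) if serial else None,
        limit.group(1) if limit else None,
    )


def scan_directory(directory):
    """Map each regular file's name to the values extracted from it."""
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            text = path.read_text(errors="replace")
            results[path.name] = extract_values(text)
    return results
```

Calling scan_directory on the directory from the question and printing each entry would take the place of the nested loops. The capture groups return only the values rather than the whole matching line.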

Edit

The pattern ^[\\w&.\\-]+$ provided would match neither serial number = 642E0523D775 as it contains a space (" "), nor max-limit=50M/50M as it contains a forward slash ("/"). Both also contain an equals sign ("=") which cannot be matched by your pattern.

Additionally, the character class in this pattern matches the backslash ("\"), so you may want to remove it (the dash ("-") does not need to be escaped when it is at the end of the character class).

A pattern that also matches both of these strings could be:

^[\\w&. \\/=\\-]+$
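
A quick way to check this is sketched below. It assumes the doubled backslashes are string-literal escapes, so that the underlying regex is ^[\w&. \/=\-]+$, and writes both patterns as Python raw strings:

```python
import re

# Assumption: the doubled backslashes are string-literal escapes, so the
# patterns are written here as raw strings with single backslashes.
original = re.compile(r"^[\w&.\-]+$")
widened = re.compile(r"^[\w&. /=\-]+$")

samples = ["serial number = 642E0523D775", "max-limit=50M/50M"]

original_hits = [bool(original.search(s)) for s in samples]
widened_hits = [bool(widened.search(s)) for s in samples]

print(original_hits)  # [False, False]
print(widened_hits)   # [True, True]

# The leading "#" from the question's sample line is not in the class.
hash_hit = bool(widened.search("#serial number = 642E0523D775"))
print(hash_hit)  # False
```

Because the pattern is anchored with ^ and $, it has to cover the entire string, so a line that still carries the leading "#" would need that character added to the class.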

