
Why is the Python regex wildcard only matching a newline?

I am writing a program to parse log messages using Python regular expressions. I have everything working up to the message part of each log entry. The message can contain any kind of character, so I assumed the .* wildcard would be the best fit, since it matches everything except a newline.

However, when I use the wildcard, the only thing returned in this case is the newline. Any ideas? Here's the code and the output:

import os
import re
#Change to and print correct file path
os.chdir('/Users/MacUser/Desktop/regExPython')
print(os.getcwd())

#Iterate and read from syslogexample.txt then print results
line_number = 0
with open('syslogexample.txt', 'r') as syslog:
    log_lines = syslog.readlines()
    for line in log_lines:
        line_number += 1
        print('{:>4} {}'.format(line_number, line.rstrip()))


#Build regex to parse through the data
DATE_RE = r'(\w{3}\s+\d{2})'
TIME_RE = r'(\d{2}:\d{2}:\d{2})'
DEVICE_RE = r'(\S+)'
PROCESS_RE = r'(\S+\s+\S+:)'
MESSAGE_RE = r'(.*)'
CD_RE = r'(\s+)'

Syslog_RE = DATE_RE + CD_RE + \
            TIME_RE + CD_RE + \
            DEVICE_RE + CD_RE + \
            PROCESS_RE + CD_RE + \
            MESSAGE_RE

#Use regex to parse through the data
for line in log_lines:
    m = re.match(Syslog_RE, line)
    if m:
        print(m.groups())

#Printed log Files
   1 apr 29 08:22:13 mac-users-macbook-8 syslogd[49]: asl sender statistics
   2 apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):
   3 service "com.apple.emond.aslmanager" tried to hijack endpoint "com.apple.aslmanager" from owner:
   4 com.apple.aslmanager
   5 apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):
   6 service "com.apple.emond.aslmanager" tried to hijack endpoint
   7 "com.apple.activity_tracing.cache-delete" from owner: com.apple.aslmanager
   8 apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] (com.apple.bsd.dirhelper[14184]):
   9 endpoint has been activated through legacy launch(3) apis. please switch to xpc or
  10 bootstrap_check_in(): com.apple.bsd.dirhelper
  11 apr 29 08:22:19 mac-users-macbook-8 com.apple.xpc.launchd[1]
  12 (com.apple.imfoundation.imremoteurlconnectionagent): unknown key for integer:
  13 _dirtyjetsammemorylimit

Parsed Log Files
('apr 29', ' ', '08:22:17', ' ', 'mac-users-macbook-8', ' ', 'com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):', '\n', '')
('apr 29', ' ', '08:22:17', ' ', 'mac-users-macbook-8', ' ', 'com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):', '\n', '')
('apr 29', ' ', '08:22:17', ' ', 'mac-users-macbook-8', ' ', 'com.apple.xpc.launchd[1] (com.apple.bsd.dirhelper[14184]):', '\n', '')

Process finished with exit code 0

As you can see, at the end, where MESSAGE_RE should match, the only characters printed are the \n newline characters, which I thought wouldn't be printed at all.

Thanks all!

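On http://www.regex101.com the regex does not work as intended because .* only captures up to a newline character, so it stops matching at each line break, e.g. between lines 3 and 4 of the printed log. Note that the '\n' in your output is not coming from MESSAGE_RE: it is captured by the last CD_RE group, because \s matches a newline. MESSAGE_RE itself captures the empty string '', since the message text sits on the following lines and readlines() hands those to re.match() separately. Python's re module has a DOTALL flag that makes . match newline characters as well (see http://docs.python.org/2/library/re.html), but for it to help, the pattern has to be applied to the whole text (for example the result of syslog.read()) rather than to one line at a time. You can pass the flag to re.compile() and compile the regex before matching, or pass it directly to re.match().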
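As a rough sketch of the DOTALL idea, the snippet below first reproduces the output from the question on a single line, then matches against the whole text at once. It uses a short SAMPLE string copied from the printed log instead of reading syslogexample.txt. The lazy (.*?) message group, the lookahead for the next timestamp, the MULTILINE flag, and the parse_entries helper are my own assumptions, not something from the original code, so treat this as one possible approach rather than the definitive fix.

```python
import re

# Two wrapped entries copied from the printed log in the question.
SAMPLE = (
    'apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):\n'
    'service "com.apple.emond.aslmanager" tried to hijack endpoint "com.apple.aslmanager" from owner:\n'
    'com.apple.aslmanager\n'
    'apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] (com.apple.bsd.dirhelper[14184]):\n'
    'endpoint has been activated through legacy launch(3) apis. please switch to xpc or\n'
    'bootstrap_check_in(): com.apple.bsd.dirhelper\n'
)

# Building blocks, unchanged from the question.
DATE_RE = r'(\w{3}\s+\d{2})'
TIME_RE = r'(\d{2}:\d{2}:\d{2})'
DEVICE_RE = r'(\S+)'
PROCESS_RE = r'(\S+\s+\S+:)'
MESSAGE_RE = r'(.*)'
CD_RE = r'(\s+)'
HEADER_RE = (DATE_RE + CD_RE + TIME_RE + CD_RE +
             DEVICE_RE + CD_RE + PROCESS_RE + CD_RE)

# 1) Diagnosis: match only the first physical line, as the original loop does.
first_line = SAMPLE.splitlines(keepends=True)[0]
groups = re.match(HEADER_RE + MESSAGE_RE, first_line).groups()
print(groups[7:])  # ('\n', '') : the last CD_RE took the newline, MESSAGE_RE is empty

# 2) Sketch of a fix: match against the whole text with DOTALL.
# (.*?) is lazy so it stops at the next line that starts with a date and
# time, or at the end of the text (\Z). MULTILINE lets ^ match at each line start.
ENTRY_RE = re.compile(
    HEADER_RE + r'(.*?)' + r'(?=^\w{3}\s+\d{2}\s+\d{2}:\d{2}:\d{2}\s|\Z)',
    re.DOTALL | re.MULTILINE,
)


def parse_entries(text):
    """Return (date, time, device, process, message) tuples (hypothetical helper)."""
    entries = []
    for m in ENTRY_RE.finditer(text):
        date, time, device, process = m.group(1, 3, 5, 7)
        # Collapse the line breaks inside a wrapped message into single spaces.
        message = ' '.join(m.group(9).split())
        entries.append((date, time, device, process, message))
    return entries


for entry in parse_entries(SAMPLE):
    print(entry)
```

With a real file you would pass the result of syslog.read() to parse_entries instead of SAMPLE. Note that DOTALL on its own, with the greedy (.*) from the question, would swallow everything up to the end of the file in the first match, which is why the sketch makes the group lazy and stops it at the next timestamp.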


 