
Regex only matching one word

I am trying to match a number and everything that comes after it on the same line. The regular expression that I found to work was '(90[13]).*'. I am using Python with Sublime Text on a Mac.

On regex101.com it works: it matches 901 and 903 together with the words that come after them on each line. However, when I try it in my code editor, it only returns the numbers 901 and 903, not the words after them.

Here is the string:

2008-11-08 06:32:46.354761500 26318 logging::logterse plugin: ` 89.223.216.72   apn-89-223-216-72.vodafone.hu   apn-89-223-216-72.vodafone.hu   <toshiter@donin.com>        rhsbl   901 Not supporting null originator (DSN)    msg denied before queued
2008-11-08 06:33:17.924158500 26331 logging::logterse plugin: ` 208.99.214.236  mx22.ecreditchoices7.com    mx22.ecreditchoices7.com    <moneydiet2@mx22.ecreditchoices7.com>       dnsbl   903 http://www.spamhaus.org/SBL/sbl.lasso?query=SBL69049    msg denied before queued
2008-11-08 06:34:53.318459500 26358 logging::logterse plugin: ` 84.58.57.150    dslb-084-058-057-150.pools.arcor-ip.net rpemgmu.arcor-ip.net    <sundered@ancientinc.com>       dnsbl   903 http://www.spamhaus.org/query/bl?ip=84.58.57.150    msg denied before queued
2008-11-08 06:35:41.724563500 26375 logging::logterse plugin: ` 58.126.113.198  Unknown [58.126.113.198]    <benny@surecom.com>     rhsbl   901 Not supporting null originator (DSN)    msg denied before queued
2008-11-08 06:37:31.730609500 26398 logging::logterse plugin: ` 87.103.146.91   pmsn.91.146.103.87.sable.dsl.krasnet.ru pmsn.91.146.103.87.sable.dsl.krasnet.ru <dwweem@wee.com>        dnsbl   903 http://www.spamhaus.org/query/bl?ip=87.103.146.91   msg denied before queued
2008-11-08 06:37:41.211401500 26409 logging::logterse plugin: ` 87.103.146.91   pmsn.91.146.103.87.sable.dsl.krasnet.ru pmsn.91.146.103.87.sable.dsl.krasnet.ru <dwtrupsm@trups.com>        dnsbl   903 http://www.spamhaus.org/query/bl?ip=87.103.146.91   msg denied before queued

The expression should match "901 Not supporting null originator (DSN) msg denied before queued", but it only returns "901". Here is my code:

import re

with open('data/email_log.txt', 'r') as fh:
    email_log = fh.read()

# Prints only the captured numbers, e.g. '901' and '903'
print(re.findall(r'(90[13]).*', email_log))

In Python, re.findall behaves as follows, according to its documentation:

If one or more groups are present in the pattern, return a list of groups;

The regex itself still matches the whole rest of the line, which is why regex101.com highlights it. But since .* (everything after your number) is not inside a group, findall leaves it out of the result and returns only what (90[13]) captured. Either remove the group altogether, or add a second group for the text following your number:

s = """
2008-11-08 06:32:46.354761500 26318 logging::logterse plugin: ` 89.223.216.72   apn-89-223-216-72.vodafone.hu   apn-89-223-216-72.vodafone.hu   <toshiter@donin.com>        rhsbl   901 Not supporting null originator (DSN)    msg denied before queued
2008-11-08 06:33:17.924158500 26331 logging::logterse plugin: ` 208.99.214.236  mx22.ecreditchoices7.com    mx22.ecreditchoices7.com    <moneydiet2@mx22.ecreditchoices7.com>       dnsbl   903 http://www.spamhaus.org/SBL/sbl.lasso?query=SBL69049    msg denied before queued
2008-11-08 06:34:53.318459500 26358 logging::logterse plugin: ` 84.58.57.150    dslb-084-058-057-150.pools.arcor-ip.net rpemgmu.arcor-ip.net    <sundered@ancientinc.com>       dnsbl   903 http://www.spamhaus.org/query/bl?ip=84.58.57.150    msg denied before queued
2008-11-08 06:35:41.724563500 26375 logging::logterse plugin: ` 58.126.113.198  Unknown [58.126.113.198]    <benny@surecom.com>     rhsbl   901 Not supporting null originator (DSN)    msg denied before queued
2008-11-08 06:37:31.730609500 26398 logging::logterse plugin: ` 87.103.146.91   pmsn.91.146.103.87.sable.dsl.krasnet.ru pmsn.91.146.103.87.sable.dsl.krasnet.ru <dwweem@wee.com>        dnsbl   903 http://www.spamhaus.org/query/bl?ip=87.103.146.91   msg denied before queued
2008-11-08 06:37:41.211401500 26409 logging::logterse plugin: ` 87.103.146.91   pmsn.91.146.103.87.sable.dsl.krasnet.ru pmsn.91.146.103.87.sable.dsl.krasnet.ru <dwtrupsm@trups.com>        dnsbl   903 http://www.spamhaus.org/query/bl?ip=87.103.146.91   msg denied before queued
"""
import re
# Two groups, so findall returns one (number, rest-of-line) tuple per match
print(re.findall(r'(90[13])(.*)', s))

Output:

[('901', ' Not supporting null originator (DSN)    msg denied before queued'), ('903', ' http://www.spamhaus.org/SBL/sbl.lasso?query=SBL69049    msg denied before queued'), ('903', ' http://www.spamhaus.org/query/bl?ip=84.58.57.150    msg denied before queued'), ('901', ' Not supporting null originator (DSN)    msg denied before queued'), ('903', ' http://www.spamhaus.org/query/bl?ip=87.103.146.91   msg denied before queued'), ('903', ' http://www.spamhaus.org/query/bl?ip=87.103.146.91   msg denied before queued')]
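The other option mentioned above, getting the whole match back as a single string, could look roughly like the sketch below. The two-line `sample` is a hypothetical, shortened stand-in for the log in the question, and the variable names are only for illustration. It shows three ways that should give the same result: dropping the group, making it non-capturing with `(?:...)`, or keeping the original pattern and reading `group(0)` from `re.finditer`.

```python
import re

# Hypothetical sample, shortened from the log lines in the question.
sample = (
    "rhsbl   901 Not supporting null originator (DSN)    msg denied before queued\n"
    "dnsbl   903 http://www.spamhaus.org/query/bl?ip=84.58.57.150    msg denied before queued\n"
)

# No capturing group: findall returns each whole match as a string.
whole = re.findall(r'90[13].*', sample)

# A non-capturing group (?:...) groups without capturing, so findall
# still returns the whole match.
non_capturing = re.findall(r'(?:90[13]).*', sample)

# The original pattern, unchanged: finditer yields match objects,
# and group(0) is always the entire match regardless of groups.
via_finditer = [m.group(0) for m in re.finditer(r'(90[13]).*', sample)]

for line in whole:
    print(line)
```

Which one to use is mostly a matter of taste. `finditer` is handy if you want both the whole match and the individual groups from the same pattern.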
