re.search python regex

Question

For my assignment I need to search through a log file and print out how many pages each user has printed.

date: 2012-11-25
printer on time:  0800 
23.96.82.161 user: pei printer: core 2 pages: 2 code: r n t h p r
28.104.177.80 user: isaac printer: poster pages: 4 code: p h
printer error:  out of paper  time: 1343
180.186.109.129 user: luis printer: core 2 pages: 2 code: k n h
194.96.54.184 user: isaac printer: sally pages: 6 code: p k r p f
122.230.32.236 user: luis printer: hill 3 pages: 8 code: n h n k q
printer off time: 2201

The above is an example of what the log file will contain.

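import re

def log2hist(filename):
    users = {}  # username -> total pages printed
    with open(filename) as logfile:
        for stringprint in logfile:
            userRegex = re.search(r'(\suser:\s)(.+?)(\sprinter:\s)', stringprint)
            if userRegex:
                userString = userRegex.group(2)
                # re.search returns None if this pattern does not match the line
                numpages = int(re.search(r'(\spages:\s)(.+?)(\scode:\s)', stringprint).group(2))

                if userString not in users:
                    users[userString] = numpages
                else:
                    users[userString] += numpages
    return users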

My problem is that re.search isn't working properly. I believe the expression is correct, but it clearly is not. I know that \s matches whitespace, and that .+? is the lazy version of .+ (it matches the preceding token as few times as possible). Once I find a match, I use userRegex.group(2) to set the "username". From there I want to search for the number of pages and the code (to make sure it is the correct match) and then print it. I know this regex is not working, but I cannot figure out what I am doing wrong.
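
A quick illustration of the lazy-versus-greedy point, using one line from the sample log and a simplified pattern (the variable names are only for this sketch): with `.+?` the capture stops at the first following whitespace, while `.+` runs to the end of the line and then backtracks to the last whitespace it can find.

```python
import re

line = "23.96.82.161 user: pei printer: core 2 pages: 2 code: r n t h p r"

# Lazy: .+? takes as few characters as possible, so it stops at the first space after "pei"
lazy = re.search(r'\suser:\s(.+?)\s', line)
# Greedy: .+ takes the rest of the line, then backtracks to the last space
greedy = re.search(r'\suser:\s(.+)\s', line)

print(lazy.group(1))    # pei
print(greedy.group(1))  # pei printer: core 2 pages: 2 code: r n t h p
```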

When I run the module, I get:

Traceback (most recent call last):
  File "C:\Users\brandon\Desktop\project3\project3\pages.py", line 45, in <module>
    log2hist("log") # version 2.
  File "C:\Users\brandon\Desktop\project3\project3\pages.py", line 29, in log2hist
    numpages = int(re.search('(\spages:\s)(.+?)(\scode:\s)',stringprint).group(2))
AttributeError: 'NoneType' object has no attribute 'group'
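
The AttributeError means that, for at least one line, the pages search returned `None` (re.search does not raise when nothing matches), and `None` has no `.group()` method. A minimal sketch that reproduces this and guards against it; `pages_for` is a hypothetical helper, and the `bad` line is an invented example rather than a line from the log:

```python
import re

PAGES_RE = re.compile(r'\spages:\s(.+?)\scode:\s')

def pages_for(line):
    """Return the page count on this line, or None if the pattern does not match."""
    match = PAGES_RE.search(line)
    if match is None:  # re.search returns None instead of raising
        return None
    return int(match.group(1))

good = "28.104.177.80 user: isaac printer: poster pages: 4 code: p h"
bad = "10.0.0.1 user: isaac printer: poster pages: 4"  # hypothetical: no "code:" field

print(pages_for(good))  # 4
print(pages_for(bad))   # None

# Calling .group() directly on the failed search reproduces the error
try:
    re.search(r'\spages:\s(.+?)\scode:\s', bad).group(1)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'group'
```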

Answer 1

Description

I recommend changing your regex so it is a bit more flexible. This regex will:

- capture the username
- capture the number of pages printed
- allow the user and pages fields to appear in any order, which becomes handy if you want to start capturing other data
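
A small check of the "any order" point, using the regex from this answer; the `reordered` line is hypothetical and not part of the original log. Each lookahead scans the line from the start on its own, so the order of the two fields does not matter.

```python
import re

# Each (?=...) lookahead searches the line independently of the other one
REGEX = re.compile(r'^(?=.*?user:\s+(.*?)\s)(?=.*?pages:\s+(.*?)\s).*?$', re.MULTILINE)

original = "194.96.54.184 user: isaac printer: sally pages: 6 code: p k r p f"
reordered = "194.96.54.184 pages: 6 printer: sally user: isaac code: p k r p f"  # hypothetical

print(REGEX.findall(original))   # [('isaac', '6')]
print(REGEX.findall(reordered))  # [('isaac', '6')]
```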

The Regex

^(?=.*?user:\s+(.*?)\s)(?=.*?pages:\s+(.*?)\s).*?$

Explained

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
    user:                    'user:'
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    (                        group and capture to \1:
----------------------------------------------------------------------
      .*?                      any character except \n (0 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
    )                        end of \1
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
    pages:                   'pages:'
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      .*?                      any character except \n (0 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"

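Because both lookaheads are zero-width, the only text the pattern actually consumes is the final `.*?$`: the overall match is the whole line, while the captures come from inside the lookaheads. A quick sketch on one sample line (`re.MULTILINE` is not needed here because the input is a single line):

```python
import re

pattern = re.compile(r'^(?=.*?user:\s+(.*?)\s)(?=.*?pages:\s+(.*?)\s).*?$')
line = "122.230.32.236 user: luis printer: hill 3 pages: 8 code: n h n k q"

m = pattern.match(line)
print(m.group(0) == line)  # True: the consumed text is the whole line
print(m.groups())          # ('luis', '8'): captured inside the lookaheads

# A line without "user:" fails the first lookahead, so there is no match at all
print(pattern.match("printer off time: 2201"))  # None
```
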
Examples

Online Demo of Regex

http://fiddle.re/13chna

Sample Python Code

https://repl.it/CJdF/0

import re

SampleString = '''date: 2012-11-25
printer on time:  0800 
23.96.82.161 user: pei printer: core 2 pages: 2 code: r n t h p r
28.104.177.80 user: isaac printer: poster pages: 4 code: p h
printer error:  out of paper  time: 1343
180.186.109.129 user: luis printer: core 2 pages: 2 code: k n h
194.96.54.184 user: isaac printer: sally pages: 6 code: p k r p f
122.230.32.236 user: luis printer: hill 3 pages: 8 code: n h n k q
printer off time: 2201'''
print(SampleString)

## Here re.findall() returns one (username, pages) tuple per matching line
Regex = re.compile(r'^(?=.*?user:\s+(.*?)\s)(?=.*?pages:\s+(.*?)\s).*?$', re.MULTILINE)
Matches = Regex.findall(SampleString)
for Count, Match in enumerate(Matches):
    # Match[0] is the username, Match[1] is the page count (as a string)
    print("[" + str(Count) + "][0] = " + Match[0])
    print("[" + str(Count) + "][1] = " + Match[1])
    print("")

Sample Output

[0][0] = pei
[0][1] = 2

[1][0] = isaac
[1][1] = 4

[2][0] = luis
[2][1] = 2

[3][0] = isaac
[3][1] = 6

[4][0] = luis
[4][1] = 8
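
The output above lists each match, but the assignment asks for the total pages per user. A minimal sketch of that last step, assuming the same regex and summing the `(user, pages)` tuples with `collections.Counter` (one reasonable choice; a plain dict, as in the question's code, would also work):

```python
import re
from collections import Counter

SAMPLE = '''date: 2012-11-25
printer on time:  0800
23.96.82.161 user: pei printer: core 2 pages: 2 code: r n t h p r
28.104.177.80 user: isaac printer: poster pages: 4 code: p h
printer error:  out of paper  time: 1343
180.186.109.129 user: luis printer: core 2 pages: 2 code: k n h
194.96.54.184 user: isaac printer: sally pages: 6 code: p k r p f
122.230.32.236 user: luis printer: hill 3 pages: 8 code: n h n k q
printer off time: 2201'''

REGEX = re.compile(r'^(?=.*?user:\s+(.*?)\s)(?=.*?pages:\s+(.*?)\s).*?$', re.MULTILINE)

totals = Counter()
for user, pages in REGEX.findall(SAMPLE):
    totals[user] += int(pages)  # pages is captured as a string

for user, pages in totals.items():
    print(user, pages)
# pei 2
# isaac 10
# luis 10
```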

Answer 1: accepted, 2016-04-26 01:43:40

Description 描述

Examples 例子

重新搜索python regexpr

问题描述

1 个解决方案

解决方案1 2 已采纳 2016-04-26 01:43:40

Description 描述

Examples 例子

解决方案1
2 已采纳 2016-04-26 01:43:40