re.search Python regex
For my assignment I need to search through a log file and print out how many pages each user has printed.
date: 2012-11-25
printer on time: 0800
23.96.82.161 user: pei printer: core 2 pages: 2 code: r n t h p r
28.104.177.80 user: isaac printer: poster pages: 4 code: p h
printer error: out of paper time: 1343
180.186.109.129 user: luis printer: core 2 pages: 2 code: k n h
194.96.54.184 user: isaac printer: sally pages: 6 code: p k r p f
122.230.32.236 user: luis printer: hill 3 pages: 8 code: n h n k q
printer off time: 2201
The lines above are an example of what the file will contain.
for stringprint in logfile:
    userRegex = re.search(r'(\suser:\s)(.+?)(\sprinter:\s)', stringprint)
    if userRegex:
        userString = userRegex.group(2)
        # This line raises AttributeError when the pages search returns None
        numpages = int(re.search(r'(\spages:\s)(.+?)(\scode:\s)', stringprint).group(2))
        if userString not in users:
            users[userString] = numpages
        else:
            users[userString] += numpages
My problem is that re.search isn't working properly. I believe the expression is correct, but it clearly is not. I know that \s matches whitespace and that .+? is the lazy form, matching as few characters as possible. Once I find a match, I use userRegex.group(2) to set it as the "username". From there I want to search for the number of pages and the code (to make sure the match is correct) and then print the result. I know this regex is not working, but I cannot figure out what I am doing wrong.
When I run the program as a module I get:
Traceback (most recent call last):
  File "C:\Users\brandon\Desktop\project3\project3\pages.py", line 45, in <module>
    log2hist("log") # version 2.
  File "C:\Users\brandon\Desktop\project3\project3\pages.py", line 29, in log2hist
    numpages = int(re.search('(\spages:\s)(.+?)(\scode:\s)',stringprint).group(2))
AttributeError: 'NoneType' object has no attribute 'group'
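The AttributeError in the traceback means that re.search returned None, which is what it does when the pattern does not match the line; None has no group method. A minimal sketch of checking the match before using it follows. The helper name pages_on_line and the compiled pattern PAGES_RE are hypothetical, introduced only for this illustration.

```python
import re

# Hypothetical names for illustration; they are not part of the original program.
PAGES_RE = re.compile(r'\spages:\s(.+?)\scode:\s')

def pages_on_line(line):
    """Return the page count found on a log line, or None if the pattern does not match."""
    match = PAGES_RE.search(line)
    if match is None:
        # Calling match.group() here would raise the AttributeError from the traceback.
        return None
    return int(match.group(1))

print(pages_on_line("23.96.82.161 user: pei printer: core 2 pages: 2 code: r n t h p r"))
print(pages_on_line("printer error: out of paper time: 1343"))
```

The first call prints 2 and the second prints None, so the caller can skip lines that lack a page count instead of crashing.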
I recommend changing your regex so it is a bit more flexible. The regex below uses two lookaheads, so for each line it captures the value after user: into group 1 and the value after pages: into group 2.
The Regex
^(?=.*?user:\s+(.*?)\s)(?=.*?pages:\s+(.*?)\s).*?$
Explained
NODE                     EXPLANATION
----------------------------------------------------------------------
^                        the beginning of a "line"
----------------------------------------------------------------------
(?=                      look ahead to see if there is:
----------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
----------------------------------------------------------------------
  user:                    'user:'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                           or more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more
                             times (matching the least amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
(?=                      look ahead to see if there is:
----------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
----------------------------------------------------------------------
  pages:                   'pages:'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                           or more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more
                             times (matching the least amount
                             possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
.*?                      any character except \n (0 or more times
                         (matching the least amount possible))
----------------------------------------------------------------------
$                        before an optional \n, and the end of a
                         "line"
Online Demo of Regex
http://fiddle.re/13chna
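Because both lookaheads start scanning from the beginning of the line, the order of the user: and pages: fields should not matter. The sketch below tries the regex on one line from the sample log and on a reordered line; the reordered line is made up for this example and does not come from the original log.

```python
import re

# Same regex as above; no MULTILINE flag is needed for a single line.
LINE_RE = re.compile(r'^(?=.*?user:\s+(.*?)\s)(?=.*?pages:\s+(.*?)\s).*?$')

normal = "28.104.177.80 user: isaac printer: poster pages: 4 code: p h"
# Hypothetical line with the two fields swapped, invented for illustration.
swapped = "28.104.177.80 pages: 4 user: isaac printer: poster code: p h"

for line in (normal, swapped):
    match = LINE_RE.search(line)
    # group(1) is the user name, group(2) is the page count (as a string)
    print(match.group(1), match.group(2))
```

Both lines print isaac 4. Note that each captured value must be followed by a whitespace character, so a field sitting at the very end of the input with nothing after it would not match.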
Sample Python Code
https://repl.it/CJdF/0
import re
SampleString = '''date: 2012-11-25
printer on time: 0800
23.96.82.161 user: pei printer: core 2 pages: 2 code: r n t h p r
28.104.177.80 user: isaac printer: poster pages: 4 code: p h
printer error: out of paper time: 1343
180.186.109.129 user: luis printer: core 2 pages: 2 code: k n h
194.96.54.184 user: isaac printer: sally pages: 6 code: p k r p f
122.230.32.236 user: luis printer: hill 3 pages: 8 code: n h n k q
printer off time: 2201'''
print(SampleString)

# Use re.findall() to collect a (user, pages) tuple for every matching line
Regex = re.compile(r'^(?=.*?user:\s+(.*?)\s)(?=.*?pages:\s+(.*?)\s).*?$', re.MULTILINE)
Matches = Regex.findall(SampleString)
Count = 0
for Match in Matches:
    # Match[0] is the user name, Match[1] is the page count
    print("[" + str(Count) + "][0] = " + Match[0])
    print("[" + str(Count) + "][1] = " + Match[1])
    print("")
    Count = Count + 1
Sample Output
[0][0] = pei
[0][1] = 2
[1][0] = isaac
[1][1] = 4
[2][0] = luis
[2][1] = 2
[3][0] = isaac
[3][1] = 6
[4][0] = luis
[4][1] = 8
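The output above lists one entry per log line, while the question asks for the number of pages per user. A minimal sketch of adding up the matches follows, assuming the log fits in a single string; the function name pages_per_user is hypothetical and not part of the original code.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(r'^(?=.*?user:\s+(.*?)\s)(?=.*?pages:\s+(.*?)\s).*?$', re.MULTILINE)

def pages_per_user(log_text):
    """Sum the page counts for each user found in the log text."""
    totals = defaultdict(int)
    # findall returns one (user, pages) tuple per matching line
    for user, pages in LINE_RE.findall(log_text):
        totals[user] += int(pages)
    return dict(totals)

sample = '''date: 2012-11-25
printer on time: 0800
23.96.82.161 user: pei printer: core 2 pages: 2 code: r n t h p r
28.104.177.80 user: isaac printer: poster pages: 4 code: p h
printer error: out of paper time: 1343
180.186.109.129 user: luis printer: core 2 pages: 2 code: k n h
194.96.54.184 user: isaac printer: sally pages: 6 code: p k r p f
122.230.32.236 user: luis printer: hill 3 pages: 8 code: n h n k q
printer off time: 2201'''

for user, total in sorted(pages_per_user(sample).items()):
    print(user, total)
```

For the sample log this prints isaac 10, luis 10 and pei 2, one per line. A defaultdict avoids the separate "is the user already in the dictionary" check from the question.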