How would you find text in a string in Python and then look for a number after it?
I have a log file, and at the end of each line in the file there is this string: Line:# where # is the line number.
I am trying to get the # and compare it to the previous line's number. What would be the best way to do that in Python?
I would probably use str.split because it seems easy:
    with open('logfile.log') as fin:
        # The number is whatever follows the last colon on each line.
        numbers = [int(line.split(':')[-1]) for line in fin]
Now you can use zip to compare one number with the next one:
    for num1, num2 in zip(numbers, numbers[1:]):
        compare(num1, num2)  # do comparison here
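
As a rough illustration of the split-and-zip approach, the sketch below runs on a few in-memory lines instead of a file. The sample lines and the find_gaps helper are made up for this example, and "the number should increase by exactly 1" is only one assumption about what the comparison might check.

```python
# Hypothetical sample lines standing in for the contents of logfile.log.
sample_lines = [
    "something happened Line:1\n",
    "something else Line:2\n",
    "a jump here Line:5\n",
]

def find_gaps(lines):
    """Return (previous, current) pairs where the number does not increase by 1."""
    # int() ignores surrounding whitespace, so the trailing newline is harmless.
    numbers = [int(line.split(':')[-1]) for line in lines]
    # zip pairs each number with its successor: (n0, n1), (n1, n2), ...
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b != a + 1]

gaps = find_gaps(sample_lines)
print(gaps)  # [(2, 5)]
```

Note that int() raises ValueError if a line does not end in a number, so real log files may need a guard around the conversion.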
Of course, this isn't lazy (you store every line number in the file at once when you really only need two at a time), so it might take up a lot of memory if your files are huge. It wouldn't be hard to make it lazy though:
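    def elem_with_next(iterable):
        ii = iter(iterable)
        try:
            prev = next(ii)
        except StopIteration:
            return  # empty input: there is nothing to pair up
        for here in ii:
            yield prev, here
            prev = here

    with open('logfile.log') as fin:
        numbers = (int(line.split(':')[-1]) for line in fin)
        # Consume the generator while the file is still open.
        for num1, num2 in elem_with_next(numbers):
            compare(num1, num2)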
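
On Python 3.10 or newer, the standard library's itertools.pairwise yields the same overlapping pairs, so a hand-written helper may not be needed. The following is a minimal sketch rather than a drop-in replacement: the sample lines are made up, and the fallback branch for older Python versions follows the well-known tee-based recipe.

```python
try:
    from itertools import pairwise  # available from Python 3.10
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        """Fallback for older Pythons: yield (s0, s1), (s1, s2), ..."""
        a, b = tee(iterable)
        next(b, None)  # advance the second iterator by one element
        return zip(a, b)

# Hypothetical sample lines; in practice this would be the open file object.
sample_lines = ["a Line:10\n", "b Line:11\n", "c Line:13\n"]

# Generator expression: each number is parsed only when it is requested.
numbers = (int(line.split(':')[-1]) for line in sample_lines)

# list() is used here only to show the result; a for loop would stay lazy.
pairs = list(pairwise(numbers))
print(pairs)  # [(10, 11), (11, 13)]
```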
I'm assuming that you don't have something convenient to split a string on, meaning a regular expression might make more sense. That is, if the lines in your log file are structured like:
date: 1-15-2013, error: mildly_annoying, line: 121
date: 1-16-2013, error: err_something_bad, line: 123
Then you won't be able to use line.split('#') as mgilson suggested, although if there is always a colon, line.split(':') might work. In any case, a regular expression solution would look like:
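    import re

    numbers = []
    with open('logfile.log') as log:
        for line in log:
            # Raw string, so the backslash in \d reaches the regex engine intact.
            digit_match = re.search(r"(\d+)$", line)
            if digit_match is not None:
                numbers.append(int(digit_match.group(1)))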
Here the expression r"(\d+)$" matches one or more digits followed by the end of the line. We extract the digits by calling group(1) on the returned match object and then add them to our list of line numbers.
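
To see what the end-of-line pattern picks up, here is a small sketch run against the two sample lines shown earlier plus one made-up line with no trailing number. The name trailing_digits is chosen for this example only.

```python
import re

# The two sample lines from above, with the newline a file would supply,
# plus a hypothetical line that does not end in a number.
sample_lines = [
    "date: 1-15-2013, error: mildly_annoying, line: 121\n",
    "date: 1-16-2013, error: err_something_bad, line: 123\n",
    "a line with no number at the end\n",
]

# $ matches at the end of the string or just before a final newline.
trailing_digits = re.compile(r"(\d+)$")

numbers = []
for line in sample_lines:
    match = trailing_digits.search(line)
    if match is not None:
        numbers.append(int(match.group(1)))

print(numbers)  # [121, 123]
```

The digits inside the dates are ignored because they are not at the end of the line. If a line could have spaces after the number, stripping it first with line.rstrip() would be one way to keep the pattern matching.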
If you're not confident that "Line: #" will always come at the end of each line, you could replace the regular expression used above with something akin to r"Line:\s*(\d+)", which checks for the string "Line:", then some (or no) whitespace, and then one or more digits.
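
A short sketch of the labelled pattern in use follows. The extract_line_number helper and the test strings are hypothetical, and the re.IGNORECASE flag is an assumption added here because the sample lines above spell the label in lowercase ("line:").

```python
import re

# re.IGNORECASE is an assumption of this sketch: it lets the pattern accept
# both "Line:" and the lowercase "line:" seen in the sample lines.
line_number = re.compile(r"Line:\s*(\d+)", re.IGNORECASE)

def extract_line_number(text):
    """Return the number that follows 'Line:' as an int, or None if absent."""
    match = line_number.search(text)
    return int(match.group(1)) if match else None

# The label no longer has to be the last thing on the line.
print(extract_line_number("date: 1-15-2013, line: 121, error: mildly_annoying"))  # 121
print(extract_line_number("Line:42 trailing text"))  # 42
print(extract_line_number("no number here"))  # None
```

Returning None for lines without a match leaves it to the caller to decide whether to skip such lines or treat them as errors.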