
Can someone work out a regular expression for Apache access log files in Scala?

I am using the following regular expression in Scala:

val Pattern = """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+)""".r

val res = Pattern.findFirstMatchIn(logFile)

Yet it gives me the following error:

: Cannot parse log line: 80-219-148-207.dclient.hispeed.ch - - [07/Mar/2004:19:47:36 -0800] "OPTIONS * HTTP/1.0" 200 -

The issue is that your regex expects the last field (the response size) to be numeric ( \d+ , one or more digits), but in this line it is a - (unknown or undefined). The earlier subpatterns work because \S+ (one or more non-whitespace characters) matches a hyphen.

So, either replace the last \d+ with \S+ or use an alternation, (\d+|-) . The alternation can be applied to every numeric field in the pattern, like this:

^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}|-) (\d+|-)
                                                                       ^^      ^^

See the regex demo.
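A minimal sketch of how the relaxed pattern might be used in Scala, assuming the log is processed one line at a time. The names AccessLogExample, Hit, dashToNone and parse are hypothetical and introduced only for illustration; they are not part of the original question. A - in the status or size field is mapped to None instead of failing the match.

```scala
object AccessLogExample {
  // Same pattern as in the question, but the status and size fields may be "-".
  val Pattern =
    """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}|-) (\d+|-)""".r

  // Hypothetical record type; only a few of the nine captured groups are kept.
  final case class Hit(host: String, method: String, status: Option[Int], bytes: Option[Long])

  // "-" means the value was not logged, so treat it as absent.
  private def dashToNone(s: String): Option[String] =
    if (s == "-") None else Some(s)

  // Returns None when the line does not match the pattern at all.
  def parse(line: String): Option[Hit] =
    Pattern.findFirstMatchIn(line).map { m =>
      Hit(
        host = m.group(1),
        method = m.group(5),
        status = dashToNone(m.group(8)).map(_.toInt),
        bytes = dashToNone(m.group(9)).map(_.toLong)
      )
    }

  def main(args: Array[String]): Unit = {
    val line =
      """80-219-148-207.dclient.hispeed.ch - - [07/Mar/2004:19:47:36 -0800] "OPTIONS * HTTP/1.0" 200 -"""
    // The line from the question now parses, with the missing size as None.
    println(parse(line))
  }
}
```

Keeping the alternation (rather than switching to \S+ ) means a non-numeric, non-hyphen value in those fields is still rejected, so the toInt and toLong conversions cannot throw on a matched line.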
