
Regex to match everything (including new lines) between keywords

I am writing a VBScript file to parse data out of a log file. The log file contains this structure, always formatted the same way:

<name="ExecResponse" value="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXX==" />

How can I match just the data between the quotes (XXXXX), even when it spans zero or more new lines? This is not language specific, but I am validating in TextPad, so I'm not sure whether global operators are available to me there; in VBScript they are.

Thanks.

The simplest way is to use /"[^"]*"/g, assuming all quotes are properly balanced and none of them are escaped.
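As a quick check of that pattern outside VBScript, here is a minimal Python sketch. The sample string `log_text` is made up for illustration (shortened from the structure in the question), and `re.findall` stands in for the `g` flag.

```python
import re

# Hypothetical sample, shortened from the structure shown in the question.
log_text = '<name="ExecResponse" value="AAAA\nBBBB\nCC==" />'

# A negated character class matches any character except a quote,
# line breaks included, so no multi-line switch is required.
quoted = re.findall(r'"[^"]*"', log_text)

for item in quoted:
    print(repr(item))
```

Note that each match still includes the surrounding quotes; a capture group is needed to get only the text between them.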

VBScript solution, since you tagged your question that way:

Set fso = CreateObject("Scripting.FileSystemObject")
txt = fso.OpenTextFile("C:\path\to\your.log").ReadAll

Set re = New RegExp
re.Pattern = """([^""]*)"""
re.Global = True

For Each m In re.Execute(txt)
  WScript.Echo m.SubMatches(0)
Next

Demonstration:

>>> s = "<name=""ExecResponse"" value=""XXXXXXXXXXXXXXXXXXXXXXX" & vbNewLine & _
      "XXXXXXXXXXXXXXXXXXXXXXX" & vbNewLine & _
      "XXXXXXXXXXXXXXXXXXXXXXX" & vbNewLine & _
      "XXXXXXXXXXXXXXXXXXXXXXX" & vbNewLine & _
      "XXXXXXXXXXXXX=="" />"
>>> WScript.Echo s
<name="ExecResponse" value="XXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXX==" />
>>> Set re = New RegExp
>>> re.Pattern = """([^""]*)"""
>>> re.Global = True
>>> For Each m In re.Execute(s) : WScript.Echo m.SubMatches(0) : Next
ExecResponse
XXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXX==

The actual regular expression is "([^"]*)", but the double quotes must be doubled to escape them inside a VBScript string literal.

If you want a more specific match (e.g. just the value of the value attribute), you need to make the regular expression more specific, for example: value="([^"]*)".
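To illustrate the more specific pattern, here is a small Python sketch. The sample text is hypothetical, and joining the wrapped lines back into one string is an assumption about what you want to do with the value, not something stated in the question.

```python
import re

# Hypothetical sample, shortened from the structure shown in the question.
log_text = '<name="ExecResponse" value="AAAA\nBBBB\nCC==" />'

# Anchoring on value= skips the other quoted attribute (name="...").
match = re.search(r'value="([^"]*)"', log_text)
value = match.group(1) if match else None

# Assumption: the line breaks are only wrapping, so drop them.
joined = "".join(value.splitlines()) if value is not None else None

print(repr(value))
print(joined)
```

Group 1 holds the text between the quotes, which corresponds to `m.SubMatches(0)` in the VBScript code.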

Something like this:

value\="([^"]*)"

Or this, if you want to allow optional whitespace around the equals sign:

value[[:space:]]?\=[[:space:]]?"([^"]*)"

In words: the word value, followed by an escaped equals sign, followed by a quote, followed by anything that is not a quote, followed by another quote.
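The POSIX bracket class `[[:space:]]` is not accepted by every regex engine (Python's `re` module, for one, does not treat it as a whitespace class). The sketch below therefore swaps in `\s*`, which also permits more than one whitespace character on either side of the equals sign. The sample strings are invented for illustration.

```python
import re

# \s* allows zero or more whitespace characters around the equals sign.
pattern = re.compile(r'value\s*=\s*"([^"]*)"')

# Hypothetical inputs covering no space, single spaces, and mixed whitespace.
samples = [
    'value="abc"',
    'value = "abc"',
    'value  =\t"a\nbc"',
]

captured = [pattern.search(s).group(1) for s in samples]
print(captured)
```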

I'm not familiar with VBScript, but the "anything but a quote" part should also match new lines. Note that other languages have switches that control how new lines are matched.

For example, PHP has the /s modifier, which makes the dot (.) match new lines too:

<?php
// $string holds the log text; the captured value ends up in $matches[1].
preg_match('/value\="([^"]*)"/s', $string, $matches);
?>
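To make the role of such a switch concrete, here is a hedged Python comparison, where `re.DOTALL` is the counterpart of PHP's /s. It suggests that the negated character class matches across line breaks with or without the flag, while a dot-based pattern needs it. The input string is made up for the example.

```python
import re

# Hypothetical two-line attribute value.
text = 'value="line1\nline2"'

# Negated class: matches across the line break without any flag.
negated = re.search(r'value="([^"]*)"', text)

# Lazy dot without DOTALL: the dot refuses the newline, so there is no match.
dot_plain = re.search(r'value="(.*?)"', text)

# Lazy dot with DOTALL: the dot now matches the newline as well.
dot_all = re.search(r'value="(.*?)"', text, re.DOTALL)

print(negated is not None, dot_plain is not None, dot_all is not None)
```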

