Regex extract between triple double quotes and newlines
For example, I want to parse a Python file, take the text between triple double quotes, and make an HTML table from this text.
A text block looks like this, for example:
"""
Replaces greater than operator ('>') with 'NOT BETWEEN 0 AND #'
Replaces equals operator ('=') with 'BETWEEN # AND #'
Tested against:
* Microsoft SQL Server 2005
* MySQL 4, 5.0 and 5.5
* Oracle 10g
* PostgreSQL 8.3, 8.4, 9.0
Requirement:
* Microsoft Access
Notes:
* Useful to bypass weak and bespoke web application firewalls that
filter the greater than character
* The BETWEEN clause is SQL standard. Hence, this tamper script
should work against all (?) databases
>>> tamper('1 AND A > B--')
'1 AND A NOT BETWEEN 0 AND B--'
>>> tamper('1 AND A = B--')
'1 AND A BETWEEN B AND B--'
"""
The HTML table must be a simple table with 5 columns:
Column 1: everything between """ and \n if the next line is empty (i.e. up to the first blank line)
Column 2: everything between Tested against: and \n if the next line is empty, or between Requirement: and \n if the next line is empty
Column 3: everything between Notes: and \n if the next line is empty
Column 4: everything between >>> and \n
Column 5: everything between the end of column 4 and \n
So the result must be:
Microsoft SQL Server 2005
or
tamper('1 AND A > B--') tamper('1 AND A = B--')
'1 AND A NOT BETWEEN 0 AND B--' '1 AND A BETWEEN B AND B--'
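Here is a rough illustration of the column rules for columns 2, 4 and 5. It is not a full solution. It is written as a Python sketch only so the patterns can be tried outside Windows Script Host. Several things are assumptions: the sample text, the names SECTION, DOCTEST, column2 and pairs, and the reading of "if the next line is empty" as "up to the next blank line".

```python
import re

# Hypothetical sample: part of a docstring body, with a blank line
# between sections as in the tamper-script example.
doc = "\n".join([
    "Replaces greater than operator ('>') with 'NOT BETWEEN 0 AND #'",
    "",
    "Tested against:",
    "* Microsoft SQL Server 2005",
    "* MySQL 4, 5.0 and 5.5",
    "",
    ">>> tamper('1 AND A > B--')",
    "'1 AND A NOT BETWEEN 0 AND B--'",
    ">>> tamper('1 AND A = B--')",
    "'1 AND A BETWEEN B AND B--'",
])

# Column 2: the lines after "Tested against:" or "Requirement:", up to
# (not including) the next blank line, or up to the end of the text.
SECTION = re.compile(
    r"^[ \t]*(?:Tested against|Requirement):[ \t]*\r?\n"
    r"([\s\S]*?)(?=\r?\n[ \t]*\r?\n|\Z)",
    re.MULTILINE,
)

# Columns 4 and 5: the rest of each ">>>" line and the line right after it.
DOCTEST = re.compile(
    r"^[ \t]*>>>[ \t]*([^\r\n]*)\r?\n[ \t]*([^\r\n]*)",
    re.MULTILINE,
)

column2 = [m.group(1) for m in SECTION.finditer(doc)]
pairs = DOCTEST.findall(doc)

print(column2)
print(pairs)
```

In this sketch, re.MULTILINE plays roughly the role of Multiline = True on a VBScript RegExp object. \Z is Python syntax and would need replacing when porting the pattern.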
What kind of syntax can I use to extract that? I will use VBScript.RegExp.
Set fso = CreateObject("Scripting.FileSystemObject")
txt = fso.OpenTextFile("C:\path\to\your.py").ReadAll
Set re = New RegExp
re.Pattern = """([^""]*)"""
re.Global = True
For Each m In re.Execute(txt)
WScript.Echo m.SubMatches(0)
Next
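The following shows why this pattern does not isolate docstrings. It runs the same regex, "([^"]*)", with Python's re module on a small made-up source string, which is an assumption for illustration. The regex pairs up single double quotes, so it also picks up ordinary string literals and splits each triple quote into empty matches. Anchoring on three quotes with a lazy [\s\S]*? avoids both problems.

```python
import re

# Hypothetical source: an ordinary string literal followed by a docstring.
src = 'name = "between"\n\ndef tamper(payload):\n    """\n    Replaces equals operator\n    """\n'

# The regex behind the VBScript string """([^""]*)""": text between two
# single double-quote characters.
naive = re.findall(r'"([^"]*)"', src)

# Anchoring on three quotes and matching lazily across line breaks instead.
triple = re.findall(r'"""([\s\S]*?)"""', src)

print(naive)
print(triple)
```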
Your question is quite broad, so I'll just outline a way to deal with this. Otherwise I'd have to write the whole script for you, which isn't going to happen.
Extract everything between the docquotes. Use a regular expression like this to extract the text between the docquotes:
Set re1 = New RegExp
re1.Pattern = """""""([\s\S]*?)"""""""
For Each m In re1.Execute(txt)
  docstr = m.SubMatches(0)
Next
Note that you need to set re1.Global to True if you have more than one docstring in your file and want all of them processed. Otherwise you'll get just the first match.
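The sketch below does the same extraction in Python, assuming a made-up source string with two functions. Here search() plays the role of re1.Global = False (first match only) and findall() plays the role of re1.Global = True. The greedy variant shows why the lazy *? matters once a file contains more than one docstring.

```python
import re

# Hypothetical source file with two functions, each with a docstring.
src = (
    'def a():\n    """First docstring"""\n\n'
    'def b():\n    """Second docstring"""\n'
)

# Same regex as re1: three quotes, a lazy run of any characters, three quotes.
DOCSTRING = re.compile(r'"""([\s\S]*?)"""')

first = DOCSTRING.search(src).group(1)  # like Global = False: first match only
every = DOCSTRING.findall(src)          # like Global = True: all matches

# A greedy [\s\S]* would run from the first opening to the last closing quotes.
greedy = re.findall(r'"""([\s\S]*)"""', src)

print(first)
print(every)
print(greedy)
```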
Remove leading and trailing whitespace with a second regular expression:
Set re2 = New RegExp
re2.Pattern = "^\s*|\s*$"  'leading or trailing whitespace
re2.Global = True          'find all matches
docstr = re2.Replace(docstr, "")
You can't use Trim for this, because that function removes only spaces, not other whitespace such as tabs or line breaks.
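This Python sketch shows the trimming step on a hypothetical docstring. re.sub with the same pattern as re2 removes the leading and trailing line breaks and indentation. A Trim-like strip of spaces only, str.strip(" "), leaves the outer line breaks in place.

```python
import re

# Hypothetical docstring body as extracted by re1: it starts and ends with
# a line break plus indentation.
docstr = "\n    Replaces equals operator\n\n    Tested against:\n    "

# Same pattern as re2; re.sub replaces every match, like Global = True.
# Without re.MULTILINE, ^ only matches at the start of the string, and \s*$
# only matches a whitespace run that reaches the end (Python's $ may also
# match before a final newline, but \s* swallows that newline anyway).
trimmed = re.sub(r"^\s*|\s*$", "", docstr)

# Removing only spaces, as Trim does, leaves the outer line breaks behind.
spaces_only = docstr.strip(" ")

print(repr(trimmed))
print(repr(spaces_only))
```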
Either split the string at 2+ consecutive line breaks to get the doc sections, or use another regular expression to extract them:
Set re3 = New RegExp
re3.Pattern = "([\s\S]*?)\r\n\r\n" & _
              "Tested against:\r\n([\s\S]*?)\r\n\r\n" & ...
For Each m In re3.Execute(docstr)
  descr = m.SubMatches(0)
  tested = m.SubMatches(1)
  ...
Next
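Both options are sketched here in Python. The sketch assumes the trimmed docstring uses CRLF line endings, as the \r\n in re3 suggests. SECTION_BREAK is a hypothetical name; it treats a line break followed by one or more empty or whitespace-only lines as a section boundary. RE3 is a cut-down version of re3 that stops after the "Tested against:" section, because the rest of the original pattern is elided.

```python
import re

# Hypothetical trimmed docstring with CRLF line endings and a blank line
# between sections.
docstr = (
    "Replaces greater than operator ('>') with 'NOT BETWEEN 0 AND #'\r\n"
    "\r\n"
    "Tested against:\r\n"
    "* Microsoft SQL Server 2005\r\n"
    "* Oracle 10g\r\n"
    "\r\n"
    "Notes:\r\n"
    "* The BETWEEN clause is SQL standard\r\n"
    "\r\n"
    ">>> tamper('1 AND A > B--')\r\n"
    "'1 AND A NOT BETWEEN 0 AND B--'"
)

# Option 1: split at a line break followed by one or more empty (or
# whitespace-only) lines; \r? lets the pattern accept LF as well as CRLF.
SECTION_BREAK = re.compile(r"\r?\n(?:[ \t]*\r?\n)+")
sections = SECTION_BREAK.split(docstr)

# Option 2: capture sections directly, like re3 (cut down to two groups).
RE3 = re.compile(r"([\s\S]*?)\r\n\r\nTested against:\r\n([\s\S]*?)\r\n\r\n")
m = RE3.match(docstr)
descr, tested = m.group(1), m.group(2)

for section in sections:
    print(repr(section))
print(repr(descr), repr(tested))
```

Splitting is more forgiving when a section is missing or the sections appear in a different order. A single large pattern like re3 fails to match at all in those cases.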
Continue breaking down the sections until you have the elements you want to display. Then build the HTML from these elements.
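For the last step, here is a minimal Python sketch using hypothetical helpers html_row and html_table. It builds one <tr> per docstring and one <td> per column. The cell text is HTML-escaped, so the > in the tamper() examples becomes &gt;, and line breaks are turned into <br>. The five sample cells are assumed to have been extracted already.

```python
from html import escape


def html_row(cells):
    # One <td> per column; each line is escaped (quote=False escapes only
    # &, < and >) and multi-line cells are joined with <br>.
    tds = "".join(
        "<td>" + "<br>".join(escape(line, quote=False) for line in cell.splitlines()) + "</td>"
        for cell in cells
    )
    return "<tr>" + tds + "</tr>"


def html_table(rows):
    # One <tr> per docstring.
    return "<table>\n" + "\n".join(html_row(r) for r in rows) + "\n</table>"


row = [
    "Replaces greater than operator ('>') with 'NOT BETWEEN 0 AND #'",
    "* Microsoft SQL Server 2005\n* Oracle 10g",
    "* The BETWEEN clause is SQL standard",
    "tamper('1 AND A > B--')",
    "'1 AND A NOT BETWEEN 0 AND B--'",
]
table = html_table([row])
print(table)
```

Escaping matters here because the docstrings contain > characters. quote=False leaves quotes alone, which is fine for element content but not for attribute values.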