Regex extract between triple double quotes and newlines

For example, I want to parse a Python file for the text between triple double quotes and build an HTML table from that text.

A text block looks like this, for example:

"""
Replaces greater than operator ('>') with 'NOT BETWEEN 0 AND #'
Replaces equals operator ('=') with 'BETWEEN # AND #'

Tested against:
    * Microsoft SQL Server 2005
    * MySQL 4, 5.0 and 5.5
    * Oracle 10g
    * PostgreSQL 8.3, 8.4, 9.0

Requirement:
    * Microsoft Access

Notes:
    * Useful to bypass weak and bespoke web application firewalls that
      filter the greater than character
    * The BETWEEN clause is SQL standard. Hence, this tamper script
      should work against all (?) databases

>>> tamper('1 AND A > B--')
'1 AND A NOT BETWEEN 0 AND B--'
>>> tamper('1 AND A = B--')
'1 AND A BETWEEN B AND B--'
"""

The HTML table must be a simple table containing 5 columns:

  1. Column: everything between """ and \n if the following line is empty
  2. Column: everything between "Tested against:" and \n if the following line is empty, or between "Requirement:" and \n if the following line is empty
  3. Column: everything between "Notes:" and \n if the following line is empty
  4. Column: everything between >>> and \n
  5. Column: everything between the end of column 4 and \n

So the result must be:

  1. Replaces greater than operator ('>') with 'NOT BETWEEN 0 AND #'
     Replaces equals operator ('=') with 'BETWEEN # AND #'
  2. • Microsoft SQL Server 2005
     • MySQL 4, 5.0 and 5.5
     • Oracle 10g
     • PostgreSQL 8.3, 8.4, 9.0

     or

     • Microsoft Access
  3. • Useful to bypass weak and bespoke web application firewalls that filter the greater than character
     • The BETWEEN clause is SQL standard. Hence, this tamper script should work against all (?) databases
  4. tamper('1 AND A > B--')
     tamper('1 AND A = B--')
  5. '1 AND A NOT BETWEEN 0 AND B--'
     '1 AND A BETWEEN B AND B--'

What kind of syntax can I use to extract that? I will use VBScript.RegExp.

Set fso = CreateObject("Scripting.FileSystemObject")
txt = fso.OpenTextFile("C:\path\to\your.py").ReadAll

Set re = New RegExp
re.Pattern = """([^""]*)"""
re.Global = True

For Each m In re.Execute(txt)
  WScript.Echo m.SubMatches(0)
Next

Your question is quite broad, so I'll just outline a way to deal with this. Otherwise I'd have to write the whole script for you, which isn't going to happen.

  1. Extract everything between the docquotes. Use a regular expression like this:

     Set re1 = New RegExp
     re1.Pattern = """""""([\s\S]*?)"""""""  '3 literal quotes, lazy match, 3 literal quotes

     For Each m In re1.Execute(txt)
       docstr = m.SubMatches(0)
     Next

    Note that you need to set the Global property (re1.Global) to True if you have more than one docstring in your file and want all of them processed. Otherwise you'll get just the first match.

  2. Remove leading and trailing whitespace with a second regular expression:

     Set re2 = New RegExp
     re2.Pattern = "^\s*|\s*$"
     re2.Global = True  'find all matches
     docstr = re2.Replace(docstr, "")

    You can't use Trim for this, because that function handles only spaces, not other whitespace such as line breaks or tabs.

  3. Either split the string at 2+ consecutive line breaks to get the doc sections, or use another regular expression to extract them:

     Set re3 = New RegExp
     re3.Pattern = "([\s\S]*?)\r\n\r\n" + "Tested against:\r\n([\s\S]*?)\r\n\r\n" + ...

     For Each m In re3.Execute(docstr)  'match against the trimmed docstring
       descr  = m.SubMatches(0)
       tested = m.SubMatches(1)
       ...
     Next
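
The first option in step 3, splitting at 2+ consecutive line breaks, could look roughly like the sketch below. It assumes blank lines are truly empty (no trailing spaces) and that the text never contains Chr(1). The sample docstr, the name reSplit and the marker character are illustrative and not part of the answer above.

```vbscript
Option Explicit

' Hypothetical sample; in practice docstr comes from steps 1 and 2.
Dim docstr
docstr = "Replaces greater than operator" & vbCrLf & _
         "Replaces equals operator" & vbCrLf & vbCrLf & _
         "Tested against:" & vbCrLf & _
         "    * Oracle 10g" & vbCrLf & vbCrLf & _
         "Notes:" & vbCrLf & _
         "    * The BETWEEN clause is SQL standard."

Dim reSplit
Set reSplit = New RegExp
reSplit.Pattern = "(?:\r?\n){2,}"   ' two or more consecutive line breaks
reSplit.Global = True

' VBScript has no regex-based Split, so each run of blank lines is first
' replaced with a marker character and the result is split on that marker.
Dim marker, sections, i
marker = Chr(1)
sections = Split(reSplit.Replace(docstr, marker), marker)

For i = 0 To UBound(sections)
  WScript.Echo "Section " & (i + 1) & ":" & vbCrLf & sections(i)
Next
```

Run with cscript.exe, this should print one block per section (description, "Tested against:", "Notes:"), each of which can then be broken down further.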

Continue breaking down the sections until you have the elements you want to display. Then build the HTML from these elements.
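
As one possible illustration of that last stage, the sketch below pulls the >>> calls and their result lines out of the final section and writes them into two table cells. It assumes each >>> line is followed directly by exactly one result line. HtmlEncode, re4 and the sample text are hypothetical names introduced here, and the escaping is deliberately minimal.

```vbscript
Option Explicit

' Minimal HTML escaping; "&" must be replaced first.
Function HtmlEncode(ByVal s)
  s = Replace(s, "&", "&amp;")
  s = Replace(s, "<", "&lt;")
  s = Replace(s, ">", "&gt;")
  HtmlEncode = Replace(s, vbCrLf, "<br>")
End Function

' Hypothetical sample: the last section of the docstring.
Dim examples
examples = ">>> tamper('1 AND A > B--')" & vbCrLf & _
           "'1 AND A NOT BETWEEN 0 AND B--'" & vbCrLf & _
           ">>> tamper('1 AND A = B--')" & vbCrLf & _
           "'1 AND A BETWEEN B AND B--'"

Dim re4
Set re4 = New RegExp
re4.Pattern = "^>>> ([^\r\n]*)\r?\n([^\r\n]*)"
re4.Global = True
re4.MultiLine = True   ' let ^ match at the start of every line

' Collect the calls (column 4) and their results (column 5).
Dim m, calls, results
calls = ""
results = ""
For Each m In re4.Execute(examples)
  If calls <> "" Then
    calls = calls & vbCrLf
    results = results & vbCrLf
  End If
  calls = calls & m.SubMatches(0)
  results = results & m.SubMatches(1)
Next

WScript.Echo "<table><tr>" & _
             "<td>" & HtmlEncode(calls) & "</td>" & _
             "<td>" & HtmlEncode(results) & "</td>" & _
             "</tr></table>"
```

Escaping matters here because the docstring itself contains > characters, which would otherwise end up as raw markup in the table.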
