Select a specific set of characters in a file using regex in Python

In my code I have views defined like the one below.

VIEW Company_Person_Sd IS
   Prompt = 'Company'
   Company.Prompt = 'Company ID'
SELECT company_id                          company_id,
       emp_no                              emp_no,
       Get_Person(company_id, emp_no)      person_id,
       cp.rowid                            objid,
       to_char(cp.rowversion)              objversion,
       rowkey                              objkey
FROM   companies cp;

There can be more than one view defined in a single file (usually there are 20 or more).

I want to get the whole view using a regex in Python.

I did the same thing for methods using the following regex, and it worked fine.

import re

methodRegex = r"^\s*((FUNCTION|PROCEDURE)\s+(\w+))(.*?)BEGIN(.*?)^END\s*(\w+);"

# fContent is the text of the file, read elsewhere
methodMatches = re.finditer(methodRegex, fContent, re.DOTALL | re.MULTILINE | re.IGNORECASE | re.VERBOSE)

for methodMatchNum, methodMatch in enumerate(methodMatches, start=1):
    methodContent = methodMatch.group()    # the whole method
    methodNameFull = methodMatch.group(1)  # e.g. "PROCEDURE Prepare___"
    methodType = methodMatch.group(2)      # FUNCTION or PROCEDURE
    methodName = methodMatch.group(3)      # e.g. "Prepare___"

Method example:

PROCEDURE Prepare___ (
   attr_ IN OUT VARCHAR2 )
IS
  ----
BEGIN
   --
END Prepare___;

PROCEDURE Insert___ (
   attr_ IN OUT VARCHAR2 )
IS
  ----
BEGIN
   --
END Insert___;
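
For reference, the method regex can be checked against the two sample procedures above with a short self-contained run. This is only a sketch: the names `method_regex`, `source`, and `method_names` are chosen for illustration, and the sample text stands in for the real file content.

```python
import re

method_regex = r"^\s*((FUNCTION|PROCEDURE)\s+(\w+))(.*?)BEGIN(.*?)^END\s*(\w+);"

# Stand-in for the file content: the two sample procedures shown above.
source = """PROCEDURE Prepare___ (
   attr_ IN OUT VARCHAR2 )
IS
  ----
BEGIN
   --
END Prepare___;

PROCEDURE Insert___ (
   attr_ IN OUT VARCHAR2 )
IS
  ----
BEGIN
   --
END Insert___;
"""

flags = re.DOTALL | re.MULTILINE | re.IGNORECASE | re.VERBOSE

# Group 3 is the bare method name.
method_names = [m.group(3) for m in re.finditer(method_regex, source, flags)]
print(method_names)
```

Each procedure is returned as its own match because the lazy `(.*?)` groups stop at the first `BEGIN` and the first line that starts with `END`.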

When I try to do the same for views, it gives the wrong output. Actually, I couldn't find how to catch the end of the view. I also tried with a semicolon, which gave the wrong output as well.

My regex for views:

viewRegex = r"^\s*(VIEW\s+(\w+))(.*?)SELECT(.*?)^FROM\s*(\w+);"

Please help me find out where I'm going wrong. Thanks in advance.

You don't get any match with viewRegex because the final group (\w+) allows only word characters (such as [a-zA-Z0-9_]) between FROM, plus its optional trailing whitespace, and the semicolon. Your example also contains whitespace there (companies cp). So take whitespace into account as well:

viewRegex = r"^\s*(VIEW\s+(\w+))(.*?)SELECT(.*?)^FROM\s*([\w\s]+);"
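
A quick way to see the difference is to run both versions against the sample view. The following is a minimal sketch; the variable names (`view_source`, `original`, `fixed`) are illustrative and not part of the question's code.

```python
import re

view_source = """VIEW Company_Person_Sd IS
   Prompt = 'Company'
   Company.Prompt = 'Company ID'
SELECT company_id                          company_id,
       emp_no                              emp_no,
       Get_Person(company_id, emp_no)      person_id,
       cp.rowid                            objid,
       to_char(cp.rowversion)              objversion,
       rowkey                              objkey
FROM   companies cp;
"""

flags = re.DOTALL | re.MULTILINE | re.IGNORECASE

original = r"^\s*(VIEW\s+(\w+))(.*?)SELECT(.*?)^FROM\s*(\w+);"
fixed = r"^\s*(VIEW\s+(\w+))(.*?)SELECT(.*?)^FROM\s*([\w\s]+);"

# (\w+) cannot cross the space in "companies cp", so there is no match.
original_match = re.search(original, view_source, flags)

# [\w\s]+ accepts the table name and its alias.
fixed_match = re.search(fixed, view_source, flags)

print(original_match)        # None
print(fixed_match.group(2))  # Company_Person_Sd
print(fixed_match.group(5))  # companies cp
```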

If you have a lot of views in a single file, another option is to avoid using .*? with re.DOTALL, which helps prevent unnecessary backtracking.

Instead, you can match the parts from VIEW to SELECT and from SELECT to FROM line by line, using a negative lookahead to check that no line in between starts with one of those keywords. This prevents matching too much (assuming the keywords cannot occur at the start of a line in between).

For the last part after FROM, you can match word characters, optionally followed by repeated groups of whitespace chars and more word characters.

^(VIEW\s+(\w+))(.*(?:\n(?!SELECT|VIEW|FROM).*)*)\nSELECT\s+(.*(?:\n(?!SELECT|VIEW|FROM).*)*)\nFROM\s+(\w+(?:\s+\w+)*);

The pattern matches:

  • ^ Start of a line (with re.MULTILINE)
  • (VIEW\s+(\w+)) Capture group 1 for VIEW and the view name, with a nested group 2 for the word characters of the name
  • (.*(?:\n(?!SELECT|VIEW|FROM).*)*) Capture group 3, matching the rest of the line and all following lines that do not start with a keyword
  • \nSELECT\s+ Match a newline, SELECT and 1+ whitespace chars
  • (.*(?:\n(?!SELECT|VIEW|FROM).*)*) Capture group 4, matching the rest of the line and all following lines that do not start with a keyword
  • \nFROM\s+ Match a newline, FROM and 1+ whitespace chars
  • (\w+(?:\s+\w+)*) Capture group 5 for the value of FROM, matching 1+ word characters, optionally followed by repeated whitespace chars and word characters
  • ; Match the closing semicolon
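
The breakdown above can also be written as a commented pattern with re.VERBOSE and tried on a file containing more than one view. This is a sketch under two assumptions: the final group carries a trailing `*` so that the alias after the table name is optional, and the second view (`Company_Sd`) is a made-up sample that does not come from the question.

```python
import re

VIEW_PATTERN = re.compile(
    r"""
    ^(VIEW\s+(\w+))                      # 1: VIEW plus name, 2: the name only
    (.*(?:\n(?!SELECT|VIEW|FROM).*)*)    # 3: header lines before SELECT
    \nSELECT\s+
    (.*(?:\n(?!SELECT|VIEW|FROM).*)*)    # 4: column list before FROM
    \nFROM\s+
    (\w+(?:\s+\w+)*)                     # 5: table name and optional alias
    ;
    """,
    re.MULTILINE | re.IGNORECASE | re.VERBOSE,
)

# Two views in one text; the second one is hypothetical.
two_views = """VIEW Company_Person_Sd IS
   Prompt = 'Company'
SELECT company_id                          company_id,
       emp_no                              emp_no
FROM   companies cp;

VIEW Company_Sd IS
   Prompt = 'Company'
SELECT company_id                          company_id
FROM   companies;
"""

# Collect (view name, FROM value) for every view found.
found = [(m.group(2), m.group(5)) for m in VIEW_PATTERN.finditer(two_views)]
print(found)
```

Because the pattern contains no literal spaces or `#` characters, re.VERBOSE does not change what it matches; it only allows the layout and comments.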


For example (you can omit re.VERBOSE and re.DOTALL):

import re

# re.MULTILINE lets ^ match at every line start; newlines are matched explicitly with \n
viewRegex = r"^(VIEW\s+(\w+))(.*(?:\n(?!SELECT|VIEW|FROM).*)*)\nSELECT\s+(.*(?:\n(?!SELECT|VIEW|FROM).*)*)\nFROM\s+(\w+(?:\s+\w+)*);"
fContent = ("VIEW Company_Person_Sd IS\n"
            "   Prompt = 'Company'\n"
            "   Company.Prompt = 'Company ID'\n"
            "SELECT company_id                          company_id,\n"
            "       emp_no                              emp_no,\n"
            "       Get_Person(company_id, emp_no)      person_id,\n"
            "       cp.rowid                            objid,\n"
            "       to_char(cp.rowversion)              objversion,\n"
            "       rowkey                              objkey\n"
            "FROM   companies cp;")
viewMatches = re.finditer(viewRegex, fContent, re.MULTILINE | re.IGNORECASE)

for viewMatchNum, viewMatch in enumerate(viewMatches, start=1):
    viewContent = viewMatch.group()    # the whole view, from VIEW to the semicolon
    viewNameFull = viewMatch.group(1)  # "VIEW Company_Person_Sd"
    viewName = viewMatch.group(2)      # "Company_Person_Sd"
    viewHeader = viewMatch.group(3)    # text between the view name and SELECT
    print(viewMatchNum, viewName)
