Extracting substrings from a file using Python regex

Question

A file has n lines, organized in blocks of logically defined strings. I'm parsing each line and capturing the required data based on some matching conditions.

I read through each line and find the blocks with this code:

    import re

    for line in file:  # file is an already-open file object
        match = re.match(r'block.+', line)
        if match is not None:
            block_name = match.group(0)
            # string matching code to be added here

Input file:

line1    select KT_TT=$TMTL/$SYSNAME.P1
line2    . $dhe/ISFUNC sprfl tm/tm1032 int 231
line3    select IT_TT=$TMTL/$SYSNAME.P2
line4    . $DHE/ISFUNC ptoic ca/ca256 tli 551
         .....
         .....


line89   CALLING IK02=$TMTL/$SYSNAME.P2
line90   CALLING KK01=$TMTL/$SYSNAME.P1

Matching conditions and expected output of each step:

1. While reading the lines, match the word "/ISFUNC", then take the characters from the end of the line back to the last "/" and save them to a variable. Expected output: tm1032 int 231, ca256 tli 551 (matches found in line 2, line 4, etc.)
2. Once ISFUNC is found, read the immediately preceding line and take the characters from the end of that line back to the last "/", saving them to a variable. Expected output: $SYSNAME.P1 and $SYSNAME.P2 (line 1, line 3, etc.)
3. Continue reading down and look for lines starting with "CALLING" whose last string after "/" matches the output of step 2 ($SYSNAME.P1 and $SYSNAME.P2). Capture just the data after the word CALLING and save it. Expected output: KK01 (line 90) and IK02 (line 89)

The final output should look like this:

FUNC             SYS            CALL
tm1032 int 231   $SYSNAME.P1    KK01
ca256 tli 551    $SYSNAME.P2    IK02

Answer 1

If all you need is the text after the last slash, you don't need regex at all.

Simply call .split("/") on each line and take the last part:

sample = "$dhe/ISFUNC sprfl tm/tm1032 int 231"
sample.split("/")

This results in:

['$dhe', 'ISFUNC sprfl tm', 'tm1032 int 231']

Then access the last element of the list with index -1 to get the value.

PS: Use the split function only once you have found the corresponding line.
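A minimal sketch of this approach, assuming the line has already been identified as one of interest. The helper name `last_segment` is hypothetical, and `rsplit` with a limit of one split is used instead of a full `split` because only the final piece is needed.

```python
def last_segment(line):
    """Return the text after the last '/' in line, stripped of whitespace.

    If the line contains no '/', the whole (stripped) line is returned.
    """
    return line.rsplit("/", 1)[-1].strip()


sample = "$dhe/ISFUNC sprfl tm/tm1032 int 231"
print(sample.split("/")[-1])                               # tm1032 int 231
print(last_segment("select KT_TT=$TMTL/$SYSNAME.P1\n"))    # $SYSNAME.P1
```

The `strip()` call matters when lines come straight from a file, since each one still ends with a newline.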

Answer 2

While reading the lines, match the word "/ISFUNC", then take the characters from the end of the line back to the last "/" and save them to a variable. Expected output: tm1032 int 231 (match found in line 2)

char_list = re.findall(r'/ISFUNC.*/(.*)$', line)
if char_list:
    chars = char_list[0]

Once ISFUNC is found, read the immediately preceding line and take the characters from the end of that line back to the last "/", saving them to a variable. Expected output: $SYSNAME.P1 (line 1)

A workable approach here is to either (a) read the lines into a list once and iterate over the indices rather than the lines themselves (i.e. lines = file.readlines(), then for i in range(1, len(lines)): ... lines[i - 1]), or (b) keep a copy of the previous line (say, put last_line = line at the end of your for loop). Avoid calling file.readlines() more than once: after the first call the file has been read to the end, so a second call returns an empty list. Then reference that previous line in this expression:

data_list = re.findall(r'/([^/]*)$', last_line)
if data_list:
    data = data_list[0]
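As a rough sketch of option (b), the loop below keeps the previous line in `last_line` and pairs each ISFUNC match with the system name from the line before it. It assumes the `line1`, `line2`, ... labels in the question are annotations and not part of the file; the `lines` list and the `pairs` name are illustrative only.

```python
import re

# Stand-in for the lines of the input file (labels such as "line1" omitted).
lines = [
    "select KT_TT=$TMTL/$SYSNAME.P1",
    ". $dhe/ISFUNC sprfl tm/tm1032 int 231",
    "select IT_TT=$TMTL/$SYSNAME.P2",
    ". $DHE/ISFUNC ptoic ca/ca256 tli 551",
]

pairs = []       # (ISFUNC text, system name) tuples
last_line = ""   # previous line seen by the loop
for line in lines:
    func = re.search(r'/ISFUNC.*/(.*)$', line)
    if func:
        # Step 2: the text after the last "/" on the previous line.
        sys_match = re.search(r'/([^/]*)$', last_line)
        if sys_match:
            pairs.append((func.group(1), sys_match.group(1)))
    last_line = line

print(pairs)
```

Updating `last_line` at the very end of the loop body is what guarantees it still holds the previous line while the current one is being examined.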

Continue reading down and look for the line starting with "CALLING" whose last string after "/" matches the output of step 2 ($SYSNAME.P1). Capture just the data after the word CALLING and save it. Expected output: KK01 (line 90)

Assuming, from your example, you mean "just the data immediately after" (i.e. up until the equals sign):

# \s+ skips the whitespace after CALLING so the captured name has no leading space
calling_list = re.findall(r'CALLING\s+(.*)=.*/' + re.escape(data) + '$', line)
if calling_list:
    calling = calling_list[0]

You can move the parentheses around to change exactly which part of the line is captured. re.findall() returns a list of matches; when the pattern contains a single capturing group, each entry is just the text matched by that group.
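Putting the three steps together, one possible end-to-end sketch is shown below. It is not the only way to structure this: the function name `extract`, the two dictionaries, and the column widths are all assumptions made for illustration, and it assumes each system name appears in at most one ISFUNC pair and that CALLING lines come after the ISFUNC lines they refer to.

```python
import re


def extract(lines):
    """Return (func, sys, call) rows; call is None if no CALLING line matched."""
    func_by_sys = {}   # steps 1 and 2: system name -> ISFUNC text
    call_by_sys = {}   # step 3: system name -> name after CALLING
    last_line = ""
    for raw in lines:
        line = raw.strip()
        func = re.search(r'/ISFUNC.*/(.*)$', line)
        calling = re.match(r'CALLING\s+(\w+)=.*/([^/]*)$', line)
        if func:
            sys_match = re.search(r'/([^/]*)$', last_line)
            if sys_match:
                func_by_sys[sys_match.group(1)] = func.group(1)
        elif calling and calling.group(2) in func_by_sys:
            call_by_sys[calling.group(2)] = calling.group(1)
        last_line = line
    return [(f, s, call_by_sys.get(s)) for s, f in func_by_sys.items()]


sample_lines = [
    "select KT_TT=$TMTL/$SYSNAME.P1\n",
    ". $dhe/ISFUNC sprfl tm/tm1032 int 231\n",
    "select IT_TT=$TMTL/$SYSNAME.P2\n",
    ". $DHE/ISFUNC ptoic ca/ca256 tli 551\n",
    "CALLING IK02=$TMTL/$SYSNAME.P2\n",
    "CALLING KK01=$TMTL/$SYSNAME.P1\n",
]

rows = extract(sample_lines)
print(f"{'FUNC':<17}{'SYS':<15}CALL")
for func, sys_name, call in rows:
    print(f"{func:<17}{sys_name:<15}{call}")
```

With a real file, the same function can be called as `extract(f)` inside a `with open(...) as f:` block, since iterating over a file object yields its lines.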

2 answers

Answer 1
0 2019-06-10 13:18:33

Answer 2
0 2019-06-10 13:34:57
