Using regex to parse a multiline string

This is the full string that I want to parse:

Response
--------
{
  Return Code: 1
  Key        : <None>
  Files      : [
    {
      Return Code: 0
      Data       : 'Value' is 1
'Value' is two
This is third line of output
    }
  ]
}

And this is how I want the parsed text to look:

'Value' is 1
'Value' is two
This is third line of output

I've tried using re.findall(), but I cannot get exactly what I want.
This is a Python script that tries to parse the output using a regex:

import subprocess,re
output = subprocess.check_output(['staf', 'server.com', 'PROCESS', 'START', 'SHELL', 'COMMAND', "'uname'", 'WAIT', 'RETURNSTDOUT', 'STDERRTOSTDOUT'])
result = re.findall(r'Data\s+:\s+(.*)', output, re.DOTALL)[0]
print result

Output of the script:

[root@server ~]# python test.py 
''uname'' is not recognized as an internal or external command,
operable program or batch file.

    }
  ]
}

Option 1

If you want the three lines after Data :, you can do something like this, capturing the three lines into group 1:

match = re.search(r"Data\s*:\s*((?:[^\n]*[\r\n]+){3})", subject)
if match:
    result = match.group(1)
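
As a quick check, Option 1 can be run against the sample response from the question. This is only a sketch: the hard-coded sample text and the name option1_result are assumptions for illustration, and in practice subject would hold the command output.

```python
import re

# Sample response copied from the question (hard-coded here for illustration).
subject = """Response
--------
{
  Return Code: 1
  Key        : <None>
  Files      : [
    {
      Return Code: 0
      Data       : 'Value' is 1
'Value' is two
This is third line of output
    }
  ]
}"""

# Group 1 captures exactly three lines after "Data :", line endings included.
match = re.search(r"Data\s*:\s*((?:[^\n]*[\r\n]+){3})", subject)
option1_result = match.group(1) if match else None
print(option1_result)
```

Each captured line keeps its trailing newline, so the result ends with one. The count of three is fixed in the pattern, so this only fits output whose length is known in advance.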

Option 2

If you want all the lines after Data : up to, but not including, the first line that contains a }, change the regex to:

Data\s*:\s*((?:[^\n]*(?:[\r\n]+(?!\s*}))?)+)
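
A minimal sketch of Option 2 in use, again assuming the sample response from the question as input (the names subject and option2_result are illustrative, not part of the original answer):

```python
import re

# Sample response copied from the question (hard-coded here for illustration).
subject = """Response
--------
{
  Return Code: 1
  Key        : <None>
  Files      : [
    {
      Return Code: 0
      Data       : 'Value' is 1
'Value' is two
This is third line of output
    }
  ]
}"""

# Consume lines one at a time; the negative lookahead (?!\s*}) refuses to
# cross a line break when the next line is just whitespace followed by "}".
pattern = re.compile(r"Data\s*:\s*((?:[^\n]*(?:[\r\n]+(?!\s*}))?)+)")
match = pattern.search(subject)
option2_result = match.group(1) if match else None
print(option2_result)
```

Unlike Option 1, the number of lines is not fixed, and the capture stops before the line break that precedes the closing brace, so there is no trailing newline to strip.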

Using the following regex, you'll find the three strings you want.

Note that this depends heavily on how the response is formatted.

>>> import re
>>> response = """
Response
--------
{
  Return Code: 1
  Key        : <None>
  Files      : [
    {
      Return Code: 0
      Data       : 'Value' is 1
'Value' is two
This is third line of output
    }
  ]
}"""
>>> re.findall(r"('Value'.*)\n(.*)\n(.*)\n.*}",response)
[("'Value' is 1", "'Value' is two", 'This is third line of output')]

You could also include the newline characters in the groups, like this:

>>> re.findall(r"('Value'.*\n)(.*\n)(.*\n).*}",response)
[("'Value' is 1\n", "'Value' is two\n", 'This is third line of output\n')]

It depends on how you want to process the result afterward.

UPDATE

How about this?

>>> re.findall(r"Data\s*:\s*(.*?)}",response,re.DOTALL)
["'Value' is 1\n'Value' is two\nThis is third line of output\n    "]

This will find everything after Data : (here, from the first 'Value') up until the first '}', including the newline and indentation that precede the brace.
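
If the leftover whitespace before the brace is unwanted, the capture can be cleaned up afterward. The sketch below wraps the lazy pattern in a hypothetical helper, extract_data_lines (not part of the original answer). It also assumes the command output may arrive as bytes, as subprocess.check_output returns on Python 3, and that UTF-8 is an acceptable decoding.

```python
import re

def extract_data_lines(output):
    """Return the lines after 'Data :' up to the first '}' (hypothetical helper)."""
    # Assumption: bytes input is UTF-8; undecodable bytes are replaced, not fatal.
    if isinstance(output, bytes):
        output = output.decode("utf-8", errors="replace")
    match = re.search(r"Data\s*:\s*(.*?)}", output, re.DOTALL)
    if match is None:
        return []
    # rstrip() drops the newline and indentation left before the closing brace.
    return match.group(1).rstrip().splitlines()

# Sample response copied from the question (hard-coded here for illustration).
response = """
Response
--------
{
  Return Code: 1
  Key        : <None>
  Files      : [
    {
      Return Code: 0
      Data       : 'Value' is 1
'Value' is two
This is third line of output
    }
  ]
}"""

lines = extract_data_lines(response)
print("\n".join(lines))
```

Returning a list keeps both options open: join it with newlines to get the text block from the question, or iterate over the lines individually.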
