Regex expression to get everything between double quotes

I'm trying to get a regex to work on a string of multiline text. It needs to work in Python.

Example text:

description : "4.10 TCP Wrappers - not installed"
info        : "If some of the services running in /etc/inetd.conf are 

required, then it is recommended that TCP Wrappers are installed and configured to limit access to any active TCP and UDP services.

TCP Wrappers allow the administrator to control who has access to various inetd network services via source IP address controls. TCP Wrappers also provide logging information via syslog about both successful and unsuccessful connections.

TCP Wrappers are generally triggered via /etc/inetd.conf, but other options exist for \"wrappering\" non-inetd based software.

The configuration of TCP Wrappers to suit a particular environment is outside the scope of this benchmark; however the following links will provide the necessary documentation to plan an appropriate implementation:

ftp://ftp.porcupine.org/pub/security/index.html

The website contains source code for both IPv4 and IPv6 versions."

expect      : "^[\\s]*[A-Za-z0-9]+:[\\s]+[^A][^L][^L]"
required        : YES

I have come up with this:

[(a-zA-Z_ \t#)]*[:][ ]*\"[^\"]*.*\"

But the problem is that the match stops at the second escaped quote (\") and the rest of the quoted text is not selected.

My objective is to get the entire string for the info entry, starting at info and ending at the closing double quote of its value.

The same regex should also work for the 'expect' line, matching from expect to the closing double quote of the expect string.

Once I have the entire string, I will split it on the first ":" because I want to store these strings in a DB, with "description", "info", and "expect" as columns and the strings as the values in those columns.
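As a minimal sketch of that splitting step, `str.partition` splits on the first ":" only, which matters because the expect value itself contains a colon. The helper name `split_entry` is hypothetical and only for illustration; the surrounding quotes are left in place.

```python
def split_entry(entry):
    """Split 'key : "value"' on the first colon only (hypothetical helper)."""
    key, sep, value = entry.partition(":")
    if not sep:
        raise ValueError("no ':' found in entry")
    # Strip the padding around the key and the value; the quotes stay.
    return key.strip(), value.strip()


# The expect entry from the example text; its value contains a second colon.
entry = r'expect      : "^[\\s]*[A-Za-z0-9]+:[\\s]+[^A][^L][^L]"'
column, value = split_entry(entry)
print(column)
print(value)
```

The returned key could then serve as the column name and the value as the cell content.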

Appreciate the help!

One alternative is to use the lexer provided in the shlex module:

>>> import shlex
>>> s = """tester : "this is a long string
... that
... is multiline, contains \\" double quotes \\" and .
... this line is finished\""""
>>> shlex.split(s[s.find('"'):])[0]
'this is a long string\nthat\nis multiline, contains " double quotes " and .\nthis line is finished'

It will also remove the backslashes from the escaped double quotes inside the string.

The code finds the first double quote in the string and only looks at the string starting from there. It then uses shlex.split() to tokenize the remainder of the string, and takes the first token from the returned list.

Update 1: I got this to work:

[(a-zA-Z_ \t#)]*[:][ ]*\"([^\"]|(?<=\\\\)[\"])*\"
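A minimal sketch of how this pattern might be tried in Python. It assumes the doubled backslashes above come from writing the pattern in a non-raw string literal, so here it is rewritten as a raw string with a single-backslash lookbehind. The sample text is a shortened stand-in for the file in the question, not the real data.

```python
import re

# The pattern from Update 1 as a raw string: a key, a colon, optional spaces,
# then a quoted value in which a quote preceded by a backslash does not end it.
PATTERN = re.compile(r'[(a-zA-Z_ \t#)]*[:][ ]*"([^"]|(?<=\\)["])*"')

# Shortened stand-in for the example text (assumption: "\n" line endings).
text = r'''description : "4.10 TCP Wrappers - not installed"
info        : "first line
options exist for \"wrappering\" software.
last line."
expect      : "^[\\s]*[A-Za-z0-9]+:[\\s]+[^A][^L][^L]"
required        : YES
'''

# group(0) is the whole 'key : "value"' entry, including multiline values.
matches = [m.group(0) for m in PATTERN.finditer(text)]
for entry in matches:
    print(repr(entry))
```

Because `[^"]` also matches newlines, no flag such as `re.DOTALL` is needed for the multiline info value. The unquoted `required : YES` line is not matched by this pattern.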

Update 2: If you cannot modify the file to add escaped quotes where the expression above needs them, then as long as lines such as

group : "@GROUP@" || "test"

exist only as single lines, I think this will grab those along with the longer quoted values:

[(a-zA-Z_ \t#)]*[:][ ]*(?:\"([^\"]|(?<=\\\\)[\"])*\"|.*)(?=(?:\r\n|$))
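A hedged sketch of trying the Update 2 pattern in Python. Two things here are assumptions rather than part of the answer: the pattern is rewritten as a raw string, and `re.MULTILINE` is added so that `$` matches at the end of every line when the text uses "\n" line endings (without it, `$` only matches at the end of the whole string). The sample text is invented for illustration.

```python
import re

# Update 2 pattern as a raw string: either a complete quoted value or the
# rest of the line, followed by a line ending.
PATTERN = re.compile(
    r'[(a-zA-Z_ \t#)]*[:][ ]*(?:"([^"]|(?<=\\)["])*"|.*)(?=(?:\r\n|$))',
    re.MULTILINE,  # assumption: lets $ match before each "\n"
)

# Invented sample mixing single-line, multi-quote and multiline entries.
text = r'''description : "4.10 TCP Wrappers"
group : "@GROUP@" || "test"
info        : "line one
line two with \"escaped\" quotes"
required        : YES
'''

matches = [m.group(0) for m in PATTERN.finditer(text)]
for entry in matches:
    print(repr(entry))
```

In this sketch the group line falls through to the `.*` branch because its first quoted value is not followed by a line ending, so the whole line is captured.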

Try that, and if it works, I'll update again to explain it.
