Find values using regex (including brackets)
It's my first time with regex and I have some issues that I hope you can help me with. Here is an example of the data:
chartData.push({
date: newDate,
visits: 9710,
color: "#016b92",
description: "9710"
});
var newDate = new Date();
newDate.setFullYear(
2007,
10,
1 );
What I want to retrieve is the date, which is inside the last pair of brackets, and the corresponding description. I have no idea how to do it with one regex, so I decided to split it into two parts.
First part:
I retrieve the value after "description:". This was managed with the following code:
[\n\r].*description:\s*([^\n\r]*)
The output gives me the result with quotes, "9710", but I can fairly say that it's alright and no changes are required.
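A minimal sketch of this first part in Python's re module. The question doesn't name a language; Python is assumed here because the answers use it, and the variable names are illustrative. Matching the quotes outside the capturing group is one way to get the bare value, should that ever be needed:

```python
import re

text = """chartData.push({
date: newDate,
visits: 9710,
color: "#016b92",
description: "9710"
});
var newDate = new Date();
newDate.setFullYear(
2007,
10,
1 );"""

# Pattern from the question: the captured value still includes the quotes.
with_quotes = re.findall(r'[\n\r].*description:\s*([^\n\r]*)', text)

# Variant: match the quotes literally and capture only what is between them.
without_quotes = re.findall(r'description:\s*"([^"\n\r]*)"', text)

print(with_quotes)     # ['"9710"']
print(without_quotes)  # ['9710']
```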
Second part:
Here it gets tricky. I want to retrieve the values in brackets after the text newDate.setFullYear. Unfortunately, all I have managed so far is to get the values inside any brackets. For that, I used the following code:
\(([^)]*)\)
The result is that it picks all 3 brackets in the example:
"{
date: newDate,
visits: 9710,
color: "#016b92",
description: "9710"
}",
"()",
"2007,
10,
1 "
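This behavior can be reproduced with Python's re module (assumed here, since the question doesn't name a language). Nothing in the pattern ties a match to setFullYear, so every parenthesized span qualifies, including the empty one from new Date(). The negated class [^)] also matches newlines, which is why the multi-line contents are captured:

```python
import re

text = """chartData.push({
date: newDate,
visits: 9710,
color: "#016b92",
description: "9710"
});
var newDate = new Date();
newDate.setFullYear(
2007,
10,
1 );"""

# Any "(", then any run of non-")" characters (newlines included), then ")".
matches = re.findall(r'\(([^)]*)\)', text)

print(len(matches))      # 3
print(repr(matches[1]))  # ''
print(repr(matches[2]))  # '\n2007,\n10,\n1 '
```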
What I am missing is an AND operator in regex, which would let me build a pattern that retrieves the data in brackets only after that specific text.
I could, of course, pick every third result, but unfortunately that doesn't work for the whole dataset.
Does anyone know how to solve the second part?
Thanks in advance.
Assuming text holds the data, you can use the following expression:
import re
res = re.search(r'description: "([^"]+)".*?newDate\.setFullYear\(([^)]*)\);', text, re.DOTALL)
This returns a match object with two groups, which you can fetch with:
res.groups()
The result is:
('9710', '\n2007,\n10,\n1 ')
You can then parse these groups however you like. For example:
date = res.groups()[1]
[s.strip() for s in date.split(",")]
==>
['2007', '10', '1']
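A sketch of the same approach for data with several records, assuming Python's re. The second record (the 8120 values) is invented purely for illustration. The lazy .*? keeps each description paired with the nearest setFullYear call that follows it; a greedy .* under re.DOTALL would jump ahead to the last call in the text. Keep in mind that in JavaScript the month argument of setFullYear is zero-based, so 10 means November:

```python
import re

# Hypothetical two-record input in the same shape as the question's sample.
text = """chartData.push({
date: newDate,
visits: 9710,
color: "#016b92",
description: "9710"
});
var newDate = new Date();
newDate.setFullYear(
2007,
10,
1 );
chartData.push({
date: newDate,
visits: 8120,
color: "#016b92",
description: "8120"
});
var newDate = new Date();
newDate.setFullYear(
2007,
11,
1 );"""

# Lazy .*? stops at the nearest setFullYear after each description,
# and [^)]* stops at the first closing parenthesis.
pattern = re.compile(
    r'description: "([^"]+)".*?newDate\.setFullYear\(([^)]*)\);', re.DOTALL
)

records = []
for m in pattern.finditer(text):
    description, args = m.groups()
    # int() ignores the surrounding newlines and spaces in each piece.
    year, month, day = (int(s) for s in args.split(","))
    records.append((description, year, month, day))

print(records)  # [('9710', 2007, 10, 1), ('8120', 2007, 11, 1)]
```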
The AND that you are referring to is not really an operator. The pattern matches characters from left to right, so after capturing the value in group 1, you can match everything that comes before the values you want to capture in group 2.
What you could do is repeatedly match every following line that does not start with newDate.setFullYear(, using a negative lookahead.
Then, when you do encounter that line, match it and capture in group 2 all the characters except parentheses.
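To illustrate the left-to-right idea in isolation, here is a small Python sketch. Putting the literal text in front of the group acts as the "AND": the capture can only start right after newDate.setFullYear(. A fixed-width lookbehind does the same job, and Python's re accepts it because the literal prefix has a constant length:

```python
import re

text = """chartData.push({
date: newDate,
visits: 9710,
color: "#016b92",
description: "9710"
});
var newDate = new Date();
newDate.setFullYear(
2007,
10,
1 );"""

# Literal prefix outside the group: only the setFullYear arguments are captured.
prefixed = re.findall(r'newDate\.setFullYear\(([^)]*)\)', text)

# Same result with a fixed-width lookbehind; the whole match is the arguments.
lookbehind = re.findall(r'(?<=newDate\.setFullYear\()[^)]*', text)

print(prefixed)    # ['\n2007,\n10,\n1 ']
print(lookbehind)  # ['\n2007,\n10,\n1 ']
```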
\r?\ndescription: "([^"]+)"(?:\r?\n(?!newDate\.setFullYear\().*)*\r?\nnewDate\.setFullYear\(([^()]+)\);
Example code
import re
regex = r"\r?\ndescription: \"([^\"]+)\"(?:\r?\n(?!newDate\.setFullYear\().*)*\r?\nnewDate\.setFullYear\(([^()]+)\);"
test_str = ("chartData.push({\n"
"date: newDate,\n"
"visits: 9710,\n"
"color: \"#016b92\",\n"
"description: \"9710\"\n"
"});\n"
"var newDate = new Date();\n"
"newDate.setFullYear(\n"
"2007,\n"
"10,\n"
"1 );")
print(re.findall(regex, test_str))
Output
[('9710', '\n2007,\n10,\n1 ')]
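The \r? parts of the pattern let it handle Windows-style line endings. Below is a quick check in which the sample is converted to \r\n line breaks; the conversion is only for illustration. Inside the skipped lines, . also matches a stray \r, and backtracking takes care of the rest:

```python
import re

regex = r'\r?\ndescription: "([^"]+)"(?:\r?\n(?!newDate\.setFullYear\().*)*\r?\nnewDate\.setFullYear\(([^()]+)\);'

sample = """chartData.push({
date: newDate,
visits: 9710,
color: "#016b92",
description: "9710"
});
var newDate = new Date();
newDate.setFullYear(
2007,
10,
1 );"""

# Same data with Windows line endings.
crlf = sample.replace("\n", "\r\n")

results = []
for description, args in re.findall(regex, crlf):
    # int() ignores surrounding whitespace, including "\r\n".
    numbers = [int(s) for s in args.split(",")]
    results.append((description, numbers))

print(results)  # [('9710', [2007, 10, 1])]
```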
Another option, using the PyPI regex module (which supports \G), is to get group 1 and then each number separately in group 2:
(?:\r?\ndescription: "([^"]+)"(?:\r?\n(?!newDate\.setFullYear\().*)*\r?\nnewDate\.setFullYear\(|\G)\r?\n(\d+),?(?=[^()]*\);)
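Python's built-in re module does not support \G. If installing the regex module is not an option, here is a rough standard-library alternative. It swaps in a two-step approach instead of \G: first capture the whole argument list with the earlier single-pattern solution, then pull out the numbers with a second findall:

```python
import re

outer = r'\r?\ndescription: "([^"]+)"(?:\r?\n(?!newDate\.setFullYear\().*)*\r?\nnewDate\.setFullYear\(([^()]+)\);'

text = """chartData.push({
date: newDate,
visits: 9710,
color: "#016b92",
description: "9710"
});
var newDate = new Date();
newDate.setFullYear(
2007,
10,
1 );"""

pairs = [
    # Second pass over group 2: one string per run of digits.
    (description, re.findall(r'\d+', args))
    for description, args in re.findall(outer, text)
]

print(pairs)  # [('9710', ['2007', '10', '1'])]
```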
A simpler approach uses the standard re module and captures only the setFullYear arguments:
import re
test = r"""
chartData.push({
date: 'newDate',
visits: 9710,
color: "#016b92",
description: "9710"
})
var newDate = new Date()
newDate.setFullYear(
2007,
10,
1);"""
m = re.search(r".*newDate\.setFullYear(\(\n.*\n.*\n.*\));", test, re.DOTALL)
print(m.group(1).rstrip("\n").replace("\n", "").replace(" ", ""))
The result:
(2007,10,1)
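The pattern above expects at least three line breaks inside the parentheses, the first one immediately after "(". Here is a slightly more tolerant sketch. It assumes the arguments never contain ")" and strips all whitespace in one step, so it also handles arguments laid out differently (the layout below is made up for illustration):

```python
import re

# Hypothetical input: arguments spread over lines differently than in the sample.
test = """var newDate = new Date()
newDate.setFullYear(2007,
    10, 1);"""

m = re.search(r'newDate\.setFullYear(\([^)]*\));', test)

# Remove every whitespace character, whatever mix of spaces and newlines was used.
result = re.sub(r'\s+', '', m.group(1))
print(result)  # (2007,10,1)
```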
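Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.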