Regex lookahead and lookbehind multiple times in Python
I have input formatted as below (txt1):
txt1 = "[('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals'), ('a')]"
I want to extract it in the following format:
[['1','Hello is 1)people 2)animals'],['People are 1) hello 2) animals'],['a']]
So, basically, I want the information within the parentheses, but I haven't been able to get it. I used lookahead and lookbehind to avoid splitting on the numbered markers '1)' and '2)', which is what happened earlier when I used a simple statement of
re.split('[\(\)\[\]]
I have been trying findall first to check what I am getting.
r = re.findall(r'\((?=\').*(?<=\')\)(?=\,)', txt1)
I have been getting:
["('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals')"]
It seems like it is ignoring the middle parenthesis. What can I do to get the result that I need?
Thank you.
Note:
For the split function, which I intend to use to get the desired output, I am getting this:
r = re.split(r'\((?=\').*(?<=\')\)(?=\,)', txt1)
['[', ", ('a')]"]
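The single long match is consistent with `.*` being greedy: it runs to the end of the string and backtracks only as far as needed, so it stops at the last `')` that is followed by a comma. A minimal sketch of the difference, assuming two changes that are not part of the original pattern (a lazy `.*?` and a trailing lookahead widened to `(?=[,\]])` so the final group also matches):

```python
import re

txt1 = "[('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals'), ('a')]"

# Original pattern (redundant escapes dropped): greedy .* spans from the
# first "('" to the last "')" that is followed by a comma.
greedy = re.findall(r"\((?=').*(?<=')\)(?=,)", txt1)

# Assumed variant: lazy .*? stops at the first "')" it can, and the
# lookahead accepts either "," or "]" after the closing parenthesis.
lazy = re.findall(r"\((?=').*?(?<=')\)(?=[,\]])", txt1)

print(len(greedy))  # 1
print(lazy)
# ["('1','Hello is 1)people 2)animals')", "('People are 1) hello 2) animals')", "('a')"]
```

The '1)' and '2)' markers are skipped because the lookbehind requires a quote immediately before the closing parenthesis.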
Why regex?
import ast
[list(x) if isinstance(x, tuple) else [x] for x in ast.literal_eval(txt1)]
# => [['1', 'Hello is 1)people 2)animals'], ['People are 1) hello 2) animals'], ['a']]
If you insist on regular expressions, this should work unless the strings contain escaped quotes:
[re.findall(r"'[^']*'", x) for x in re.findall(r"\(('[^']*'(?:,\s*'[^']*')*)\)", txt1)]
# => [["'1'", "'Hello is 1)people 2)animals'"], ["'People are 1) hello 2) animals'"], ["'a'"]]
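If the surrounding quotes should be dropped to match the requested output, one option is to capture only the text between them. A small sketch under the same no-escaped-quotes assumption; the names `groups` and `rows` are illustrative only:

```python
import re

txt1 = "[('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals'), ('a')]"

# Outer pattern: one parenthesised group of comma-separated single-quoted strings.
groups = re.findall(r"\(('[^']*'(?:,\s*'[^']*')*)\)", txt1)

# Inner pattern: the capture group returns only the text between the quotes.
rows = [re.findall(r"'([^']*)'", g) for g in groups]

print(rows)
# [['1', 'Hello is 1)people 2)animals'], ['People are 1) hello 2) animals'], ['a']]
```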
Another solution without having to use regex:
txt1 = "[('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals'), ('a')]"
# Replacements are applied in insertion order (dicts keep it in Python 3.7+).
replace_pairs = {
    "('": "'",
    "'), ": '#',
    '[': '',
    ']': '',
    "'": '',
}
for k, v in replace_pairs.items():
    txt1 = txt1.replace(k, v)
txt1 = txt1[:-1].split('#')  # the last char is a parenthesis
print([i.split(',') for i in txt1])
Output:
[['1', 'Hello is 1)people 2)animals'], ['People are 1) hello 2) animals'], ['a']]
Note: This may not work if the input is more complicated than what you've shown here.
You could try the pattern \(([^(]+)\)
Explanation:
\( - match ( literally
(...) - capturing group
[^(]+ - match one or more characters other than (
\) - match ) literally
And use the replacement pattern [\1], which puts the first capturing group (backreference \1) inside square brackets.
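In Python, that substitution could be sketched with `re.sub`. The final `ast.literal_eval` step is an addition not mentioned in the answer; it is assumed here only to turn the rewritten string into an actual list of lists:

```python
import ast
import re

txt1 = "[('1','Hello is 1)people 2)animals'), ('People are 1) hello 2) animals'), ('a')]"

# [^(]+ is greedy but cannot cross the next "(", so it backtracks to the
# last ")" before that point, which skips the "1)" and "2)" markers.
bracketed = re.sub(r"\(([^(]+)\)", r"[\1]", txt1)
print(bracketed)
# [['1','Hello is 1)people 2)animals'], ['People are 1) hello 2) animals'], ['a']]

# Assumed extra step: parse the rewritten string into nested lists.
result = ast.literal_eval(bracketed)
print(result)
# [['1', 'Hello is 1)people 2)animals'], ['People are 1) hello 2) animals'], ['a']]
```

This relies on the quoted strings never containing an opening parenthesis; if they can, the pattern would need to change.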