How to skip the characters between 2 characters in a string using regex in Python?
I have a multiline file where one of the lines is:
node:milk1-01|name=milk1-01
So I need to parse this file and find the line that matches a blueprint like:
node:________|name=________
I tried to implement this with a regex and got confused. I used the snippet below inside a loop that reads every line from the file.
x = re.findall('node:'+'\w+[-]*\d*'+'\\|name='+'\w+-\d*', line)
print(x)
I am very new to this concept. Am I doing something wrong? All help is appreciated. Thanks.
Is this perhaps what you're looking for?
>>> import re
>>> line = 'not\nhere\nnode:milk1-01|name=milk1-01\nsomething\n'
>>> re.findall(r'node:.*\|name=.*', line)
['node:milk1-01|name=milk1-01']
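Since the question reads the file line by line, the same pattern can also be applied inside that loop. A minimal sketch, assuming made-up file contents: `io.StringIO` stands in for the real file, and the names `fake_file`, `pattern` and `found` are only illustrative.

```python
import io
import re

# Stand-in for the real file; the contents are made up for illustration.
fake_file = io.StringIO("not\nhere\nnode:milk1-01|name=milk1-01\nsomething\n")

pattern = re.compile(r'node:.*\|name=.*')

found = []
for line in fake_file:
    match = pattern.search(line)
    if match:
        # '.' does not match the newline, so the trailing '\n' is left out
        found.append(match.group(0))

print(found)
```

Compiling the pattern once outside the loop keeps the loop body short; for a small file the difference is mostly cosmetic.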
You are close. Regexes can contain plain text too, so there is no need to concatenate the strings the way you do. Furthermore, you seem to separate letters and digits in your attempt, but the blueprint you provide does not make clear whether that is actually necessary. Lastly, you don't actually capture any part of your match; you only check whether it's there.
import re
line = "node:milk1-01|name=milk1-01"
# Raw string, so the backslash in \| reaches the regex engine unchanged
my_regex = re.compile(r'node:(.+)\|name=(.+)')
matches = re.findall(my_regex, line)
print(matches)
# Output: [('milk1-01', 'milk1-01')]
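If only one match per line is expected, `re.search` with the same pattern returns a match object whose groups can be read directly. A small sketch along those lines; the variable names `node` and `name` are just illustrative.

```python
import re

line = "node:milk1-01|name=milk1-01"
match = re.search(r'node:(.+)\|name=(.+)', line)

if match:
    # groups() returns the captured parts in order
    node, name = match.groups()
    print(node, name)
else:
    print("no match")
```

Checking `match` before calling `groups()` matters, because `re.search` returns `None` for lines that do not fit the blueprint.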
A few things to note:
(...): the parentheses are a capturing group. There are two sets, to capture two different parts.
.+: the . matches any character (except a newline, by default), so letters, numbers, hyphens and other (readable) characters. The + means to match one or more of 'them', being the previous character(s) in your regex, but you already got that.
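To see these two pieces in isolation, here is a short sketch on made-up strings. The `.*` variant is included only for contrast with `.+`; it is not part of the pattern above.

```python
import re

# '.' matches any single character except a newline (unless re.DOTALL is set)
dot_hits = re.findall(r'a.c', 'abc a-c a\nc')
print(dot_hits)    # ['abc', 'a-c']

# '+' needs at least one character, so an empty value does not match
plus_hits = re.findall(r'node:(.+)', 'node:')
print(plus_hits)   # []

# '*' also accepts zero characters, so the empty value is captured
star_hits = re.findall(r'node:(.*)', 'node:')
print(star_hits)   # ['']
```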
Final pro-tip: use a service like Regex101 to build and troubleshoot your regexes. You can see what happens live on-screen.
Use
re.findall(r'node:[^|]*\|name=[^|]*', line)
EXPLANATION
--------------------------------------------------------------------------------
  node:                    'node:'
--------------------------------------------------------------------------------
  [^|]*                    any character except: '|' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \|                       '|'
--------------------------------------------------------------------------------
  name=                    'name='
--------------------------------------------------------------------------------
  [^|]*                    any character except: '|' (0 or more times
                           (matching the most amount possible))
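The practical difference from `.*` shows up when a line has more `|`-separated fields, or when the text still contains newlines. A sketch under assumptions: the `rack=7` field is made up, and `[^|\n]*` is a variation that is not part of the answer above.

```python
import re

line = 'node:milk1-01|name=milk1-01|rack=7'

# '.*' runs to the end of the line, so it swallows the extra field
greedy = re.findall(r'node:.*\|name=.*', line)
print(greedy)     # ['node:milk1-01|name=milk1-01|rack=7']

# '[^|]*' stops at the next '|'
bounded = re.findall(r'node:[^|]*\|name=[^|]*', line)
print(bounded)    # ['node:milk1-01|name=milk1-01']

# Caveat: '[^|]' also matches '\n', so on multiline text it keeps going
text = 'node:milk1-01|name=milk1-01\nsomething\n'
spills = re.findall(r'node:[^|]*\|name=[^|]*', text)
print(spills)     # ['node:milk1-01|name=milk1-01\nsomething\n']

# Excluding the newline as well keeps the match on one line
one_line = re.findall(r'node:[^|\n]*\|name=[^|\n]*', text)
print(one_line)   # ['node:milk1-01|name=milk1-01']
```

When iterating over a file, each line usually ends with `'\n'`, so either strip the line first or exclude `\n` in the character class.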