python - Match Everything except the string regex
Data Set
Cider
631
Spruce
871
Honda
18813
Nissan
3292
Pine
10621
Walnut
10301
Code
#!/usr/bin/python
import re
text = "Cider\n631\n\nSpruce\n871\n\nHonda\n18813\n\nNissan\n3292\n\nPine\n10621\n\nWalnut\n10301\n\n"
f1 = re.findall(r"(Cider|Pine)\n(.*)",text)
print(f1)
Current Result
[('Cider', '631'), ('Pine', '10621')]
Question:
How do I change the regex so that it matches everything except several specified strings, e.g. (Honda|Nissan)?
Desired Result
[('Cider', '631'), ('Spruce', '871'), ('Pine', '10621'), ('Walnut', '10301')]
You can exclude lines that consist only of one of the names or only of digits, and then match 2 lines, the first of which starts with a non-whitespace char.
^(?!(?:Honda|Nissan|\d+)$)(\S.*)\n(.*)
The pattern matches:
^ : Start of a line (re.MULTILINE is set in the code below; without it, ^ matches only at the start of the string)
(?! : Negative lookahead, assert that what is directly to the right is not
(?:Honda|Nissan|\d+)$ : Any of the alternatives, followed by the end of the line
) : Close the lookahead
(\S.*) : Capture group 1, match a non-whitespace char followed by the rest of the line
\n : Match a newline
(.*) : Capture group 2, match any characters except a newline (the rest of the second line)
import re
text = ("Cider\n"
"631\n\n"
"Spruce\n"
"871\n\n"
"Honda\n"
"18813\n\n"
"Nissan\n"
"3292\n\n"
"Pine\n"
"10621\n\n"
"Walnut\n"
"10301")
f1 = re.findall(r"^(?!(?:Honda|Nissan|\d+)$)(\S.*)\n(.*)", text, re.MULTILINE)
print(f1)
Output
[('Cider', '631'), ('Spruce', '871'), ('Pine', '10621'), ('Walnut', '10301')]
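When the names to exclude are not fixed, the same pattern can be assembled from a list. The following is only a minimal sketch and not part of the original answer: the helper name pairs_excluding and its parameters are hypothetical, and re.escape is used on the assumption that a name might contain regex metacharacters that should be matched literally.

```python
import re

def pairs_excluding(text, excluded):
    """Return (name, value) pairs, skipping names listed in `excluded`.

    Hypothetical helper: builds the negative-lookahead pattern shown above
    from a list of names instead of hard-coding them.
    """
    # Escape each name so characters such as "." or "+" are matched literally.
    names = "|".join(re.escape(name) for name in excluded)
    pattern = rf"^(?!(?:{names}|\d+)$)(\S.*)\n(.*)"
    # re.MULTILINE lets ^ and $ match at every line boundary.
    return re.findall(pattern, text, re.MULTILINE)

text = ("Cider\n631\n\nSpruce\n871\n\nHonda\n18813\n\n"
        "Nissan\n3292\n\nPine\n10621\n\nWalnut\n10301")
result = pairs_excluding(text, ["Honda", "Nissan"])
print(result)
```

With the sample data this prints the same four pairs as the hard-coded pattern.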
If the first line should start with an uppercase char A-Z and the next line should consist of only digits:
^(?!Honda|Nissan)([A-Z].*)\n(\d+)$
This pattern matches:
^ : Start of a line (when used with re.MULTILINE)
(?!Honda|Nissan) : Negative lookahead, assert that Honda or Nissan is not directly to the right
([A-Z].*) : Capture group 1, match an uppercase char A-Z followed by the rest of the line
\n : Match a newline
(\d+) : Capture group 2, match 1+ digits
$ : End of a line (when used with re.MULTILINE)
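As a quick check, the stricter pattern can be run against the sample data from the question. This is a sketch only; the variable names strict and pairs are chosen for illustration, and re.MULTILINE is assumed so that ^ and $ match at line boundaries.

```python
import re

text = ("Cider\n631\n\nSpruce\n871\n\nHonda\n18813\n\n"
        "Nissan\n3292\n\nPine\n10621\n\nWalnut\n10301")

# Name line must start with A-Z and must not start with Honda or Nissan;
# the following line must consist of digits only.
strict = re.compile(r"^(?!Honda|Nissan)([A-Z].*)\n(\d+)$", re.MULTILINE)
pairs = strict.findall(text)
print(pairs)

# The lookahead has no trailing $, so it also rejects longer names
# that merely begin with an excluded word.
prefixed = strict.findall("Hondacivic\n12")
print(prefixed)
```

Because this lookahead is not closed with $, it rejects any name that begins with Honda or Nissan, not only those exact names. Use (?!(?:Honda|Nissan)$) if only exact matches should be excluded.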
Another suggestion is to invert the match with the caret '^' symbol:
f1 = re.findall(r"(\s?^(Cider|Pine))\n(.*)",text)
Keep in mind, however, that the caret negates only inside a character class such as [^abc]. Anywhere else in a pattern it is an anchor meaning "does it start at the beginning of the string" (or of a line with re.MULTILINE), no matter which position it occupies.
Inserting an optional single space (\s?) in front of it to "use up" the first character therefore does not turn the caret into an inverse operator. The pattern above still matches only Cider or Pine at the start of the string, so it cannot exclude Honda and Nissan. To exclude whole words, use a negative lookahead as in the patterns above.
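The two roles of the caret in Python's re module can be checked directly. The snippet below is a small sketch with a shortened sample text; the variable names anchored and negated are only illustrative.

```python
import re

text = "Cider\n631\n\nSpruce\n871\n\nHonda\n18813\n"

# Outside a character class, ^ is an anchor, even after an optional \s?.
# Without re.MULTILINE it can only match at the very start of the string.
anchored = re.findall(r"(\s?^(Cider|Pine))\n(.*)", text)
print(anchored)  # [('Cider', 'Cider', '631')]

# Inside a character class, ^ negates the set: here, runs of characters
# that are neither digits nor newlines.
negated = re.findall(r"[^\d\n]+", text)
print(negated)  # ['Cider', 'Spruce', 'Honda']
```

A negated character class works one character at a time, so it cannot express "not the word Honda". That is the job of the negative lookahead (?!...).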