Regex: Match a Python keyword
I'm trying to make a syntax highlighter for Python using regular expressions (in Python). Among other things, I want to highlight keywords such as for, while, if, etc. To do this I need a regex which matches them.
My issue is that I don't want, for instance, for to be matched when it is inside a string, only when it is isolated (whitespace before and after).
I had \bfor\b at first, which matches every occurrence of a separated for. The issue with this is that it also matches inside things like "string with for inside".
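A small reproduction of the problem, as a sketch (the sample source text is made up for illustration): \b only tests for a word boundary, so it cannot tell code from string contents.

```python
import re

# Hypothetical two-line snippet: one real keyword, one "for" inside a string.
source = 'for x in items:\n    print("string with for inside")\n'

# \b knows nothing about quotes, so both occurrences are reported.
matches = [m.start() for m in re.finditer(r"\bfor\b", source)]
print(matches)  # [0, 39]
```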
I have thought about look-behind/look-ahead (as this question suggests), but can't get around the fact that look-behind requires fixed-width patterns in Python. I would love to get some guiding tips on things to try here.
In short: what could be a regex that matches keywords such as for only when Python itself would interpret them as keywords?
As others have mentioned, there are probably better-suited tools for the job. That being said, it's always fun to put regexes to new uses, and combined with a little bit of code it should be possible, just not with a single regex.
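The answer does not name a specific tool. One candidate, which is my assumption and not something the answer mentions, is the standard library's tokenize module together with keyword.iskeyword. A minimal sketch (keyword_spans and the sample text are hypothetical names for illustration):

```python
import io
import keyword
import tokenize

def keyword_spans(source):
    """Return (row, col_start, col_end, text) for each keyword token."""
    spans = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # String contents arrive as STRING tokens, so a "for" inside
        # quotes never shows up as a NAME token.
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            spans.append((tok.start[0], tok.start[1], tok.end[1], tok.string))
    return spans

sample = 'for x in items:\n    print("string with for inside")\n'
print(keyword_spans(sample))
```

Rows are 1-based and columns 0-based, following tokenize's conventions. The tokenizer can raise an error on incomplete source, which matters for a highlighter that runs while the user is still typing.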
Now, there's no easy way to exclude strings (regexes in general don't handle paired delimiters nicely), so it would be simplest to create a copy of the text with every string replaced by spaces, so that indexing stays the same. Use something like \"[^"]*\" to find all strings (well, double-quoted strings), then replace each match with a run of spaces of the same length. Then run your keyword regex on the modified string.
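The masking step could look roughly like this. It is a sketch for double-quoted strings only; mask_double_quoted and the sample line are illustrative names, not part of the original answer.

```python
import re

def mask_double_quoted(text):
    """Replace each double-quoted string with spaces of equal length."""
    return re.sub(r'"[^"]*"', lambda m: " " * len(m.group()), text)

line = 'for x in y: print("string with for inside")'
masked = mask_double_quoted(line)

# Same length, so match offsets in `masked` are valid offsets in `line`.
found = [m.start() for m in re.finditer(r"\bfor\b", masked)]
print(found)  # [0]
```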
Adding in cases for single quotes and comments would be (\"[^"]*\"|'[^']*'|#.*$), compiled with re.MULTILINE so that $ matches at the end of every line. Of course, this will break if the strings contain any escaped quotes, so you can look for fixes to that, e.g. this question.
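One way to put the pieces together, including a common fix for escaped quotes, is sketched below. The escape-aware sub-patterns are my assumption about what such a fix looks like, not taken from the linked question, and triple-quoted strings are deliberately not handled. All names are hypothetical.

```python
import keyword
import re

# Quoted strings that tolerate backslash escapes, plus comments.
# Triple-quoted strings are NOT handled by this sketch.
DOUBLE = r'"(?:\\.|[^"\\\n])*"'
SINGLE = r"'(?:\\.|[^'\\\n])*'"
COMMENT = r"#[^\n]*"
MASK_RE = re.compile("|".join([DOUBLE, SINGLE, COMMENT]))

# One alternation over every hard keyword Python reports.
KEYWORD_RE = re.compile(r"\b(?:" + "|".join(keyword.kwlist) + r")\b")

def find_keywords(source):
    """Return (offset, keyword) pairs found outside strings and comments."""
    masked = MASK_RE.sub(lambda m: " " * len(m.group()), source)
    return [(m.start(), m.group()) for m in KEYWORD_RE.finditer(masked)]

sample = "for c in 'it\\'s for you':  # not for real\n    pass\n"
print(find_keywords(sample))
```

Because the mask keeps the text length unchanged, the offsets can be used directly to highlight spans in the original source.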
To match anything ('for' in the example) with optional whitespace before it and a whitespace character after it:
'^\s*for\s'
'^' is the start of the line, '\s' is any type of whitespace (tab etc.), and '*' means 0 or more repetitions.
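A quick check of how this pattern behaves, as a sketch. The answer does not mention flags; using re.MULTILINE here is my assumption, since without it '^' only matches at the very start of the text. The sample source is made up.

```python
import re

pattern = re.compile(r"^\s*for\s", re.MULTILINE)

source = (
    "for x in y:\n"
    "    for z in x:\n"
    '        print("string with for inside")\n'
    "squares = [n * n for n in y]\n"
)

hits = [m.group().strip() for m in pattern.finditer(source)]
print(hits)
```

The pattern skips the for inside the string, but only because that for is not the first word on its line. For the same reason it also misses the for inside the list comprehension, so it only covers statements that begin a line.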