
Regex: Match a Python keyword

I'm trying to make a syntax highlighter for Python using regular expressions (in Python). Among other things, I want to highlight keywords such as for, while, if, etc. To do this I need a regex that matches them.

My issue is that I don't want for, for instance, to be matched when it is inside a string, only when it stands alone (whitespace before and after).

I had \bfor\b at first, which matches every occurrence of a standalone for. The issue is that it also matches inside things like "string with for inside".
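To make the problem concrete, here is a minimal sketch of the behaviour described above. The sample line is made up for illustration; only the \bfor\b pattern comes from the question.

```python
import re

# Hypothetical sample line: one real keyword and one "for" inside a string literal.
line = 'for x in items: print("string with for inside")'

# \b only checks for a word boundary, so it knows nothing about quotes.
matches = [m.start() for m in re.finditer(r"\bfor\b", line)]

print(len(matches))  # the keyword and the word inside the string are both found
for start in matches:
    print(start, repr(line[start:start + 3]))
```

Both occurrences are reported, which is exactly the false positive a highlighter needs to avoid.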

I have thought about look-behind/look-ahead (as this question suggests), but can't get around the fact that look-behind requires fixed-width patterns in Python. I would love some tips on things to try here.

In short: what regex would match keywords such as for only where Python itself would interpret them as keywords?

As others have mentioned, there are probably better-suited tools for the job. That being said, it's always fun to put regexes to new uses, and combined with a little bit of code it should be possible, just not with a single regex.

Now, there's no easy way to exclude strings (regexes in general don't handle paired delimiters nicely), so the simplest approach is to create a copy of the text with every string replaced by spaces, so that indexing stays the same. Use something like \"[^"]*\" to find all strings (well, double-quoted strings), then replace each match with a run of spaces of the same length. Then run your keyword regex on the modified copy.

Adding cases for single quotes and comments gives (\"[^"]*\"|'[^']*'|#.*$). Of course, this will break if the strings contain any escaped quotes, so you can look for fixes to that, e.g. this question.
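A rough sketch of the mask-then-search idea, under a few assumptions of mine: the keyword list is a small illustrative subset, the function name find_keywords is made up, and re.MULTILINE is added so that $ ends a comment at each line break. The backslashes before the double quotes are dropped because they are redundant in a regex. Like the pattern itself, this ignores escaped quotes and triple-quoted strings.

```python
import re

# Strings and comments, following the pattern from the answer.
# re.MULTILINE lets "$" match at the end of every line, not just the end of the text.
MASK = re.compile(r'"[^"]*"|\'[^\']*\'|#.*$', re.MULTILINE)

# Illustrative subset of keywords (an assumption, not a complete list).
KEYWORD = re.compile(r"\b(?:for|while|if)\b")


def find_keywords(source):
    """Return (offset, keyword) pairs, skipping strings and comments."""
    # Replace each string/comment with spaces of the same length,
    # so offsets in the masked copy are valid in the original text.
    masked = MASK.sub(lambda m: " " * len(m.group()), source)
    return [(m.start(), m.group()) for m in KEYWORD.finditer(masked)]


source = (
    'for x in y:  # for each\n'
    '    if x == "if for":\n'
    "        s = 'while'\n"
)
result = find_keywords(source)
for offset, kw in result:
    print(offset, kw)
```

Because the masked copy has the same length as the original, the offsets can be used directly to colour the original text.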

To match a word ('for' in this example) that is preceded only by optional whitespace from the start of the line and followed by at least one whitespace character:

'^\s*for\s'

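'^' is the start of the line, '\s' is any whitespace character (space, tab, etc.), and '*' means zero or more repetitions. Note that in Python '^' matches at the start of every line only when re.MULTILINE is set; otherwise it matches only at the start of the text.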
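A small sketch of how this pattern behaves on a made-up snippet. The sample source is my own; the point is that the pattern finds a for that begins a statement, but by design misses one that appears later in a line, such as in a comprehension.

```python
import re

# re.MULTILINE makes "^" match at the start of each line.
pattern = re.compile(r"^\s*for\s", re.MULTILINE)

# Hypothetical sample: two loop statements, a string, and a comprehension.
source = (
    'for a in b:\n'
    '    for c in d:\n'
    '        print("not for here")\n'
    'x = [i for i in b]\n'
)

hits = [m.group().strip() for m in pattern.finditer(source)]
print(hits)  # only the two loop statements are found
```

The for inside the string is skipped, but so is the one in the list comprehension, so this approach trades false positives for missed keywords.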
