
Regex in Python to get javadoc-style comments in CSS

I'm writing a Python script to loop through a directory of CSS files and save the contents of any that contain a specifically formatted javadoc-style comment.

The comment/CSS looks like this:

/**thirdpartycss
* @description Used for fixing stuff
*/
.class_one {
    margin: 10px;
}
#id_two {
    padding: 2px;
}

The regex to fetch the entire contents of the file looks like this:

import re

pattern = r"/\*\*thirdpartycss(.*?)}$"
matches = re.findall(pattern, css, flags=re.MULTILINE | re.DOTALL)

This gives me the file contents. What I want to do now is write a regex to grab each CSS definition within it. This is what I tried:

rule_pattern = "(.*){(.*)}?"
rules = re.findall(rule_pattern, matches[0], flags=re.MULTILINE | re.DOTALL)

I'm basically trying to find any text, then an opening {, any text, then a closing }. Essentially I want a list of all of the CSS classes, but this just returns the entire string in one chunk.

Can anybody point me in the right direction?

Thanks,
Matt

{(.*)} is a greedy match: it matches from the first { to the last }, thus gobbling up any {/} pairs that might be inside. You want non-greedy matching, that is

{(.*?)}

The difference is the question mark after the asterisk, which makes it non-greedy.
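Here is a sketch of the non-greedy fix applied to the question's sample. It makes some assumptions. re.DOTALL is still required because each rule body spans several lines. The comment-stripping step and the selector-capturing pattern are illustrative additions and are not part of the answer above. The same flags also explain a quirk in the question's first pattern. Under re.MULTILINE, $ matches before every newline, so the lazy (.*?)}$ stops at the first } that ends a line rather than at the end of the file.

```python
import re

css = """/**thirdpartycss
* @description Used for fixing stuff
*/
.class_one {
    margin: 10px;
}
#id_two {
    padding: 2px;
}"""

# Greedy: a single match running from the first "{" to the last "}".
greedy = re.findall(r"{(.*)}", css, flags=re.DOTALL)
# Non-greedy: each body stops at the nearest following "}".
lazy = re.findall(r"{(.*?)}", css, flags=re.DOTALL)
print(len(greedy), len(lazy))  # 1 2

# Drop /* ... */ comments, then capture (selector, body) pairs for flat CSS.
no_comments = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)
rules = [(sel.strip(), body.strip())
         for sel, body in re.findall(r"([^{}]+?)\s*\{([^{}]*)\}", no_comments)]
print(rules)  # [('.class_one', 'margin: 10px;'), ('#id_two', 'padding: 2px;')]

# The question's first pattern: "$" under MULTILINE ends the match early.
header = re.findall(r"/\*\*thirdpartycss(.*?)}$", css, flags=re.MULTILINE | re.DOTALL)
print("#id_two" in header[0])  # False
```

If the goal is really "everything from the marker to the end of the file", a plain css.find("/**thirdpartycss") followed by slicing avoids the anchoring issue entirely.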

This still won't work if you need to properly match "nested" braces. But then, nothing in the RE world will. Among the many well-known limitations of regular languages (the languages that regular expressions can match) is that properly nesting any kind of open/close parentheses is impossible. (Some incredibly extended so-called REs manage it, but Python's don't, and anybody with a CS background will find calling those expressions "regular" offensive anyway ;-) If you need more general parsing than REs can afford, pyparsing or another full-fledged Python parser is the right way to go.
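This sketch illustrates the nesting limitation. The non-greedy pattern breaks on a nested block such as @media, while a few lines of ordinary code that count brace depth handle it. This hand-rolled scanner is swapped in for illustration only. It is not pyparsing and not a real CSS parser. The name top_level_blocks is hypothetical, and the scanner deliberately ignores the possibility of braces inside comments or strings.

```python
import re

def top_level_blocks(css):
    """Split CSS into (prelude, body) pairs at brace depth 0.

    Counts open braces so nested blocks such as @media stay inside their
    parent's body. Braces inside comments or strings are NOT handled.
    """
    blocks = []
    depth = 0
    prelude_start = 0
    prelude = ""
    body_start = 0
    for i, ch in enumerate(css):
        if ch == "{":
            if depth == 0:
                # Opening a top-level block: remember its selector/at-rule text.
                prelude = css[prelude_start:i].strip()
                body_start = i + 1
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced '}}' at offset {i}")
            if depth == 0:
                # Closing a top-level block: store it and start the next prelude.
                blocks.append((prelude, css[body_start:i].strip()))
                prelude_start = i + 1
    if depth != 0:
        raise ValueError("unbalanced '{' at end of input")
    return blocks


css = "@media print {\n  .a { color: red; }\n}\n.b { margin: 0; }"

# The non-greedy regex stops at the first "}", cutting the @media block short.
print(repr(re.findall(r"{(.*?)}", css, flags=re.DOTALL)[0]))  # '\n  .a { color: red; '
print(top_level_blocks(css))
# [('@media print', '.a { color: red; }'), ('.b', 'margin: 0;')]
```

A depth counter is the simplest thing that can track nesting. Anything beyond this, such as comments, strings or escapes, is where a real parser earns its keep.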

@Alex is right (is he ever not? but I digress). If you need more specific parsing than regular expressions can offer, you are better off using a dedicated parser. Luckily you don't have to reinvent the (CSS parsing) wheel: there is already an existing solution for this.

I faced a similar requirement some time back, and the cssutils module came in handy. I just refreshed my cssutils fu to cook up this code snippet for you:

In [16]: import cssutils

In [17]: s = """/**thirdpartycss
* @description Used for fixing stuff
*/
.class_one {
    margin: 10px;
}
#id_two {
    padding: 2px;
}"""

In [26]: sheet = cssutils.parseString(s)

In [27]: sheet.cssRules
Out[27]: 
[cssutils.css.CSSComment(cssText=u'/**thirdpartycss\n* @description Used for fixing stuff\n*/'),
 cssutils.css.CSSStyleRule(selectorText=u'.class_one', style=u'margin: 10px'),
 cssutils.css.CSSStyleRule(selectorText=u'#id_two', style=u'padding: 2px')]

In [28]: sheet.cssRules[0].cssText
Out[28]: u'/**thirdpartycss\n* @description Used for fixing stuff\n*/'

In [29]: print(sheet.cssRules[0].cssText)
/**thirdpartycss
* @description Used for fixing stuff
*/

You can parse the CSS and then loop through the sheet object's cssRules to find all CSSComment instances.
