
Disadvantages of using Regular Expressions

Recently my manager advised me not to depend too much on regex, as it has a lot of disadvantages. When I tried to learn more, I heard that regex can cause memory leaks, because some objects keep holding on to string references even after use.

.NET RegEx "Memory Leak" investigation

So is it right to say that regex causes memory overhead and should not be used if you have other options? Are there any other disadvantages to regex (apart from it being tough to learn :) )?

P.S. I am developing an application (C#/.NET) similar to a web crawler, which extracts all hrefs and some other information such as the title, meta tags, etc. I have the option of using HTML Agility Pack instead of regex.

Regular expressions make the code difficult to read. Most of the time, even at the expense of more verbose code, you are better off not using them. The performance cost and the loss of readability mean that you should avoid regexes in most cases, especially the simplest ones, where plain string methods suffice, and the most complex ones, where the pattern becomes unreadable.

And for the purpose you mention (parsing HTML, etc.), regular expressions simply cannot get the job done, because HTML is not a regular language. It is like having a hammer, so that everything looks like a nail.
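To make the "not a regular language" point concrete, here is a minimal sketch of what happens when a pattern meets nested tags. It is written in Python purely for illustration (the question is about C#, where the same behaviour applies to the equivalent patterns); the sample strings are made up.

```python
import re

# Nested elements: a regex has no way to count opening and closing tags.
nested = "<div>outer <div>inner</div> tail</div>"

# Lazy quantifier: stops at the FIRST closing tag, cutting the outer element short.
lazy = re.search(r"<div>(.*?)</div>", nested).group(1)

# Greedy quantifier: runs to the LAST closing tag, swallowing sibling elements.
siblings = "<div>a</div><div>b</div>"
greedy = re.search(r"<div>(.*)</div>", siblings).group(1)

print(lazy)    # outer <div>inner
print(greedy)  # a</div><div>b
```

Neither variant can be right for both inputs, because matching a tag to its own closing tag requires tracking nesting depth.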

Regular expressions can obfuscate the logic you are using; sometimes it is less complex to do the same thing in code. In code you can break the different logical tests up and comment each one, so that people can see why you are doing what you are doing.
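As a rough illustration of breaking the tests up, the sketch below states one rule twice: once as a single pattern and once as separate commented checks. The rule itself (follow only absolute http(s) links that are not images) and the names `LINK_RE` and `should_follow` are hypothetical, chosen to fit the crawler in the question; Python is used only for brevity, and the two versions are roughly, not exactly, equivalent.

```python
import re
from urllib.parse import urlparse

# Hypothetical rule as one pattern: compact, but the intent is hard to see.
LINK_RE = re.compile(
    r"^https?://[^/\s]+(?:/(?!.*\.(?:png|jpe?g|gif)$).*)?$", re.IGNORECASE
)

def should_follow(href: str) -> bool:
    """Same hypothetical rule, written as separate commented checks."""
    parts = urlparse(href)
    # Only absolute web links; skips mailto:, javascript:, relative paths.
    if parts.scheme not in ("http", "https"):
        return False
    # The link must name a host.
    if not parts.netloc:
        return False
    # Image files carry no further links, so do not crawl them.
    if parts.path.lower().endswith((".png", ".jpg", ".jpeg", ".gif")):
        return False
    return True

for href in ["https://example.com/a/b.html", "https://example.com/logo.PNG",
             "mailto:someone@example.com", "/relative/page"]:
    print(href, should_follow(href), bool(LINK_RE.match(href)))
```

Each check in `should_follow` can be read, tested, and changed on its own, whereas changing the pattern means re-reading all of it.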

My view on this is that regexes can often do the job, but you need to be careful not to overuse them. As they say, when all you have is a hammer, every problem looks like a nail.

In this case you are trying to parse HTML to get data out. An HTML parser will be both more readable and probably more reliable. Regular expressions used to parse HTML will often either fail in some circumstances (malformed HTML being the big one) or be far more complicated than the equivalent code using an HTML parser.
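A minimal sketch of the parser approach, pulling hrefs, the title, and meta tags out of deliberately sloppy markup. It uses the event-based `html.parser` module from Python's standard library as a stand-in, so it is not the HTML Agility Pack API; the class name `PageInfo` and the sample page are made up for illustration.

```python
from html.parser import HTMLParser

class PageInfo(HTMLParser):
    """Collects hrefs, named meta tags, and the title while the page is parsed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Mixed quoting, upper-case tags, unquoted attributes, unclosed <a> and <p>.
html = """<html><head><title>Demo page</title>
<meta name="Description" content="A test">
</head><body>
<a href='/one'>one
<A HREF="/two" class=x>two</a>
<p>unclosed paragraph
</body>"""

page = PageInfo()
page.feed(html)
page.close()

print(page.title)  # Demo page
print(page.hrefs)  # ['/one', '/two']
print(page.meta)   # {'description': 'A test'}
```

The quoting style, letter case, and unclosed tags are all handled by the parser, so none of those variations has to be encoded in a pattern.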

I don't know about the memory leaks and performance issues, but even ignoring those, I tend to keep regex use to a minimum.
