
What is a good strategy for getting the title of a page?

...on the server side, using PHP.

I read this SO article on when to use regexes, and it basically says that you can use regexes to parse HTML in certain cases.

The <title></title> element should be easy to match.

I see no problem with this. I think the popular answer was upvoted so heavily not because of its correctness but because of its entertainment value.

Is this OK?

Yes, it is:

/<title[^>]*>(.*?)<\/title>/is
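A minimal sketch of how the pattern above might be used with PHP's preg_match. The helper name get_title_with_regex and the sample HTML are hypothetical, for illustration only. The i modifier makes the tag name case-insensitive, s lets . match newlines inside the title, and the lazy .*? stops at the first closing tag.

```php
<?php
// Hypothetical helper: extract the <title> text using the regex from the answer.
function get_title_with_regex(string $html): ?string
{
    // i: case-insensitive tag names; s: "." also matches newlines.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $matches) === 1) {
        // Strip surrounding whitespace; entities such as &amp; are NOT decoded here.
        return trim($matches[1]);
    }
    return null; // no <title> element found
}

$html = "<html><head>\n<TITLE lang=\"en\">\n  My Page\n</TITLE>\n</head><body></body></html>";
echo get_title_with_regex($html), PHP_EOL; // prints "My Page"
```

Note that the raw match is returned as-is, so a title written as Tom &amp; Jerry comes back with the entity still encoded; html_entity_decode() can be applied if needed.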

Different people have different opinions, though, and you should only use regex if you know what you're doing.
This might be a very interesting read: When you should NOT use Regular Expressions?

Your best bet is to use an HTML parsing library (like this one), not regex. You may get away with using regex in this case, but it's like using a hammer to drive in a screw.
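The library linked in the answer is not preserved here, so this sketch swaps in PHP's built-in DOM extension (DOMDocument) as a stand-in parser. The helper name get_title_with_dom is hypothetical. libxml_use_internal_errors(true) stops the parser from printing warnings on real-world, non-well-formed HTML.

```php
<?php
// Hypothetical helper: extract the <title> text with PHP's built-in DOM parser
// (a stand-in for the HTML parsing library mentioned in the answer).
function get_title_with_dom(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // collect parse errors silently
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    if (!$loaded) {
        return null;
    }
    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null; // document has no <title> element
    }
    // textContent gives the text with entities such as &amp; already decoded.
    return trim($titles->item(0)->textContent);
}

$html = '<html><head><title>Tom &amp; Jerry</title></head><body></body></html>';
echo get_title_with_dom($html), PHP_EOL; // prints "Tom & Jerry"
```

One caveat, stated as an assumption about typical input: loadHTML() treats input without a charset declaration as ISO-8859-1, so non-ASCII UTF-8 titles may need a <meta charset="utf-8"> in the document to come out correctly.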

If you are looking for anything non-trivial in the HTML, a regex quickly becomes confusing and hard to read, and in many cases it cannot do the job at all without making many assumptions about the content of the HTML.
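To make the point about assumptions concrete, here is an invented input where the simple pattern gives the wrong answer: an old <title> left inside an HTML comment. The regex implicitly assumes no such comment exists, while a parser (PHP's DOMDocument, used here as a stand-in) treats the comment as a separate node. The sample HTML is hypothetical.

```php
<?php
// Hypothetical input: a stale <title> inside a comment, followed by the real one.
$html = "<html><head>\n<!-- <title>Old Title</title> -->\n"
      . "<title>New &amp; Improved</title>\n</head><body></body></html>";

// Regex approach: takes the FIRST textual match, which sits inside the comment.
preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m);
$regexTitle = $m[1] ?? null;

// Parser approach: the comment is its own node, so only the real <title>
// element is found, and its entities are decoded.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$node = $doc->getElementsByTagName('title')->item(0);
$domTitle = $node !== null ? $node->textContent : null;

echo "regex: ", var_export($regexTitle, true), PHP_EOL; // regex: 'Old Title'
echo "DOM:   ", var_export($domTitle, true), PHP_EOL;   // DOM:   'New & Improved'
```

The regex could be patched to skip comments, but each such patch adds another assumption (no <title> inside a script string, no CDATA, and so on), which is exactly the trade-off described above.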
