简体   繁体   中英

T-SQL search html with regex?

In my database I have a field wich contains a html document. Now there must be a possibility to search in this document. However, the html tags may not be found. So when I have something like this:

<html>
  <head>
    <title>Bar</title>
  </head>
  <body>
   <p>
     this content my be found
   </p>
  </body>
</html>

It is possible that the document stored in the database is not xhtml. Can you tell me what the best way is to search in the content? Shall i use regular expressions? And of so, how would it look like? ANd if not, what should I use else?

您可以尝试启用全文搜索或使用Lucene.Net之类的内容为您索引内容。

What volume of records are there? I expect you might have to use full-text search and an IFilter to do this efficiently. Html does not lend itself well to regex - it can quickly be very hard to do something very simple.

If the volume isn't huge, can you iterate over the records with an external parsing application, using something like the HTML Agility Pack (for .NET) - or any other DOM of your choice.

But the FTS/IFilter would be my first choice.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM