
Ignore HTML tags and search text only in SQLite

I am making an Android app that displays stored HTML data in a WebView. The problem I am trying to overcome is how to ignore HTML/CSS tags and elements when searching for a user-input string. My DB is already 110 MB, and I think adding another field with only the text and no HTML would just make it larger. Regex would be expensive too, and may not be reliable.

Is there any other way to do it?

One option is to do additional filtering in your program on the queried records. You can use an HTML parser such as Jsoup to strip the HTML tags and then search the remaining text. A simple Java example with Jsoup:

import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

List<String> records = ... // your queried records - potential results
List<String> results = new ArrayList<String>();
for (String r : records) {
    Document d = Jsoup.parse(r); // parse the stored HTML
    String text = d.text(); // extract the text without tags
    if (text.contains(searchTerm)) { // or do your search here
        results.add(r);
    }
}
return results; // only records whose text contains searchTerm

It may not be the best solution, but it is an option. I think it is expensive too, but more reliable than regular expressions (which you are trying to avoid).

Update: the regex way

I think the only way to skip HTML tags while fetching is to use a regex in SQLite. For example, the following pattern should match a search term that appears outside HTML tags:

(^|>)[^<]*(searchterm)[^<]*(<|$)

In the following example text it will match only the 1st, 3rd and 4th searchterm and not the 2nd:

searchterm <tag searchterm> searchterm </tag> searchterm

You can see it in action here.
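The pattern can also be checked locally with java.util.regex. The following is only a minimal sketch: the class and method names (OutsideTagMatch, matchOffsets) are made up for illustration, and the regex engine SQLite ends up using may differ from Java's in details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class OutsideTagMatch {
    // Pattern from the answer: the term must sit in a run of text that
    // starts at the beginning of the input or after '>' and ends at '<'
    // or at the end of the input.
    static final Pattern OUTSIDE_TAGS =
            Pattern.compile("(^|>)[^<]*(searchterm)[^<]*(<|$)");

    // Returns the start offset of each matched occurrence of the term (group 2).
    static List<Integer> matchOffsets(String html) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = OUTSIDE_TAGS.matcher(html);
        while (m.find()) {
            offsets.add(m.start(2));
        }
        return offsets;
    }

    public static void main(String[] args) {
        String sample = "searchterm <tag searchterm> searchterm </tag> searchterm";
        System.out.println(matchOffsets(sample)); // prints [0, 28, 46]
    }
}
```

Offsets 0, 28 and 46 are the 1st, 3rd and 4th occurrences in the sample; the occurrence at offset 16 sits inside `<tag ...>` and is skipped.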

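In SQLite you can use regular expressions this way (note that REGEXP is only syntax for a regexp() user function, which SQLite does not define by default, so it has to be available in your build or registered by your application):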

WHERE column-name REGEXP 'regular-expression'
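If REGEXP turns out not to be usable in a given build, or the term comes straight from user input and may contain regex metacharacters, the same check could be applied in application code to rows returned by a coarser query. This is a sketch under assumptions: HtmlTextFilter, outsideTags and filter are hypothetical names, and the LIKE prefilter mentioned in the comment is only one possible way to fetch candidate rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class HtmlTextFilter {
    // Builds the answer's pattern for an arbitrary term. Pattern.quote keeps
    // characters such as '.' or '(' in the user's input literal.
    static Pattern outsideTags(String term) {
        return Pattern.compile("(^|>)[^<]*" + Pattern.quote(term) + "[^<]*(<|$)");
    }

    // rows: HTML strings already returned by a coarse query (assumption:
    // something like WHERE column-name LIKE '%term%'). Keeps only rows where
    // the term appears in text rather than inside a tag.
    static List<String> filter(List<String> rows, String term) {
        Pattern p = outsideTags(term);
        List<String> hits = new ArrayList<>();
        for (String row : rows) {
            if (p.matcher(row).find()) {
                hits.add(row);
            }
        }
        return hits;
    }
}
```

Like the original pattern, this is not a full HTML parser: a literal '>' inside an attribute value can still produce a false match, and the comparison is case-sensitive.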
