I am making an Android app that displays stored HTML data in a WebView. The problem I am trying to overcome is how to ignore HTML/CSS tags and elements when searching for a user-input string. My DB is already 110MB, and I think adding another field with only the text and no HTML would just increase its size. Regex would be expensive too and may not be reliable.
Is there any other way to do it?
You could do additional filtering in your program on the queried records. You can use an HTML parser such as Jsoup to strip the HTML tags and then search the remaining text. A simple Java example with Jsoup:
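import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

List<String> records = ... // your queried records (potential results)
List<String> results = new ArrayList<String>();
for (String r : records) {
    Document d = Jsoup.parse(r);      // parse the stored HTML
    String text = d.text();           // extract the text without markup
    if (text.contains(searchTerm)) {  // or do your search here
        results.add(r);               // keep the original HTML record
    }
}
return results; // records whose text, not markup, contains the term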
It may not be the best solution, but it is an option. I think it's expensive too, but more reliable than regular expressions (which you are trying to avoid).
Update: the regex way
I think the only way to ignore HTML tags while fetching is to use a regex in SQLite. For example, the following pattern should match the search string only outside HTML tags:
(^|>)[^<]*(searchterm)[^<]*(<|$)
In the following example text it will match only the 1st, 3rd and 4th searchterm and not the 2nd:
searchterm <tag searchterm> searchterm </tag> searchterm
You can see it in action here.
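The behaviour described above can also be checked locally. The following is a minimal sketch using java.util.regex; the class and method names are made up for illustration, and Pattern.quote is added here only so the search term is treated literally. It records where the second capture group (the search term) starts for each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class OutsideTagDemo {

    // Returns the start offset of the search term for every match of the pattern.
    static List<Integer> matchOffsets(String html, String term) {
        // Same pattern as above; Pattern.quote makes the term a literal.
        Pattern p = Pattern.compile("(^|>)[^<]*(" + Pattern.quote(term) + ")[^<]*(<|$)");
        Matcher m = p.matcher(html);
        List<Integer> offsets = new ArrayList<>();
        while (m.find()) {
            offsets.add(m.start(2)); // group 2 is the search term itself
        }
        return offsets;
    }

    public static void main(String[] args) {
        String sample = "searchterm <tag searchterm> searchterm </tag> searchterm";
        // Prints [0, 28, 46]: the 1st, 3rd and 4th occurrences.
        System.out.println(matchOffsets(sample, "searchterm"));
    }
}
```

Note that each match consumes a whole run of text between tags, so the pattern reports at most one hit per run. That is enough to decide whether a record matches, but not to count occurrences.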
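In SQLite the REGEXP operator is used as shown below. Note that REGEXP is only syntax for a user function named regexp(), and SQLite does not define that function by default, so the query fails unless your SQLite build or application provides one: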
WHERE column-name REGEXP 'regular-expression'
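If a regexp() function is available, the pattern still has to be built from the user's input. The sketch below shows one way to do that in Java; the class name, the method names and the table and column names in the usage note are hypothetical. It escapes regex metacharacters with backslashes rather than \Q...\E, on the assumption that the regex engine behind regexp() may not support Java's quoting syntax, and it leaves the pattern as a bound parameter (?) instead of concatenating it into the SQL.

```java
// Hypothetical helper, not part of the original answer.
class RegexpQueryBuilder {

    // Characters that have a special meaning in most regex dialects.
    private static final String META = "\\.[]{}()*+?^$|";

    // Backslash-escapes metacharacters so the user input is matched literally.
    static String escapeRegex(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Wraps the escaped term in the "outside HTML tags" pattern.
    static String buildPattern(String userInput) {
        return "(^|>)[^<]*(" + escapeRegex(userInput) + ")[^<]*(<|$)";
    }

    // Builds the SQL text; the pattern itself is passed as a bound argument.
    static String buildQuery(String table, String column) {
        return "SELECT * FROM " + table + " WHERE " + column + " REGEXP ?";
    }
}
```

For example, buildQuery("pages", "html") together with buildPattern(userInput) as the single query argument would give the WHERE clause shown above. Search terms that themselves contain < or >, or text stored as HTML entities, are not handled by this sketch.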