How to most efficiently extract content from this HTML string with JavaScript? (Highest Performance = Lowest Milliseconds)

I have the following HTML string in variable "myhtml":

<html><head><title>hackaday</title></head><body>
<span background-color="#0000">Welcome to the world.</span><div>You want a little treat...tomatoes berries walnutsDont You? <a href="http://getyourtreat.com">Get Your Treat</a> You will enjoy it. Eat It. Love it.</div></body></html>

What I want to extract from this html string is "tomatoes berries walnuts". Note that every time I refresh the HTML page, there may be different words that show up instead of "tomatoes berries walnuts" like "chocolate chips soda".

What is the absolute fastest way to extract the string I am looking for? My current solution is to use a split on the "..." to get everything after, then use another split on the word "Dont" since nothing on that page/html changes except for those specific three words.
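For reference, the two-split approach described above might look roughly like this. It is only a sketch: the function name `extractBySplit` is illustrative and not from the original post, and it assumes "..." and "Dont" each occur where the question says they do.

```javascript
// Sketch of the two-split approach from the question.
// `extractBySplit` is a hypothetical helper name.
function extractBySplit(html) {
  // Everything after the first "..." marker.
  const afterDots = html.split("...")[1];
  if (afterDots === undefined) return null; // "..." not found
  // Everything before the first "Dont" that follows.
  return afterDots.split("Dont")[0];
}

const myhtml = '<html><head><title>hackaday</title></head><body>' +
  '<span background-color="#0000">Welcome to the world.</span>' +
  '<div>You want a little treat...tomatoes berries walnutsDont You? ' +
  '<a href="http://getyourtreat.com">Get Your Treat</a> ' +
  'You will enjoy it. Eat It. Love it.</div></body></html>';

console.log(extractBySplit(myhtml)); // "tomatoes berries walnuts"
```

Each `split` scans the string and allocates an array of pieces, which is the overhead the answers below try to avoid.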

Is there a smarter/faster solution?

In theory, a sliding window would be the fastest possible solution, since it makes a single pass over the string and is O(n). However, also in theory, all O(n) solutions are equivalent, so making three passes is asymptotically just as fast; only the constant factor differs.

Use long, distinctive marker strings in your indexOf searches so that each one matches only at the intended position.

    var htmlString = "<html><head><title>hackaday</title></head><body><span background-color=\"#0000\">Welcome to the world.</span><div>You want a little treat...tomatoes berries walnutsDont You? <a href=\"http://getyourtreat.com\">Get Your Treat</a> You will enjoy it. Eat It. Love it.</div></body></html>";
    var start = "<div>You want a little treat...";
    var end = "Dont You? <a href=\"http://getyourtreat.com";
    var startIndex = htmlString.indexOf(start); // pass one
    var endIndex = htmlString.indexOf(end); // pass two
    var result = htmlString.substring(startIndex + start.length, endIndex); // pass three
    console.log(result);
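The same idea can be wrapped in a small helper that also handles a missing marker. This is a sketch under assumptions, not part of the original answer: `extractBetween` is a hypothetical name, and it starts the second search at the end of the first match, so the end marker is only looked for after the start marker.

```javascript
// Hypothetical helper: returns the text between two markers, or null
// if either marker is missing.
function extractBetween(source, startMarker, endMarker) {
  const startIndex = source.indexOf(startMarker);
  if (startIndex === -1) return null;

  const contentStart = startIndex + startMarker.length;
  // Begin the second search where the content starts, so the part of
  // the string before the start marker is not scanned again.
  const endIndex = source.indexOf(endMarker, contentStart);
  if (endIndex === -1) return null;

  return source.substring(contentStart, endIndex);
}

const sampleHtml = '<html><head><title>hackaday</title></head><body>' +
  '<span background-color="#0000">Welcome to the world.</span>' +
  '<div>You want a little treat...tomatoes berries walnutsDont You? ' +
  '<a href="http://getyourtreat.com">Get Your Treat</a> ' +
  'You will enjoy it. Eat It. Love it.</div></body></html>';

console.log(
  extractBetween(
    sampleHtml,
    '<div>You want a little treat...',
    'Dont You? <a href="http://getyourtreat.com'
  )
); // "tomatoes berries walnuts"
```

Without the `-1` checks, a missing marker would silently produce a wrong substring rather than an error.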

You can use a regex instead:

    var str = '<html><head><title>hackaday</title></head><body><span background-color="#0000">Welcome to the world.</span><div>You want a little treat...tomatoes berries walnutsDont You? <a href="http://getyourtreat.com">Get Your Treat</a> You will enjoy it. Eat It. Love it.</div></body></html>';
    var pattern = /\.{3}([\w\s]+)Dont/;
    console.log(str.match(pattern)[1]);

With my update to match [\w\s]+ instead of (.*), my solution is faster than the substring/indexOf method in Firefox, Chrome, and Safari:

https://jsperf.com/substring-index-vs-regex
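To check a claim like this locally, a rough timing loop along the following lines could be used. It is a minimal sketch, assuming a runtime where `performance.now()` is available as a global (modern browsers and recent Node.js); the names `viaIndexOf`, `viaRegex`, and `timeIt` are illustrative, and results will vary by engine and input, so no timings are shown.

```javascript
const html = '<html><head><title>hackaday</title></head><body>' +
  '<span background-color="#0000">Welcome to the world.</span>' +
  '<div>You want a little treat...tomatoes berries walnutsDont You? ' +
  '<a href="http://getyourtreat.com">Get Your Treat</a> ' +
  'You will enjoy it. Eat It. Love it.</div></body></html>';

// indexOf/substring variant (markers taken from the answer above).
function viaIndexOf(s) {
  const start = '<div>You want a little treat...';
  const end = 'Dont You? <a href="http://getyourtreat.com';
  const i = s.indexOf(start);
  const j = s.indexOf(end);
  return i === -1 || j === -1 ? null : s.substring(i + start.length, j);
}

// Regex variant (pattern taken from the answer above).
const pattern = /\.{3}([\w\s]+)Dont/;
function viaRegex(s) {
  const m = s.match(pattern);
  return m ? m[1] : null;
}

// Runs fn repeatedly and reports elapsed milliseconds plus the last result.
function timeIt(fn, iterations) {
  const t0 = performance.now();
  let last = null;
  for (let k = 0; k < iterations; k++) last = fn(html);
  return { ms: performance.now() - t0, last };
}

for (const [name, fn] of [["indexOf", viaIndexOf], ["regex", viaRegex]]) {
  const { ms, last } = timeIt(fn, 100000);
  console.log(name, JSON.stringify(last), ms.toFixed(2) + " ms");
}
```

A hand-rolled loop like this is sensitive to JIT warm-up and run order, so treat the numbers as a rough comparison only.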
