简体   繁体   中英

Rails, HTML to JSON?

Given a static HTML page, is there an automated way to generate json?

For a large website that contains a lot of static HTML I am wanting to generate json for RSS feeds and search functionality and am looking for a way to convert HTML to json.

I could obviously write json templates for every page and every language but that would be a unmaintainable. That would double an 800page website to 1600 pages and that is not an option.

One approach I thought of could be to write a bot that would loop through the routes to index the pages and save data to a database which would give me all the choices I could wish for, for searching such as solr, elastic search, thinking sphinx etc...

I could use capybarra to aid me in this by visiting each path and extracting text to save to a database in a rake task as a background job but not sure how that would work in a production environment and it seems that such a common requirement might have already been achieved but for the life of me I can't find one.

I would be far happier (I think) if I could find a way to convert HTML text content to JSON

Any ideas? Has this already been done? are there any gems that might help? or is there built in functionality that I have not thought of, maybe a way to get html into a hash that could then be converted into json? whatever the approach it needs to be automated. I'm just stuck for the best approach.

Basically html looks a lot like xml, but with strong tag meanings, so you could use xml to json conversion, if it all ends up getting tree of html tags embedded in each other. And so your question becomes this question Except you might get problems with single tags, without closing one. So you might get all of these and put a closing bracket after each one before trying to get it as hash from xml. Oh, early answer. Btw in general for parsing text data you should look at regular expressions.

我最终选择了nokogiri解决方案,并编写了一个解析器来满足我的需求

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM