
Which language is best for this editorial and op-ed aggregator project?

I'm looking for an aggregator for the editorial and op-ed pages of a bunch of English-language newspapers I want to follow. The objective is to generate a single HTML page that collects the editorial pieces from the dozen or so international newspapers I follow, so that I can print them off in the morning. Since this is a very narrow requirement, I couldn't find anything already available, so I'm thinking of writing one on my own.

Now, I used to be a programmer for ~8 years in my previous life (and have since been swayed to the "Dark Side" that is Wall Street after my MBA). I'm no longer current enough on programming to choose a scripting language with confidence, so I'm unsure which language would be best for this. Performance is not a key issue; libraries for parsing HTML, handling text, and getting data off live web pages are more important.

PS: I don't mind learning a new language. Previously I worked extensively with x86 ASM, C and Visual C++/MFC, almost exclusively in Win32 environments.

Use Python and the excellent lxml library for scraping HTML. It supports CSS selectors, which is a huge convenience, and it's rather fast. It handles broken HTML well too.
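As a rough illustration of that suggestion, here is a minimal sketch of pulling paragraphs out of sloppy markup with lxml. The sample HTML, the `editorial` class name, and the `extract_paragraphs` helper are all made up for the example; every newspaper will need its own selector. Note that lxml's `cssselect()` method relies on the separate `cssselect` package, so the sketch falls back to an equivalent XPath query if that package is missing.

```python
from lxml import html

# Hypothetical page snippet: the <p> tags are never closed,
# which is common on real-world pages.
SAMPLE_HTML = """
<html><body>
<div class="editorial">
  <h2>Sample headline</h2>
  <p>First paragraph of the piece.
  <p>Second paragraph of the piece.
</div>
<div class="ads"><p>Buy now!</p></div>
</body></html>
"""


def extract_paragraphs(page_source):
    """Return the text of each <p> inside <div class="editorial">."""
    doc = html.fromstring(page_source)
    try:
        # CSS selector form; needs the separate `cssselect` package.
        nodes = doc.cssselect("div.editorial p")
    except ImportError:
        # Equivalent XPath query, which lxml supports on its own.
        nodes = doc.xpath('//div[@class="editorial"]//p')
    return [node.text_content().strip() for node in nodes]


if __name__ == "__main__":
    for paragraph in extract_paragraphs(SAMPLE_HTML):
        print(paragraph)
```

For a live page you would fetch the source first (for example with `urllib.request.urlopen` from the standard library) and pass the decoded text to the same function, then write the collected paragraphs into one HTML file for printing.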

Interpreted languages do a good job at code generation; you should consider Perl or Ruby.


 