
text extraction from web pages

I'm working on a program that reads the content of this page: http://www.pogdesign.co.uk/cat/ and alerts me when one of my favourite TV series is scheduled, and on which day of the month. I would also like the program to have a JFrame that displays all the months (maybe in a JTabbedPane), and for each month I want to list all episodes of my favourite TV series together with the day of the month they air.

I have already written something using "jsoup: Java HTML Parser" to extract text from an HTML web page. I need to understand which approach to use to implement these steps:

  1. Find the days of the month on which episodes of a specific TV series are scheduled, and save them somewhere.
  2. Get the links for those episodes, use them to find the broadcast time of each episode, and save those times somewhere.

What would be a good strategy for something like this? Is jsoup alone enough to build a program like this?

First, I suggest you get a list of the elements whose title matches the series you are looking for, and then use jsoup's parent() method on each of them to reach the enclosing day element. That gives you the episode and its day in one step.

I think you can do this easily with jsoup.
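The idea above could be sketched roughly as follows. This is only a minimal illustration: the markup (a `day` class on each calendar cell and a `data-date` attribute), the sample show names, and the `EpisodeFinder` / `Episode` names are all assumptions made up for the example, so the selectors would need to be adapted to the real structure of the page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class EpisodeFinder {

    /** One scheduled episode: the day it was found under, its link text, and its absolute URL. */
    public static final class Episode {
        public final String day;
        public final String title;
        public final String url;

        Episode(String day, String title, String url) {
            this.day = day;
            this.title = title;
            this.url = url;
        }
    }

    /** Finds every link whose text contains seriesTitle (case-insensitive) and pairs it with its day cell. */
    public static List<Episode> findEpisodes(Document doc, String seriesTitle) {
        List<Episode> found = new ArrayList<>();
        String wanted = seriesTitle.toLowerCase();
        for (Element link : doc.select("a[href]")) {
            if (!link.text().toLowerCase().contains(wanted)) {
                continue;
            }
            // Walk up with parent() until we reach the enclosing day cell.
            // ASSUMPTION: day cells carry class "day" and a "data-date" attribute.
            Element cell = link.parent();
            while (cell != null && !cell.hasClass("day")) {
                cell = cell.parent();
            }
            if (cell != null) {
                found.add(new Episode(cell.attr("data-date"), link.text(), link.absUrl("href")));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup; the real page will differ.
        String html = "<table><tr>"
                + "<td class='day' data-date='3'><p><a href='ep/1'>Some Show S01E01</a></p></td>"
                + "<td class='day' data-date='4'><p><a href='ep/2'>Other Show S02E05</a></p></td>"
                + "</tr></table>";
        // The base URI lets absUrl("href") resolve relative links.
        Document doc = Jsoup.parse(html, "http://www.pogdesign.co.uk/cat/");
        for (Episode e : findEpisodes(doc, "some show")) {
            System.out.println(e.day + " | " + e.title + " | " + e.url);
        }
    }
}
```

For the live page, the document could be loaded with `Jsoup.connect(url).get()` instead of `Jsoup.parse(...)`, and each collected `url` could be fetched the same way to look for the broadcast time on the episode's own page.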
