text extraction from web pages

Original 2013-07-15 15:30:19 0 1 java/ url/ html-parsing

I'm working on a program that reads the content of this page: http://www.pogdesign.co.uk/cat/ and alerts me when one of my favourite TV series is scheduled, and on which day of the month. In this program I would also like to have a JFrame that displays all months (maybe in a JTabbedPane), and for each month I want to list all episodes of my favourite TV series with their day of the month.

I have already written something using "jsoup: Java HTML Parser" to extract text from an HTML web page. I need to understand what approach to use to implement these steps:

Find the days of the month on which episodes of a specific TV series are scheduled, and save them somewhere.
Get the links to those episodes, use them to find the broadcast time of each episode, and save those times somewhere.

So what do you think is a good strategy for something like that? Is the Java HTML parser enough to complete a program like this?

1 answer

First, I suggest you get a list of the items that have the title you want to find, and then use the parent() method (jsoup) on each of them to get the corresponding day, all in one step.

I think you can easily do it with jsoup.
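
The idea above could be sketched roughly as follows. This is only a minimal illustration, not a tested scraper for the real site: the HTML in SAMPLE_HTML, the "day" class, the id format and the class name EpisodeFinder are all assumptions made up for the example, and the actual markup of the calendar page may look quite different. It selects every link whose text matches the series title, then walks up with parent() until it reaches the element that represents the day.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class EpisodeFinder {

    // Hypothetical markup standing in for the calendar page; the real
    // page structure is an assumption and must be checked by hand.
    static final String SAMPLE_HTML =
          "<div class='day' id='d_15_7_2013'>"
        + "  <p><a href='/cat/show-a/ep-1'>Show A</a></p>"
        + "  <p><a href='/cat/show-b/ep-4'>Show B</a></p>"
        + "</div>"
        + "<div class='day' id='d_16_7_2013'>"
        + "  <p><a href='/cat/show-a/ep-2'>Show A</a></p>"
        + "</div>";

    /**
     * Returns one "dayId -> absolute episode URL" string for every link
     * whose visible text equals the given title (case-insensitive).
     */
    static List<String> findEpisodes(Document doc, String title) {
        List<String> hits = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            if (!link.text().equalsIgnoreCase(title)) {
                continue;
            }
            // Walk up the tree with parent() until the enclosing day cell.
            Element day = link.parent();
            while (day != null && !day.hasClass("day")) {
                day = day.parent();
            }
            if (day != null) {
                // absUrl resolves the href against the document's base URI.
                hits.add(day.id() + " -> " + link.absUrl("href"));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Parsing a string keeps the sketch offline; the base URI is only
        // used to resolve relative links.
        Document doc = Jsoup.parse(SAMPLE_HTML, "http://www.pogdesign.co.uk/cat/");
        for (String hit : findEpisodes(doc, "Show A")) {
            System.out.println(hit);
        }
    }
}
```

Against the live page, the document would come from Jsoup.connect(url).get() instead of Jsoup.parse. Each collected episode URL could then be fetched the same way to look for the broadcast time, using whatever selector matches that page's markup.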

Generic Article Extraction from web pages

JSoup core web text extraction

XPATH based content extraction from html pages

Text Extraction from HTML Java

Agreement feature extraction from a text

Extraction of text from Image in Jmeter

How to get text extraction from PDF to work?

java - omitting special characters from text extraction

Java - Text Extraction from PDF using OCR

Text Extraction from an Image Using java

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

solution1 0 ACCEPTED 2013-07-15 15:40:23
