I'm trying to crawl data from a website where each item in a list is wrapped in a div tag. Inside each item there are two further parts, also div tags: one holds the image, and the other holds the title and description. In startElement I can identify each div by its attributes, but in endElement I get no attributes, so I can't tell which div is ending. How can I parse items that all use the same tag?
Example of an item I want to crawl:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>JSP Page</title>
</head>
<body>
<div class="o-ResultCard__m-MediaBlock m-MediaBlock">
<div class="m-MediaBlock__m-TextWrap">
<h3 class="m-MediaBlock__a-Headline">
<a href="abc.com"><span class="m-MediaBlock__a-HeadlineText">Air Fryer Chicken Wings</span></a>
</h3>
<div class="parbase recipeInfo time">
<section class="o-RecipeInfo__o-Time">
<dl>
<dt class="o-RecipeInfo__a-Headline a-Headline">Total Time: 40 minutes</dt>
</dl>
</section>
</div>
</div>
<div class="m-MediaBlock__m-MediaWrap">
<a href="abc.com" class="" title="Air Fryer Chicken Wings">
<img src="https://dinnerthendessert.com/wp-content/uploads/2019/01/Fried-Chicken-2.jpg" class="m-MediaBlock__a-Image" alt="Air Fryer Chicken Wings">
</a>
</div>
</div>
</body>
My handler:
private String currentTag;
private FoodDAO dao;
private FoodsDTO dto;
private String itemIdentify = "o-ResultCard__m-MediaBlock m-MediaBlock";
private String itemMedia = "m-MediaBlock__m-MediaWrap";
private String itemText = "m-MediaBlock__m-TextWrap";
private boolean foundItem;
public FoodHandler() {
dao = new FoodDAO();
foundItem = false;
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
String attrVal = attributes.getValue(0);
if (qName.equals("div") && attrVal.equals(itemIdentify)) {
dto = new FoodsDTO();
foundItem = true;
}
currentTag = qName;
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.endsWith("div")) {
foundItem = false;
try {
dao.manageCrawl(dto);
} catch (Exception e) {
Logger.getLogger(NewsHandler.class.getName()).log(Level.SEVERE, null, e);
}
}
currentTag = "";
}
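Store the attributes in a stack.
More specifically, store a copy of the attributes in a Deque. A copy is needed because the Attributes object passed to startElement is only guaranteed to be valid during that call, and the parser may reuse it afterwards: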
private Deque<Attributes> attributesStack = new ArrayDeque<>();
@Override
public void startDocument() throws SAXException {
// Clear the stack at start of parsing, in case this handler is
// re-used for multiple parsing operations, and previous parse failed.
attributesStack.clear();
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
attributesStack.push(new AttributesImpl(attributes)); // Attributes must be copied
// code here
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
Attributes attributes = attributesStack.pop();
// code here
}
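To show how the popped attributes can be used in endElement, here is a minimal sketch rather than a drop-in replacement for your handler. It assumes the page has already been cleaned into well-formed XML (the sample above has unclosed meta and img tags, which a plain SAX parser would reject). The names ItemDivStackDemo, ItemHandler and parse are made up for illustration, and it only collects the headline text instead of calling your DAO.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original question or answer.
class ItemDivStackDemo {

    /** Collects the headline of every item div, using the attribute stack. */
    static class ItemHandler extends DefaultHandler {
        private static final String ITEM_CLASS = "o-ResultCard__m-MediaBlock m-MediaBlock";
        private static final String HEADLINE_CLASS = "m-MediaBlock__a-HeadlineText";

        private final Deque<Attributes> attributesStack = new ArrayDeque<>();
        private final List<String> titles = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String currentTitle;

        @Override
        public void startDocument() {
            // Reset state in case the handler is reused after a failed parse.
            attributesStack.clear();
            titles.clear();
            currentTitle = null;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            attributesStack.push(new AttributesImpl(attributes)); // copy, the parser may reuse the original
            text.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // The popped attributes belong to the element that is ending right now.
            Attributes attributes = attributesStack.pop();
            String cssClass = attributes.getValue("class"); // null if the element has no class
            if ("span".equals(qName) && HEADLINE_CLASS.equals(cssClass)) {
                currentTitle = text.toString().trim();
            } else if ("div".equals(qName) && ITEM_CLASS.equals(cssClass)) {
                // Only the outer item div reaches this branch, not the nested divs.
                if (currentTitle != null) {
                    titles.add(currentTitle);
                }
                currentTitle = null;
            }
        }

        List<String> getTitles() {
            return titles;
        }
    }

    /** Parses a well-formed XML string and returns the item headlines found. */
    static List<String> parse(String xml) throws Exception {
        ItemHandler handler = new ItemHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitles();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<body><div class=\"o-ResultCard__m-MediaBlock m-MediaBlock\">"
                + "<div class=\"m-MediaBlock__m-TextWrap\"><h3><a href=\"abc.com\">"
                + "<span class=\"m-MediaBlock__a-HeadlineText\">Air Fryer Chicken Wings</span>"
                + "</a></h3></div>"
                + "<div class=\"m-MediaBlock__m-MediaWrap\"><img src=\"x.jpg\" alt=\"wings\"/></div>"
                + "</div></body>";
        System.out.println(parse(xml));
    }
}
```

Because every startElement pushes and every endElement pops, the stack always lines up with the element nesting, so the inner TextWrap and MediaWrap divs pop their own attributes and never trigger the item branch. In your handler, the `titles.add(...)` line is roughly where the call to `dao.manageCrawl(dto)` would go.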